Combine column from multiple rows into single row
This is relatively trivial to do with a correlated subquery. You can't use the COALESCE method highlighted in the blog post you mention unless you extract that to a user-defined function (or unless you only want to return one row at a time). Here is how I typically do this:
DECLARE @x TABLE
(
id INT,
row_num INT,
customer_code VARCHAR(32),
comments VARCHAR(32)
);
INSERT @x SELECT 1,1,'Dilbert','Hard'
UNION ALL SELECT 1,2,'Dilbert','Worker'
UNION ALL SELECT 2,1,'Wally','Lazy';
SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments
FROM @x AS x2 WHERE id = x.id
ORDER BY row_num
FOR XML PATH('')), 1, 1, '')
FROM @x AS x
GROUP BY id, customer_code
ORDER BY id;
If you have a case where the data in comments could contain unsafe-for-XML characters (>
, <
, &
), you should change this:
FOR XML PATH('')), 1, 1, '')
To this more elaborate approach:
FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')
(Be sure to use the right destination data type, varchar
or nvarchar
, and the right length, and prefix all string literals with N
if using nvarchar
.)
If you're allowed to use CLR in your environment, this is a tailor-made case for a user-defined aggregate.
In particular, this is probably the way to go if the source data is non-trivially large and/or you need to do this type of thing a lot in your application. I strongly suspect the query plan for Aaron's solution will not scale well as the input size grows. (I tried adding an index to the temp table, but that didn't help.)
This solution, like many other things, is a tradeoff:
- Politics/policy for even using CLR Integration in your, or your client's, environment.
- CLR function is likely faster, and will scale better given a real set of data.
- CLR function will be reusable in other queries, and you won't have to duplicate (and debug) a complex subquery every time you need to do this type of thing.
- Straight T-SQL is simpler than writing and managing a piece of external code.
- Perhaps you don't know how to program in C# or VB.
- etc.
EDIT: Well, I went to try to see if this actually was better, and it turns out the requirement that the comments be in a specific order is currently not possible to satisfy using an aggregate function. :(
See SqlUserDefinedAggregateAttribute.IsInvariantToOrder. Basically, what you need to do is OVER(PARTITION BY customer_code ORDER BY row_num)
but ORDER BY
is not supported in the OVER
clause when aggregating. I'm assuming adding this functionality to SQL Server opens a can of worms, because what would need to be changed in the execution plan is trivial. The aforementioned link says this is reserved for future use, so this could be implemented in the future (on 2005 you're probably out of luck, though).
This could still be accomplished by packing and parsing the row_num
value into the aggregated string, and then doing the sort within the CLR object... which seems pretty hackish.
In any event, below is the code I used in case anyone else finds this useful even with the limitation. I'll leave the hacking part as an exercise for the reader. Note that I used AdventureWorks (2005) for test data.
Aggregate assembly:
using System;
using System.IO;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
namespace MyCompany.SqlServer
{
[Serializable]
[SqlUserDefinedAggregate
(
Format.UserDefined,
IsNullIfEmpty = false,
IsInvariantToDuplicates = false,
IsInvariantToNulls = true,
IsInvariantToOrder = false,
MaxByteSize = -1
)]
public class StringConcatAggregate : IBinarySerialize
{
private string _accum;
private bool _isEmpty;
public void Init()
{
_accum = string.Empty;
_isEmpty = true;
}
public void Accumulate(SqlString value)
{
if (!value.IsNull)
{
if (!_isEmpty)
_accum += ' ';
else
_isEmpty = false;
_accum += value.Value;
}
}
public void Merge(StringConcatAggregate value)
{
Accumulate(value.Terminate());
}
public SqlString Terminate()
{
return new SqlString(_accum);
}
public void Read(BinaryReader r)
{
this.Init();
_accum = r.ReadString();
_isEmpty = _accum.Length == 0;
}
public void Write(BinaryWriter w)
{
w.Write(_accum);
}
}
}
T-SQL for testing (CREATE ASSEMBLY
, and sp_configure
to enable CLR omitted):
CREATE TABLE [dbo].[Comments]
(
CustomerCode int NOT NULL,
RowNum int NOT NULL,
Comments nvarchar(25) NOT NULL
)
INSERT INTO [dbo].[Comments](CustomerCode, RowNum, Comments)
SELECT
DENSE_RANK() OVER(ORDER BY FirstName),
ROW_NUMBER() OVER(PARTITION BY FirstName ORDER BY ContactID),
Phone
FROM [AdventureWorks].[Person].[Contact]
GO
CREATE AGGREGATE [dbo].[StringConcatAggregate]
(
@input nvarchar(MAX)
)
RETURNS nvarchar(MAX)
EXTERNAL NAME StringConcatAggregate.[MyCompany.SqlServer.StringConcatAggregate]
GO
SELECT
CustomerCode,
[dbo].[StringConcatAggregate](Comments) AS AllComments
FROM [dbo].[Comments]
GROUP BY CustomerCode