Multiple column concatenation
A CLR aggregate will almost certainly be the fastest way of doing this. But perhaps you don't want to use one for whatever reason...
You say that the source for this is an expensive query.
I would materialise this into a #temp
table first to ensure it is only evaluated once.
CREATE TABLE #test
(
ID INT,
name NVARCHAR(128),
car NVARCHAR(128)
);
CREATE CLUSTERED INDEX ix ON #test(ID);
The execution plan I get for the query in the question first does the concatenation for every row in the outer query and then removes the duplicates by id, SomeField_Combined1, SomeField_Combined2
.
This is incredibly wasteful. The following rewrite avoids this.
SELECT t.id,
stuff([M].query('/name').value('/', 'varchar(max)'), 1, 1, '') AS [SomeField_Combined1],
stuff([M].query('/car').value('/', 'varchar(max)'), 1, 1, '') AS [SomeField_Combined2]
FROM (SELECT DISTINCT id
FROM #test) t
OUTER APPLY(SELECT (SELECT id,
',' + name AS name,
',' + car AS car
FROM #test
WHERE #test.id = t.id
FOR XML PATH(''), type) AS M) M
However for the following test data (1000 ids with 2156 rows per id for me)
INSERT INTO #test
SELECT v.number, o.name, o.type_desc
FROM sys.all_objects o
INNER JOIN master..spt_values v
ON v.type = 'P' AND v.number BETWEEN 1 AND 1000
I still found Kenneth's solution with two XML PATH
calls much faster and less resource intensive.
+-----------------+--------------------+------------------------+------------------+---------------------+-------------------------+-----------------------------+
| | CPU Time (Seconds) | Elapsed Time (Seconds) | #test Scan Count | #test Logical Reads | Worktable logical reads | Worktable lob logical reads |
+-----------------+--------------------+------------------------+------------------+---------------------+-------------------------+-----------------------------+
| Single XML PATH | 51.077 | 15.521 | 1,005 | 60,165 | 51,161 | 1,329,207 |
| Double XML PATH | 3.1720 | 3.010 | 2,005 | 92,088 | 14,951 | 233,681 |
+-----------------+--------------------+------------------------+------------------+---------------------+-------------------------+-----------------------------+
For each distinct id
in #test
it performs two operations instead of one but this operation is significantly cheaper than constructing the XML and then reparsing it.
I ran a few tests using a little over 6 mil rows. With an index on the ID column.
Here is what I came up with.
Your initial query:
SELECT * FROM (
SELECT t.id,
stuff([M].query('/name').value('/', 'varchar(max)'),1,1,'') AS [SomeField_Combined1],
stuff([M].query('/car').value('/', 'varchar(max)'),1,1,'') AS [SomeField_Combined2]
FROM dbo.test t
OUTER APPLY(SELECT (
SELECT id, ','+name AS name
,','+car AS car
FROM test WHERE test.id=t.id
FOR XML PATH('') ,type)
AS M)
M ) S
GROUP BY id, SomeField_Combined1, SomeField_Combined2
This one ran for ~23 minutes.
I ran this version which is the version I first learned. In some ways it seems like it should take longer but it doesn't.
SELECT test.id,
STUFF((SELECT ', ' + ThisTable.name
FROM test ThisTable
WHERE test.id = ThisTable.id
AND ThisTable.name <> ''
FOR XML PATH ('')),1,2,'') AS ConcatenatedSomeField,
STUFF((SELECT ', ' + car
FROM test ThisTable
WHERE test.id = ThisTable.id
AND ThisTable.car <> ''
FOR XML PATH ('')),1,2,'') AS ConcatenatedSomeField2
FROM test
GROUP BY id
This version ran in just over 2 minutes.