Multiple column concatenation

A CLR aggregate will almost certainly be the fastest way of doing this. But perhaps you don't want to use one for whatever reason...

You say that the source for this is an expensive query.

I would materialise this into a #temp table first to ensure it is only evaluated once.

Click to copy

CREATE TABLE #test
(
ID INT,
name NVARCHAR(128),
car  NVARCHAR(128)
);

CREATE CLUSTERED INDEX ix ON #test(ID);

The execution plan I get for the query in the question first does the concatenation for every row in the outer query and then removes the duplicates by id, SomeField_Combined1, SomeField_Combined2.

This is incredibly wasteful. The following rewrite avoids this.

Click to copy

SELECT t.id,
       stuff([M].query('/name').value('/', 'varchar(max)'), 1, 1, '') AS [SomeField_Combined1],
       stuff([M].query('/car').value('/', 'varchar(max)'), 1, 1, '')  AS [SomeField_Combined2]
FROM   (SELECT DISTINCT id
        FROM   #test) t
       OUTER APPLY(SELECT (SELECT id,
                                  ',' + name AS name,
                                  ',' + car  AS car
                           FROM   #test
                           WHERE  #test.id = t.id
                           FOR XML PATH(''), type) AS M) M

However for the following test data (1000 ids with 2156 rows per id for me)

Click to copy

INSERT INTO #test
SELECT v.number, o.name, o.type_desc
FROM   sys.all_objects o
       INNER JOIN master..spt_values v
         ON v.type = 'P' AND v.number BETWEEN 1 AND 1000

I still found Kenneth's solution with two XML PATH calls much faster and less resource intensive.

Click to copy

+-----------------+--------------------+------------------------+------------------+---------------------+-------------------------+-----------------------------+
|                 | CPU Time (Seconds) | Elapsed Time (Seconds) | #test Scan Count | #test Logical Reads | Worktable logical reads | Worktable lob logical reads |
+-----------------+--------------------+------------------------+------------------+---------------------+-------------------------+-----------------------------+
| Single XML PATH | 51.077             | 15.521                 | 1,005            | 60,165              | 51,161                  | 1,329,207                   |
| Double XML PATH | 3.1720             | 3.010                  | 2,005            | 92,088              | 14,951                  |   233,681                   |
+-----------------+--------------------+------------------------+------------------+---------------------+-------------------------+-----------------------------+

For each distinct id in #test it performs two operations instead of one but this operation is significantly cheaper than constructing the XML and then reparsing it.

I ran a few tests using a little over 6 mil rows. With an index on the ID column.

Here is what I came up with.

Your initial query:

Click to copy

SELECT * FROM (
    SELECT t.id,
            stuff([M].query('/name').value('/', 'varchar(max)'),1,1,'') AS [SomeField_Combined1],
            stuff([M].query('/car').value('/', 'varchar(max)'),1,1,'') AS [SomeField_Combined2]
    FROM dbo.test t
    OUTER APPLY(SELECT (
                    SELECT id, ','+name AS name
                    ,','+car AS car
                    FROM test WHERE test.id=t.id
                    FOR XML PATH('') ,type)
                 AS  M) 
            M ) S
GROUP BY id, SomeField_Combined1, SomeField_Combined2

This one ran for ~23 minutes.

I ran this version which is the version I first learned. In some ways it seems like it should take longer but it doesn't.

Click to copy

SELECT test.id,
    STUFF((SELECT ', ' + ThisTable.name
            FROM   test ThisTable
            WHERE  test.id = ThisTable.id
            AND    ThisTable.name <> ''
            FOR XML PATH ('')),1,2,'') AS ConcatenatedSomeField,
    STUFF((SELECT ', ' + car
            FROM   test ThisTable
            WHERE  test.id = ThisTable.id
            AND    ThisTable.car <> ''
            FOR XML PATH ('')),1,2,'') AS ConcatenatedSomeField2
FROM test 
GROUP BY id

This version ran in just over 2 minutes.

Multiple column concatenation

Tags:

Sql Server

Sql Server 2008

T Sql

Related

Recent Posts