SQL Row concatenation with XML PATH and STUFF giving aggregate sql error
Sorry, I missed a step in the relationship. Try this version (though Martin's will work as well):
SELECT DISTINCT o.section, names= STUFF((
SELECT ', ' + b.Name
FROM dbo.TableA AS a
INNER JOIN dbo.TableB AS b
ON a.AccountId = b.AccountId
WHERE a.Section = o.Section
FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
FROM dbo.TableA AS o;
An approach that is at least as good, but sometimes better, is switching from DISTINCT
to GROUP BY
:
SELECT o.section, names= STUFF((
SELECT ', ' + b.Name
FROM dbo.TableA AS a
INNER JOIN dbo.TableB AS b
ON a.AccountId = b.AccountId
WHERE a.Section = o.Section
FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
FROM dbo.TableA AS o
GROUP BY o.section;
At a high level, the reason DISTINCT
applies to the entire column list. Therefore for any duplicates it has to perform the aggregate work for every duplicate before applying DISTINCT
. If you use GROUP BY
then it can potentially remove duplicates before doing any of the aggregation work. This behavior can vary by plan depending on a variety of factors including indexes, plan strategy, etc. And a direct switch to GROUP BY
may not be possible in all cases.
In any case, I ran both of these variations in SentryOne Plan Explorer. The plans are different in a few minor, uninteresting ways, but the I/O involved with the underlying worktable is telling. Here is DISTINCT
:
And here is GROUP BY
:
When I made the tables larger (14,000+ rows mapping to 24 potential values), this difference is more pronounced. DISTINCT
:
GROUP BY
:
In SQL Server 2017, you can use STRING_AGG
:
SELECT a.section, STRING_AGG(b.Name, ', ')
FROM dbo.TableA AS a
INNER JOIN dbo.TableB AS b
ON a.AccountId = b.AccountId
WHERE a.Section = a.Section
GROUP BY a.section;
The I/O here is almost nothing:
But, if you're not on SQL Server 2017 (or Azure SQL Database), and can't use STRING_AGG
, I have to give credit where credit is due... Paul White's answer below has very little I/O and kicks the pants off of both of the FOR XML PATH
solutions above.
Other enhancements from these posts:
- Grouped Concatenation in SQL Server
- Grouped Concatenation : Ordering and Removing Duplicates
- Comparing string splitting / concatenation methods
Also see:
- Bad habits to kick : avoiding the schema prefix
I thought I would try a solution using XML.
SEDE Demo
Tables
DECLARE @TableA AS table
(
ID integer PRIMARY KEY,
Section varchar(10) NOT NULL,
AccountID char(2) NOT NULL
);
DECLARE @TableB AS table
(
AccountID char(2) PRIMARY KEY,
Name varchar(20) NOT NULL
);
Data
INSERT @TableA
(ID, Section, AccountID)
VALUES
(1, 'shoes', 'A1'),
(2, 'shoes', 'A2'),
(3, 'shoes', 'A3'),
(4, 'books', 'A1');
INSERT @TableB
(AccountID, Name)
VALUES
('A1', 'AccountName1'),
('A2', 'AccountName2'),
('A3', 'AccountName3');
Join and convert to XML
DECLARE @x xml =
(
SELECT
TA.Section,
CA.Name
FROM @TableA AS TA
JOIN @TableB AS TB
ON TB.AccountID = TA.AccountID
CROSS APPLY
(
VALUES(',' + TB.Name)
) AS CA (Name)
ORDER BY TA.Section
FOR XML AUTO, TYPE, ELEMENTS, ROOT ('Root')
);
The XML in the variable looks like this:
<Root>
<TA>
<Section>shoes</Section>
<CA>
<Name>,AccountName1</Name>
</CA>
<CA>
<Name>,AccountName2</Name>
</CA>
<CA>
<Name>,AccountName3</Name>
</CA>
</TA>
<TA>
<Section>books</Section>
<CA>
<Name>,AccountName1</Name>
</CA>
</TA>
</Root>
Query
The final query shreds the XML into sections, and concatenates the names in each:
SELECT
Section =
N.n.value('(./Section/text())[1]', 'varchar(10)'),
Names =
STUFF
(
-- Consecutive text nodes collapse
N.n.query('./CA/Name/text()')
.value('./text()[1]', 'varchar(8000)'),
1, 1, ''
)
-- Shred per section
FROM @x.nodes('Root/TA') AS N (n);