Return xml sequences where an attribute doesn't contain a specific character
An easy way to do this is to use the nodes
method to get right to the address
attribute and check for your @
sign.
The problem with the way you're looking now is that it's only checking that any email address has an @
in it. Parsing the XML nodes out lets you check individual emails for it.
DECLARE @x XML
= '<xml>
<customer name="Max"><email address="[email protected]" /></customer>
<customer name="Erik"><email address="[email protected]" /></customer>
<customer name="Brent"><email address="brentcom" /></customer>
</xml>';
SELECT x.c.value('@address', 'VARCHAR(100)') AS [email]
FROM @x.nodes('/xml/customer/email') AS x(c)
WHERE x.c.exist('@address[contains(., "@")]') = 0;
If you need to query an actual table with an XML column like this, you'd just CROSS APPLY
the nodes method like thusly:
SELECT x.c.value('@address', 'VARCHAR(100)') AS [email]
FROM @x_table AS xt
CROSS APPLY xt.x.nodes('/xml/customer/email') AS x(c)
WHERE x.c.exist('@address[contains(., "@")]') = 0;
If you want to bring all the <customer>...</customer>
XML for that "row" back, you can walk the axis back. Just be aware that walking back can make performance a bit woogy for large XML blocks.
SELECT x.c.query('..')
FROM @x_table AS xt
CROSS APPLY xt.x.nodes('/xml/customer/email') AS x(c)
WHERE x.c.exist('@address[contains(., "@")]') = 0;
Another way of doing it is:
SELECT @x.query('/xml/customer[email/@address[not(contains(., "@"))]]') answer
Moving the square brackets to wrap around the email node effectively make that the WHERE
clause applied to the customer
node. Translating this XQuery to English looks like:
Get me all
xml/customer
nodes with anaddress
attribute which does not contain the@
symbol
You were oh so close. You were definitely on the right track with using the .query()
function and using the contains
XQuery function. What you got wrong was:
- Putting the
= False
outside of the[...]
(meaning, it was not part of thecontains()
expression) - Using the word
False
instead of the functionfalse()
- Not specifying the parent node by adding
/..
to the end of the path (so that the result will include the<customer>
element and not just the<email>
element)
Correcting those three things results in the following XQuery expression that gets you what you are wanting:
'/xml/customer/email[contains(@address, "@") = false()]/..'
Putting that into your original example from the question gives you:
DECLARE @x XML = '<xml>
<customer name="Max"><email address="[email protected]" /></customer>
<customer name="Erik"><email address="[email protected]" /></customer>
<customer name="Brent"><email address="brentcom" /></customer>
</xml>';
SELECT
@x.query('/xml/customer/email[contains(@address, "@")]/..') AS [WithValidEmail],
@x.query('/xml/customer/email[contains(@address, "@")=false()]/..') AS [WithInvalidEmail;
That query returns the following result set of a single row with two XML fields:
WithValidEmail | WithInvalidEmail
<customer name="Max"> | <customer name="Brent">
<email address="[email protected]" /> | <email address="brentcom" />
</customer> | </customer>
<customer name="Erik"> |
<email address="[email protected]" /> |
</customer> |
This is probably more efficient than breaking the document out with the .nodes()
function since it can parse the XML in a single shot and not need to start and stop the parser per each node.
The other benefit of keeping it within .query()
is that you get a single XML document returned. So, if you receive an XML document / value containing multiple nodes worth of stuff, you can maintain the scalar value approach of it being a single entity without having to reconstruct the resulting nodes back into a document again. This also lets you use it in a subquery / CTE without changing the number of expected rows being returned.