Matching a ] (closing square bracket) with PATINDEX using the "[ ]" wildcard
My own solution, which is more of a workaround, consisted in specifying a character range that included the ]
and using that range along with the other characters in the [ ]
wildcard. I used a range based on the ASCII table. According to that table, the ]
character is located in the following neighbourhood:
Hex Dec Char --- --- ---- … 5A 90 Z 5B 91 [ 5C 92 \ 5D 93 ] 5E 94 ^ 5F 95 _ …
My range, therefore, took the form of [-^
, i.e. it included four characters: [
, \
, ]
, ^
. I also specified that the pattern use a Binary collation, to match the ASCII range exactly. The resulting PATINDEX
expression ended up looking like this:
PATINDEX('%[[-^{}:,]%' COLLATE Latin1_General_BIN2, MyJSONString)
The obvious problem with this approach is that the range at the beginning of the pattern includes two unwanted characters, \
and ^
. The solution worked for me simply because the extra characters could never occur in the specific JSON strings I needed to parse. Naturally, this cannot be true in general, so I am still interested in other methods, hopefully more universal than mine.
I have a probably terrible take on this from back when I had to do a lot of string splitting.
If you have a known set of characters, make a table of them.
CREATE TABLE dbo.characters ( character CHAR(1) NOT NULL PRIMARY KEY CLUSTERED );
INSERT dbo.characters ( character )
SELECT *
FROM (
SELECT '[' UNION ALL
SELECT ']' UNION ALL
SELECT '{' UNION ALL
SELECT '}' UNION ALL
SELECT ','
) AS x (v)
Then use that magical CROSS APPLY
along with CHARINDEX
:
SELECT TOP 1000 p.Id, p.Body, ca.*
FROM dbo.Posts AS p
CROSS APPLY (
SELECT TOP 1 CHARINDEX(c.character, p.Body) AS first_things_first
FROM dbo.characters AS c
ORDER BY CHARINDEX(c.character, p.Body) ASC
) AS ca
WHERE ca.first_things_first > 0
If I'm missing something obvious about what you need to do, lemme know.
I've seen approaches in the past to replace the offending character before searching, and putting it back in afterwards.
In this case we could do something like:
DECLARE @test NVARCHAR(MAX);
DECLARE @replacementcharacter CHAR(1) = CHAR(174);
SET @test = 'Test[]@String'
SELECT PATINDEX('%[[' + @replacementcharacter + '@]%', REPLACE(@test,']',@Replacementcharacter))
This code correctly returns 5. I'm using the ¬ character as that's unlikely to appear - if there are no ASCII characters you won't be using, this solution won't work.
Oddly enough though, the direct answer to your question would be no - I can't get PATINDEX to search for ']' either, but if you replace it you don't need to.
Same example but without the variable usage:
DECLARE @test NVARCHAR(MAX);
SET @test = 'Test[]@String'
SELECT PATINDEX('%[[' + CHAR(174) + '@]%', REPLACE(@test,']',CHAR(174)))
Using the above solution in your code yields your required results:
WITH
data AS (SELECT CAST('{"f1":["v1","v2"],"f2":"v3"}' AS varchar(max)) AS ResponseJSON),
parser AS
(
SELECT
Level = 1,
OpenClose = 1,
P = p.P,
S = SUBSTRING(d.ResponseJSON, 1, NULLIF(p.P, 0) - 1),
C = SUBSTRING(d.ResponseJSON, NULLIF(p.P, 0), 1),
ResponseJSON = SUBSTRING(d.ResponseJSON, NULLIF(p.P, 0) + 1, 999999)
FROM
data AS d
CROSS APPLY (SELECT PATINDEX('%[[{'+ CHAR(174) + ']%', REPLACE(d.ResponseJSON,']',CHAR(174)))) AS p (P)
UNION ALL
SELECT
Level = ISNULL(d.OpenClose - 1, 0) + d.Level + ISNULL(oc.OpenClose, 0),
OpenClose = oc.OpenClose,
P = d.P + p.P,
S = SUBSTRING(d.ResponseJSON, 1, NULLIF(p.P, 0) - 1),
C = c.C,
ResponseJSON = SUBSTRING(d.ResponseJSON, NULLIF(p.P, 0) + 1, 999999)
FROM
parser AS d
CROSS APPLY (SELECT PATINDEX('%[[{}:,'+ CHAR(174) + ']%' COLLATE Latin1_General_BIN2, REPLACE(d.ResponseJSON,']',CHAR(174)))) AS p (P)
CROSS APPLY (SELECT SUBSTRING(d.ResponseJSON, NULLIF(p.P, 0), 1)) AS c (C)
CROSS APPLY (SELECT CASE WHEN c.C IN ('[', '{') THEN 1 WHEN c.C IN (']', '}') THEN 0 END) AS oc (OpenClose)
WHERE 1=1
AND p.P <> 0
)
SELECT
*
FROM
parser
OPTION
(MAXRECURSION 0)
;