Regex to match nested json objects
This recursive Perl/PCRE regular expression should be able to match any valid JSON or JSON5 object, including nested objects and edge cases such as braces inside JSON strings or JSON5 comments:
/(\{(?:(?>[^{}"'\/]+)|(?>"(?:(?>[^\\"]+)|\\.)*")|(?>'(?:(?>[^\\']+)|\\.)*')|(?>\/\/.*\n)|(?>\/\*.*?\*\/)|(?-1))*\})/
Of course, that's a bit hard to read, so you might prefer the commented version:
m{
( # Begin capture group (matching a JSON object).
\{ # Match opening brace for JSON object.
(?: # Begin non-capturing group to contain alternations.
(?>[^{}"'\/]+) # Match a non-empty string which contains no braces, quotes or slashes, without backtracking.
| # Alternation; next alternative follows.
(?>"(?:(?>[^\\"]+)|\\.)*") # Match a double-quoted JSON string, without backtracking.
| # Alternation; next alternative follows.
(?>'(?:(?>[^\\']+)|\\.)*') # Match a single-quoted JSON5 string, without backtracking.
| # Alternation; next alternative follows.
(?>\/\/.*\n) # Match a single-line JSON5 comment, without backtracking.
| # Alternation; next alternative follows.
(?>\/\*.*?\*\/) # Match a multi-line JSON5 comment, without backtracking.
| # Alternation; next alternative follows.
(?-1) # Recurse to most recent capture group, to match a nested JSON object.
)* # End of non-capturing group; match zero or more repetitions of this group.
\} # Match closing brace for JSON object.
) # End of capture group (matching a JSON object).
}x
As others have suggested, a full-blown JSON parser is probably the way to go. If you want to match the key-value pairs in the simple examples that you have above, you could use:
(?<=\{)\s*[^{]*?(?=[\},])
For the input string
{title:'Title', {data:'Data', {foo: 'Bar'}}}
This matches:
1. title:'Title'
2. data:'Data'
3. foo: 'Bar'
Thanks to @Sanjay T. Sharma that pointed me to "brace matching" because I eventually got some understanding of greedy expressions and also thanks to others for saying initially what I shouldn't do. Fortunately it turned out it's OK to use greedy variant of expression
\\{\s*title.*\\}
because there is no non-JSON data between closing brackets.