PHP: Regex to ignore escaped quotes within quotes
For most strings, you need to allow escaped anything (not just escaped quotes). e.g. you most likely need to allow escaped characters like "\n"
and "\t"
and of course, the escaped-escape: "\\"
.
This is a frequently asked question, and one which was solved (and optimized) long ago. Jeffrey Friedl covers this question in depth (as an example) in his classic work: Mastering Regular Expressions (3rd Edition). Here is the regex you are looking for:
Good:
"([^"\\]|\\.)*"
Version 1: Works correctly but is not terribly efficient.
Better:
"([^"\\]++|\\.)*"
or "((?>[^"\\]+)|\\.)*"
Version 2: More efficient if you have possessive quantifiers or atomic groups (See: sin's correct answer which uses the atomic group method).
Best:
"[^"\\]*(?:\\.[^"\\]*)*"
Version 3: More efficient still. Implements Friedl's: "unrolling-the-loop" technique. Does not require possessive or atomic groups (i.e. this can be used in Javascript and other less-featured regex engines.)
Here are the recommended regexes in PHP syntax for both double and single quoted sub-strings:
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";
Try a regex like this:
'/"(\\\\[\\\\"]|[^\\\\"])*"/'
A (short) explanation:
" # match a `"`
( # open group 1
\\\\[\\\\"] # match either `\\` or `\"`
| # OR
[^\\\\"] # match any char other than `\` and `"`
)* # close group 1, and repeat it zero or more times
" # match a `"`
The following snippet:
<?php
$text = 'abc "string \\\\ \\" literal" def';
preg_match_all('/"(\\\\[\\\\"]|[^\\\\"])*"/', $text, $matches);
echo $text . "\n";
print_r($matches);
?>
produces:
abc "string \\ \" literal" def
Array
(
[0] => Array
(
[0] => "string \\ \" literal"
)
[1] => Array
(
[0] => l
)
)
as you can see on Ideone.