Split a string on commas not contained within double-quotes with a twist
A home-grown parser is easily written.
For example, this ANTLR grammar takes care of your example input without much trouble:
parse
: line*
;
line
: Quoted ( ',' Quoted )* ( '\r'? '\n' | EOF )
;
Quoted
: '"' ( Atom )* '"'
;
fragment
Atom
: Parentheses
| ~( '"' | '\r' | '\n' | '(' | ')' )
;
fragment
Parentheses
: '(' ~( '(' | ')' | '\r' | '\n' )* ')'
;
Space
: ( ' ' | '\t' ) {skip();}
;
and it would be easy to extend this to take escaped quotes or parenthesis into account.
When feeding the parser generated by that grammar to following two lines of input:
"Thanks,", "in advance,", "for("the", "help")"
"and(,some,more)","data , here"
it gets parsed like this:
If you consider to use ANTLR for this, I can post a little HOW-TO to get a parser from that grammar I posted, if you want.
Sometimes it is easier to match what you want instead of what you don't want:
String s = "\"Thanks,\", \"in advance,\", \"for(\"the\", \"help\")\"";
String regex = "\"(\\([^)]*\\)|[^\"])*\"";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(s.substring(m.start(),m.end()));
}
Output:
"Thanks,"
"in advance,"
"for("the", "help")"
If you also need it to ignore closing brackets inside the quotes sections that are inside the brackets, then you need this:
String regex = "\"(\\((\"[^\"]*\"|[^)])*\\)|[^\"])*\"";
An example of a string which needs this second, more complex version is:
"foo","bar","baz(":-)",":-o")"
Output:
"foo"
"bar"
"baz(":-)",":-o")"
However, I'd advise you to change your data format if at all possible. This would be a lot easier if you used a standard format like XML to store your tokens.