Regex to pick characters outside of a pair of quotes

This matches everything from the start of the string up to and including the first "," that is not inside double quotes. Is that what you want?

/^([^"]|"[^"]*")*?(,)/
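
As a quick check of what this pattern captures, here is a minimal JavaScript sketch. The sample string and the variable names are made up for illustration. Because the group is lazy and a quoted block can only be consumed whole, the match should stop at the first comma outside quotes:

```javascript
// Pattern from above: everything up to and including the first unquoted comma.
const firstUnquotedComma = /^([^"]|"[^"]*")*?(,)/;

// Hypothetical sample input: the first comma sits inside a quoted block.
const firstMatch = '"a,b",c,d'.match(firstUnquotedComma);

console.log(firstMatch[0]);            // "a,b",
console.log(firstMatch[0].length - 1); // 5 (index of the first unquoted comma)
```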

If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:

/(,)(?=(?:[^"]|"[^"]*")*$)/

which will match all of them. Thus

'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';')

replaces all the commas not inside quotes with semicolons, and produces:

'test; a "comma,"; bob; ",sam,";here'

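If quoted strings can span line breaks, anchor the lookahead with \z instead of $. In Ruby, $ matches at the end of every line, and the m (multiline) flag only changes what . matches, so it has no effect on this pattern.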
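
The same substitution can be sketched in JavaScript, where replace with the g flag plays the role of Ruby's gsub. This is a port written for illustration, not part of the original answer, and the variable names are arbitrary. In JavaScript, $ without the m flag matches only at the end of the input:

```javascript
// Same lookahead as the Ruby example; the g flag replaces every match.
const unquotedComma = /,(?=(?:[^"]|"[^"]*")*$)/g;

const replaced = 'test, a "comma,", bob, ",sam,",here'.replace(unquotedComma, ';');

console.log(replaced); // test; a "comma,"; bob; ",sam,";here
```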


The regexes below match all the commas outside double quotes:

,(?=(?:[^"]*"[^"]*")*[^"]*$)
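
One common use of this lookahead is splitting a line into fields. A minimal JavaScript sketch, assuming balanced double quotes and no escaped quotes in the input (the sample string is hypothetical):

```javascript
// A comma followed by an even number of double quotes up to the end of input,
// which means the comma itself is outside any quoted block.
const outsideQuotes = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/;

const fields = 'buz,"bar,foo",qux'.split(outsideQuotes);

console.log(fields); // [ 'buz', '"bar,foo"', 'qux' ]
```

The quotes stay in the resulting fields; stripping them would be a separate step.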

Or (PCRE only):

"[^"]*"(*SKIP)(*F)|,

"[^"]*" matches a whole double-quoted block. For the input buz,"bar,foo", this part matches only "bar,foo". The following (*SKIP)(*F) then forces that match to fail and stops the engine from retrying at any position inside the text it just matched. The engine resumes after the quoted block and tries the pattern to the right of the | symbol, which matches a plain comma. In this input, the only comma matched is the one right after buz. The comma inside the double quotes is never matched, because the quoted block has already been skipped.

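JavaScript's RegExp has no backtracking control verbs, so (*SKIP)(*F) cannot be used there directly. A rough equivalent, sketched below under that assumption, is to match the quoted blocks first and put them back unchanged, replacing only a bare comma. The function name replaceUnquotedCommas is hypothetical:

```javascript
// Emulates "[^"]*"(*SKIP)(*F)|, in an engine without backtracking verbs.
// Quoted blocks are matched by the first alternative and returned unchanged;
// only a bare comma (captured in group 1) is replaced.
function replaceUnquotedCommas(input, replacement) {
  return input.replace(/"[^"]*"|(,)/g, (match, comma) =>
    comma === undefined ? match : replacement
  );
}

const skipDemo = replaceUnquotedCommas('buz,"bar,foo"', ';');
console.log(skipDemo); // buz;"bar,foo"
```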

The regex below matches all the commas inside double quotes:

,(?!(?:[^"]*"[^"]*")*[^"]*$)
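
To see the negative lookahead at work, here is a small JavaScript sketch that rewrites only the commas inside quotes. It assumes balanced double quotes; the sample input is made up:

```javascript
// A comma NOT followed by an even number of double quotes up to the end of
// input, which means the comma is inside a quoted block.
const insideQuotes = /,(?!(?:[^"]*"[^"]*")*[^"]*$)/g;

const insideReplaced = 'buz,"bar,foo",qux'.replace(insideQuotes, ';');

console.log(insideReplaced); // buz,"bar;foo",qux
```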


While it's possible to hack it with a regex (and I enjoy abusing regexes as much as the next guy), you'll get in trouble sooner or later trying to handle substrings without a more advanced parser. Possible ways to get in trouble include mixed quotes and escaped quotes.

This function will split a string on commas, but not those commas that are within a single- or double-quoted string. It can be easily extended with additional characters to use as quotes (though character pairs like « » would need a few more lines of code) and will even tell you if you forgot to close a quote in your data:

function splitNotStrings(str){
  var parse=[], inString=false, escape=0, end=0

  for(var i=0, c; c=str[i]; i++){ // looping over the characters in str
    if(c==='\\'){ escape^=1; continue} // 1 when odd number of consecutive \
    if(c===','){
      if(!inString){
        parse.push(str.slice(end, i))
        end=i+1
      }
    }
    else if(splitNotStrings.quotes.indexOf(c)>-1 && !escape){
      if(c===inString) inString=false
      else if(!inString) inString=c
    }
    escape=0
  }
  // parsing is finished: a quote that is still open is an error
  if(inString) throw new SyntaxError('expected matching '+inString)
  // keep the last field (an empty field after a trailing comma is dropped)
  if(end<i) parse.push(str.slice(end, i))
  return parse
}

splitNotStrings.quotes="'\"" // add other (symmetrical) quotes here

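To match a quoted key, =>, a quoted value and the trailing comma, with escaped quotes allowed inside the strings, try this regular expression: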

(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*=>\s*(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*,

This also allows strings like “'foo\'bar' => 'bar\\',”.
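
A minimal JavaScript sketch for trying the pattern out. To avoid writing the quoted-string part twice, it is built from a string here; that restructuring and the names quoted and pairRegex are choices made for this example, not part of the original answer:

```javascript
// One quoted string (single- or double-quoted) with backslash escapes,
// copied from the pattern above. String.raw keeps the backslashes as written.
const quoted = String.raw`(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')`;

// quoted key, =>, quoted value, trailing comma
const pairRegex = new RegExp(quoted + String.raw`\s*=>\s*` + quoted + String.raw`\s*,`);

console.log(pairRegex.test(String.raw`'foo\'bar' => 'bar\\',`)); // true
console.log(pairRegex.test(`"a" => "b",`));                      // true
console.log(pairRegex.test(`'foo' => bar,`));                    // false
```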
