Trying to split string into single words or "quoted words", and want to keep the quotes in the resulting array

You may use the following regular expression split:

str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]

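The lookahead allows a split at a whitespace character only if the text from there to the end of the string contains an even number of " characters, which means the whitespace is not inside a quoted section. Be aware that this version only works if every quote in your string is properly closed.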
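To make that caveat concrete, here is a small sketch that runs the same pattern on a balanced and an unbalanced input. The sample strings and the names SPLIT_RE, balanced and unbalanced are made up for illustration:

```ruby
# Same pattern as above: split on whitespace only when the rest of the
# string consists of unquoted characters and complete "..." pairs.
SPLIT_RE = /\s(?=(?:[^"]|"[^"]*")*$)/

balanced   = 'say "hello world" now'
unbalanced = 'say "hello world now'   # the closing quote is missing

balanced_parts   = balanced.split(SPLIT_RE)
unbalanced_parts = unbalanced.split(SPLIT_RE)

p balanced_parts    # => ["say", "\"hello world\"", "now"]
p unbalanced_parts  # => ["say \"hello", "world", "now"]
```

With the unmatched quote, the space before it is never a valid split point, so the first token swallows the quote and the word after it.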

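An alternative solution uses scan to extract the tokens directly instead of splitting on the separators. Note that \w matches only letters, digits and underscores, so any other character outside quotes is skipped just like a space: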

p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
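Because \w is narrower than "anything but a space", punctuation outside quotes is dropped by the scan version. The sketch below shows this on a made-up input, next to a variant that accepts any non-space, non-quote character. That variant is an assumption of this example, not part of the answer above:

```ruby
# \w only matches letters, digits and underscore, so punctuation outside
# quotes is silently dropped and also acts as a separator.
text = 'well-known "Test Driven Development" rocks!'

word_only = text.scan(/(?:\w|"[^"]*")+/)
# Assumed variant: accept any non-space, non-quote character instead of \w.
non_space = text.scan(/(?:[^\s"]|"[^"]*")+/)

p word_only  # => ["well", "known", "\"Test Driven Development\"", "rocks"]
p non_space  # => ["well-known", "\"Test Driven Development\"", "rocks!"]
```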

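Just to extend the previous answer from Howard: the following method also handles single-quoted phrases, ignores runs of spaces, and strips the surrounding quotes from each token (so use it only if you do not need to keep them):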

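class String
  # Splits on whitespace that lies outside '...' or "..." sections,
  # then removes empty tokens and surrounding spaces and quotes.
  def tokenize
    split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
      .reject(&:empty?)
      .map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end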

And the result:

> 'Presentation      about "Test Driven Development"  '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
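If stripping the quotes is acceptable, Ruby's standard library can do similar tokenizing without reopening String. This is a different technique from the regex approach above: Shellwords.split follows Bourne-shell word-splitting rules, so it also treats backslashes as escapes, which may or may not suit your input. A minimal sketch, with made-up variable names:

```ruby
require 'shellwords'

input = 'Presentation      about "Test Driven Development"  '

# Shellwords.split applies shell-style word splitting: runs of whitespace
# separate tokens, and quotes group words and are removed from the result.
tokens = Shellwords.split(input)
p tokens  # => ["Presentation", "about", "Test Driven Development"]

# Unlike the regex versions, an unbalanced quote is reported as an error.
begin
  Shellwords.split('about "Test Driven')
  unmatched_error = nil
rescue ArgumentError => e
  unmatched_error = e
end
```

The ArgumentError on an unmatched quote can be handy when you would rather reject malformed input than get oddly merged tokens.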
