Tokenize a Stack-Based language
Retina, 68 64 63 bytes
M!s`"(\\.|[^"])*"?|'.|\d+|\S
ms`^'(.)|^"(([^\\"]|\\.)*$)
"$1$2"
or
s`\s*((")(\\.|[^"])*(?<-2>")?|'.|\d+|.)\s*
$1$2¶
\ms`^'(.)
"$1"
I think this covers all the funky edge cases, even those not covered by the test cases in the challenge.
Try it online!
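For readers who don't speak Retina, here is a rough ungolfed reading of the first version's two stages, sketched in Ruby. It is an approximation of what the regexes appear to do, not the golfed answer itself: the names TOKEN, UNTERMINATED, and tokenize are made up for illustration, and Ruby's /m flag is assumed to stand in for Retina's s option.

```ruby
# Hypothetical helper names; the patterns mirror the Retina regexes above.
# In Ruby, /m makes . match newlines, like the s option in Retina.
TOKEN = /"(?:\\.|[^"])*"?|'.|\d+|\S/m

# A string token that runs to its end without an unescaped closing quote.
UNTERMINATED = /\A"(?:[^\\"]|\\.)*\z/m

def tokenize(src)
  src.scan(TOKEN).map do |tok|
    if tok.length == 2 && tok.start_with?("'")
      %("#{tok[1]}")   # char literal 'x becomes "x"
    elsif tok.match?(UNTERMINATED)
      tok + '"'        # close a string cut off by the end of input
    else
      tok              # closed string, number, or single character
    end
  end
end

# Prints one token per line.
puts tokenize(%q{12 'a "b c" +})
```

The first pattern does the splitting and the second step only normalizes char literals and unterminated strings, which is the same division of labor as the two Retina stages.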
Ruby, 234 bytes
puts"[#{$stdin.read.scan(/("(?:(?<!\\)\\"|[^"])+(?:"|$))|'(.)|(\d+)|(.)/).map{|m|(m[0]?(m[0].end_with?('"')?m[0]: m[0]+'"'): m[1]?"\"#{m[1]}\"": m.compact[0]).strip}.reject(&:empty?).map{|i|"'#{/\d+|./=~i ?i: i.inspect}'"}.join', '}]"
I tried using the find(&:itself) trick that I saw... somewhere, but apparently .itself isn't actually a method. Also, I'm working on golfing the regex down, but it's already unreadable.
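For what it's worth, Object#itself does exist in Ruby 2.2 and later, so the trick should work on a new enough interpreter. A minimal sketch, assuming Ruby 2.2+ and a made-up array shaped like one row of the scan result:

```ruby
# Hypothetical row shaped like one scan result: only one capture group matched.
m = [nil, nil, "42", nil]

# Object#itself (Ruby 2.2+) returns its receiver, so find(&:itself)
# returns the first element that is neither nil nor false.
first_truthy = m.find(&:itself)

# compact drops the nils, so for a row of strings and nils this agrees.
first_compact = m.compact[0]

p first_truthy
p first_compact
```

For rows that contain only strings and nils, both expressions pick the same element, so m.compact[0] is a reasonable fallback on older Rubies.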
If we don't have to output in any fancy way (i.e. strings don't have to be quoted in the array) I can save a whole lotta bytes:
Still Ruby, 194 bytes:
p$stdin.read.scan(/("(?:(?<!\\)\\"|[^"])+(?:"|$))|'(.)|(\d+)|(.)/).map{|m|(m[0]?(m[0].end_with?('"')?m[0]: m[0]+'"').gsub(/\\(.)/,'\1'): m[1]?"\"#{m[1]}\"": m.compact[0]).strip}.reject(&:empty?)
I'm sure I can golf it more, but I'm not quite sure how.
Ungolfed version coming soon. I started fiddling with the golfed version directly at some point, so I'll have to tease it back out.