Tokenize a Stack-Based language

Retina, 63 bytes (previously 68, then 64)

M!s`"(\\.|[^"])*"?|'.|\d+|\S
ms`^'(.)|^"(([^\\"]|\\.)*$)
"$1$2"

or

s`\s*((")(\\.|[^"])*(?<-2>")?|'.|\d+|.)\s*
$1$2¶
\ms`^'(.)
"$1"

I think this covers all the funky edge cases, even those not covered by the test cases in the challenge.
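For readers who don't speak Retina, here is a loose Ruby mirror of how the first (63-byte) version seems to work. It is a sketch based on my reading of the two stages, not a byte-for-byte port: the names RETINA_LIKE and retina_style_tokens are made up, and it handles tokens one by one instead of line by line, so newline edge cases may differ.

```ruby
# Stage 1 analogue: a string (closing quote optional), a 'x literal,
# a run of digits, or any other non-whitespace character.
# /m in Ruby lets "." match newlines, like Retina's "s" option.
RETINA_LIKE = /"(?:\\.|[^"])*"?|'.|\d+|\S/m

# Hypothetical helper, not part of the golfed answer.
def retina_style_tokens(src)
  src.scan(RETINA_LIKE).map do |tok|
    if tok =~ /\A'(.)\z/m
      "\"#{$1}\""            # 'x becomes "x"
    elsif tok =~ /\A"(?:[^\\"]|\\.)*\z/m
      tok + '"'              # unterminated string: add the closing quote
    else
      tok                    # numbers, operators, complete strings
    end
  end
end

p retina_style_tokens(%q{1 2 + 'a "hi" "oops})
```

The second branch only fires when the string runs to the end of the token without an unescaped closing quote, which is what the `$` in the second Retina stage appears to check.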



Ruby, 234 bytes

puts"[#{$stdin.read.scan(/("(?:(?<!\\)\\"|[^"])+(?:"|$))|'(.)|(\d+)|(.)/).map{|m|(m[0]?(m[0].end_with?('"')?m[0]: m[0]+'"'): m[1]?"\"#{m[1]}\"": m.compact[0]).strip}.reject(&:empty?).map{|i|"'#{/\d+|./=~i ?i: i.inspect}'"}.join', '}]"

I tried using the find(&:itself) trick that I saw... somewhere, but apparently .itself isn't a method in the Ruby version I'm running (Object#itself only exists from Ruby 2.2 onward). Also, I'm working on golfing the regex down, but it's already unreadable.
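To make the "first matching group" step concrete, here is a small sketch of the alternatives. The groups array is a made-up stand-in for one row of the scan result, and the respond_to? guard is only there so the snippet also runs on Rubies that lack itself.

```ruby
# Made-up example row: only the third capture group matched.
groups = [nil, nil, "42", nil]

via_compact = groups.compact[0]        # what the golfed code uses
via_block   = groups.find { |g| g }    # works on any Ruby version

# find(&:itself) needs Object#itself; fall back to the block form otherwise.
via_itself = groups.respond_to?(:itself) ? groups.find(&:itself) : via_block

p [via_compact, via_block, via_itself]
```

All three pick the same element here; compact[0] is also the shortest in bytes, so nothing is lost by not using the trick.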

If we don't have to output in any fancy way (i.e. strings don't have to be quoted in the array) I can save a whole lotta bytes:

Still Ruby, 194 bytes:

p$stdin.read.scan(/("(?:(?<!\\)\\"|[^"])+(?:"|$))|'(.)|(\d+)|(.)/).map{|m|(m[0]?(m[0].end_with?('"')?m[0]: m[0]+'"').gsub(/\\(.)/,'\1'): m[1]?"\"#{m[1]}\"": m.compact[0]).strip}.reject(&:empty?)

I'm sure I can golf it more, but I'm not quite sure how.


Ungolfed version coming soon. At some point I started fiddling with the golfed code directly, so I'll have to tease a readable version back out of it.
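In the meantime, here is a rough ungolfed sketch of what the 194-byte version appears to do. The regex is copied unchanged from the golfed code; the names TOKEN and tokenize are invented for illustration, and it has not been checked against every edge case in the challenge.

```ruby
# Same regex as the golfed version. Capture groups:
#   1: string literal, with the closing quote optional at end of line
#   2: the character after a ' (character literal)
#   3: a run of digits
#   4: any other single character (including whitespace)
TOKEN = /("(?:(?<!\\)\\"|[^"])+(?:"|$))|'(.)|(\d+)|(.)/

# Hypothetical ungolfed equivalent; returns an array of token strings.
def tokenize(src)
  tokens = src.scan(TOKEN).map do |str, chr, num, other|
    token =
      if str
        str += '"' unless str.end_with?('"')  # close an unterminated string
        str.gsub(/\\(.)/, '\1')               # drop the escaping backslashes
      elsif chr
        "\"#{chr}\""                          # 'x becomes "x"
      else
        num || other
      end
    token.strip
  end
  tokens.reject(&:empty?)                     # whitespace tokens strip to ""
end

p tokenize(%q{1 2 + 'a "hi"})
# prints ["1", "2", "+", "\"a\"", "\"hi\""]
```

Whitespace is matched by the last group like any other character and only discarded afterwards by strip plus reject, which is why the golfed code needs both calls.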