StringPattern to remove markers around text
With both StringPattern and RegularExpression, the problem is greediness: wildcards try to match as much as possible. With StringPattern, this can be fixed using Shortest:
StringReplace[buf, "\\text{" ~~ Shortest[x___] ~~ "}" :> x]
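To make the effect of greediness concrete, here is a small sketch on a made-up input (the string `sample` is an assumption for illustration, not the data from the question):

```mathematica
(* Illustrative input, not taken from the question *)
sample = "\\text{abc},\\text{def}";

(* Greedy: x___ runs up to the LAST "}", swallowing everything in between *)
StringReplace[sample, "\\text{" ~~ x___ ~~ "}" :> x]
(* result: abc},\text{def *)

(* Shortest: each \text{...} is matched on its own *)
StringReplace[sample, "\\text{" ~~ Shortest[x___] ~~ "}" :> x]
(* result: abc,def *)
```

In the greedy case the whole string is a single match, so only the outermost markers are removed.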
With a regular expression, a quantifier can be made non-greedy by appending ? (e.g. {(.*?)}). A negated character class such as {([^}]*)} is a common alternative that, by construction, cannot run past the first }. The non-greedy form looks like this:
StringReplace[buf, RegularExpression["\\\\text{(.*?)}"] :> "$1"]
Which gives the same result.
Both of these have one issue, though: they're not entirely safe. When your actual string contains }, they will stop at it. Consider:
lst = {"abc", "x}y", "123"};
buf = ToString@TeXForm@lst
This gives:
\{\text{abc},\text{x$\}$y},123\}
And using either solution will turn it into:
\{abc,x$\$y},123\}
To fix this, I think only a regular expression approach is viable: one that knows exactly which characters (or combinations) are allowed within the {...}:
StringReplace[buf, RegularExpression["\\\\text{((?:\\\\.|[^\\\\}])*)}"] :> "$1"]
Which gives
\{abc,x$\}$y,123\}
as expected.
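For reference, the pattern above can be read piece by piece and wrapped in a small helper. This is only a sketch: the name `stripText` is hypothetical, and the input is written out as a literal string (copied from the TeXForm output shown earlier) so that it does not depend on how a particular version formats TeXForm.

```mathematica
(* stripText is a hypothetical helper name, not a built-in function *)
stripText[s_String] := StringReplace[s,
  RegularExpression[
    "\\\\text{" <>        (* literal \text{ *)
    "((?:\\\\.|" <>       (* either a backslash followed by any character, e.g. \} ... *)
    "[^\\\\}])*)" <>      (* ... or any character that is neither a backslash nor } *)
    "}"                   (* the closing brace that ends the \text{...} group *)
  ] :> "$1"]

(* Literal copy of the TeXForm output from above *)
buf = "\\{\\text{abc},\\text{x$\\}$y},123\\}";

stripText[buf]
(* result: \{abc,x$\}$y,123\} *)
```

Because an escaped character is always consumed as a pair, the `\}` inside the text can no longer be mistaken for the closing brace. Note that this still assumes there are no nested unescaped braces inside `\text{...}`; a string such as `\text{a{b}c}` would need a different approach.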