How to remove unknown zero-width character from string?
You can see the zero-width character using FullForm:
str //FullForm
"\:200b\:200b4-6"
So, in this case, you can use the following:
StringReplace[str, "\:200b"->""] //FullForm
"4-6"
Update
Added a threshold (default 2) on the rendered width, used to decide which characters count as "zero" width.
You can use the following function to look for zero-width characters in your string:
findZeroWidthCharacters[str_, threshold_:2] := With[
  {chars = DeleteDuplicates[Characters[str]]},
  Cases[
    Rasterize[
      Row @ MapThread[
        Tooltip,
        {chars, ToCharacterCode[StringJoin@chars]}
      ],
      "Regions"
    ],
    Rule[{code_, _}, {{l_, _}, {r_, _}}] /; r-l<=threshold :> code
  ]
]
Here I apply it to your string:
findZeroWidthCharacters[str]
{8203}
Here I apply it to a longer string:
s = ExampleData[{"Text", "AliceInWonderland"}] <> FromCharacterCode[{8203, 8204, 8207}];
StringLength[s]
findZeroWidthCharacters[s] //AbsoluteTiming
51725
{0.170908, {8203, 8204, 8207}}
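Once you have the list of offending character codes, you still need to strip them from the string. Here is a minimal sketch of one way to do that; removeCharacterCodes is a hypothetical helper name of my own, not a built-in, and it simply drops every character whose code appears in the list you pass in:

```mathematica
(* Hypothetical helper (not a built-in): remove every character whose
   character code appears in the list `codes`. *)
removeCharacterCodes[s_String, codes_List] :=
  FromCharacterCode[
    DeleteCases[ToCharacterCode[s], Alternatives @@ codes]
  ]

(* Example with the character code found above *)
removeCharacterCodes["\:200b\:200b4-6", {8203}] // FullForm
(* "4-6" *)
```

In a notebook session you would presumably pass the detected codes straight in, as in removeCharacterCodes[str, findZeroWidthCharacters[str]]. An empty list of codes leaves the string unchanged.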
How about:
ascii = StringJoin@ FromCharacterCode[Range[0, 127]];
StringReplace[str, c_ /; StringFreeQ[ascii, c] -> ""]
Head /@ ToExpression/@StringSplit[%, "-"]
{Integer, Integer}
Or even FromCharacterCode[Select[ToCharacterCode[str], # <= 127 &]]
You can extend that range if your string contains printable non-ASCII characters. In your example the offending character has character code 8203.
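As a rough sketch of extending the range, the upper limit can be made a parameter. The name keepCodeRange and the choice of 255 in the second call are my own assumptions for illustration; pick whatever limit suits the characters you expect:

```mathematica
(* Hypothetical helper: keep only characters whose code is <= max.
   The default of 127 reproduces the ASCII-only filter above. *)
keepCodeRange[s_String, max_Integer : 127] :=
  FromCharacterCode[Select[ToCharacterCode[s], # <= max &]]

keepCodeRange["\:200b\:200b4-6"] // FullForm
(* "4-6" *)

(* Raising the limit to 255 keeps accented Latin-1 letters such as
   \[EAcute] (code 233) while still dropping code 8203 *)
keepCodeRange["caf\[EAcute]\:200b", 255] === "caf\[EAcute]"
(* True *)
```

Note that a simple upper limit also drops any legitimate characters above it, so this only fits strings whose wanted characters all have low codes.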
Following a similar logic to george2079:
StringReplace[
  ImportString[ExportString[str, "Text", CharacterEncoding -> "ASCII"]],
  "\\:" ~~ Repeated[HexadecimalCharacter, {4}] -> ""
]