Re-encode a text file to use Mathematica's character names, like \[Alpha]
Based on information from this thread, the following should work:
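(* characters with codes below 160 are returned unchanged *)
fromCode[c_Integer] /; c < 160 := FromCharacterCode[c];
(* everything else becomes a named-character escape; LookupNameByCode is undocumented *)
fromCode[c_Integer] := "\\[" <> System`Private`LookupNameByCode[c] <> "]";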
Test:
Export["myfile", "x\[Alpha]y", "Text"];
StringJoin[fromCode /@ ToCharacterCode[
Import["myfile", "Text", CharacterEncoding -> "UTF-8"]]]
"x\\[Alpha]y"
Here is a more robust version of fromCode that uses only documented functionality and correctly handles the extended-ASCII and Unicode characters on which the original version fails:
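ClearAll[fromCode]; (* drop the earlier definition, whose c < 160 rule would otherwise shadow this one *)
(* ASCII characters (codes 0 to 127) are returned unchanged *)
fromCode[c_Integer] /; c <= 127 := FromCharacterCode[c];
(* non-ASCII: InputForm with "PrintableASCII" encoding gives the escaped form in quotes; StringTake drops the quotes *)
fromCode[c_Integer] :=
StringTake[ToString[FromCharacterCode[c], InputForm,
CharacterEncoding -> "PrintableASCII"], {2, -2}];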
Notes:
The ASCII character set contains the characters with codes 0 to 127 inclusive, so the upper bound is 127 (not 160 as in the original version).
When importing as "Text", we don't have to specify CharacterEncoding -> "UTF8" explicitly, since Import["file.txt"] reads a text file taking the character encoding to be "UTF8" by default.
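To see what the second fromCode rule does, its two steps can be evaluated separately, here for α (code 945). This is a small illustration assuming the robust fromCode above has been evaluated; quoted is just an illustrative variable name. The final comparison should return True, since it repeats exactly what fromCode does. For characters that have no name, the same ToString call should produce a \: hexadecimal escape instead of a \[Name] escape.

```mathematica
(* Step 1: InputForm of the one-character string, encoded as printable ASCII.
   The result still carries the surrounding double quotes of the string. *)
quoted = ToString[FromCharacterCode[945], InputForm,
   CharacterEncoding -> "PrintableASCII"]

(* Step 2: drop the first and last characters (the quotes), leaving only the escape *)
StringTake[quoted, {2, -2}]

(* fromCode performs exactly these two steps *)
fromCode[945] === StringTake[quoted, {2, -2}]
```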
Testing:
Export["myfile", "xαy\nLamé \[LongRightArrow] αβ+", "Text"];
StringJoin[fromCode /@ ToCharacterCode[Import["myfile", "Text"]]]
"x\\[Alpha]y Lam\\[EAcute] \\[LongRightArrow] \\[Alpha]\\[Beta]+"
Another approach is to use the ExportAsASCII function from this answer, which should be much more efficient:
ExportAsASCII["myfileInASCII", Import["myfile", "Text"]]
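If that function is not at hand, the output of fromCode can also be written back to disk with a plain Export. This is a minimal sketch assuming the robust fromCode above; it maps over every character code, so it is likely slower than ExportAsASCII. asciiText is an illustrative variable name.

```mathematica
(* Convert every character of the imported text with fromCode *)
asciiText = StringJoin[fromCode /@ ToCharacterCode[Import["myfile", "Text"]]];

(* asciiText now contains only ASCII characters, so the written file reads
   the same whether it is later opened as UTF-8, ISO 8859-1 or plain ASCII *)
Export["myfileInASCII", asciiText, "Text"]
```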
For the particular use case I am interested in (i.e. replacing non-ASCII characters in package files), one can simply open the file (.m or .wl) with the Front End and re-save it.
This can also be automated:
NotebookSave@NotebookOpen["mypackage.m"] (* Warning: this overwrites the file! *)
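To re-save a whole directory of package files, the one-liner above can be wrapped in a loop. This is an untested sketch: it assumes it is run from a Front End session (not a standalone kernel), and both resaveViaFrontEnd and "myPackageDir" are hypothetical names. Like the one-liner, it overwrites the files in place.

```mathematica
(* Open one package file invisibly, save it back through the Front End, close it *)
resaveViaFrontEnd[file_String] := Module[{nb},
  nb = NotebookOpen[file, Visible -> False]; (* open without showing a window *)
  NotebookSave[nb];                           (* save to the file the notebook came from *)
  NotebookClose[nb]
];

(* WARNING: overwrites every .m and .wl file found in the directory *)
Scan[resaveViaFrontEnd, FileNames[{"*.m", "*.wl"}, "myPackageDir"]]
```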
This method inserts (* ::Package:: *) at the beginning of the file, and it requires the system character encoding to be the same as that of the .m file. It may also change the newline style (LF vs. CR/LF). But these are relatively minor inconveniences.
The code formatting (indentation, etc.) is preserved.
I verified that nothing else is changed by diffing the end result against the original input file.
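Besides diffing, one can check directly that the re-saved file contains only ASCII bytes. A small sketch: asciiOnlyQ is a hypothetical helper name, and "mypackage.m" is the file from the example above. Because BinaryReadList reads raw bytes, the check does not depend on any character encoding setting.

```mathematica
(* True if every byte of the file is in the ASCII range 0 to 127 *)
asciiOnlyQ[file_String] := AllTrue[BinaryReadList[file], # <= 127 &]

asciiOnlyQ["mypackage.m"]
```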