How to remove non-ascii chars using sed
The solutions offered here did not work for me. Maybe my problem was different: I needed to strip ANSI color codes and other escape sequences from otherwise pure ASCII text.
The following worked for me, however:
Stripping Escape Codes from ASCII Text
sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g'
In context (BASH):
$ printf "\e[32;1mhello\e[0m\n"
hello
$ printf "\e[32;1mhello\e[0m\n" | cat -vet
^[[32;1mhello^[[0m$
$ printf "\e[32;1mhello\e[0m\n" | sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g' | cat -vet
hello$
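The pattern above matches at most two numeric parameters and needs at least one digit before the m, so sequences such as \e[1;38;5;208m or a bare \e[m would slip through. A minimal sketch of a more permissive variant follows; the function name strip_sgr is made up for illustration, and the \x1b escape assumes GNU sed:

```shell
# strip_sgr (hypothetical helper): remove ANSI SGR color/style sequences
# from stdin. Accepts any number of ;-separated parameters, including none.
# Assumes GNU sed, which understands the \x1b escape.
strip_sgr() {
  sed -E 's/\x1b\[[0-9;]*m//g'
}

# A 256-color sequence followed by a parameterless reset.
printf '\033[1;38;5;208mhello\033[m world\n' | strip_sgr
```

This only covers sequences ending in m (colors and text attributes); cursor movement and other escape codes would need a broader pattern.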
Did you try the following?
cat /bin/mkdir | tr -cd "[:print:]"
I think it solves the problem.
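One caveat: [:print:] does not include the newline character, so the command above also joins all lines into one. A small sketch that keeps line breaks, assuming GNU tr and the C locale (the name keep_printable is hypothetical):

```shell
# keep_printable (hypothetical helper): delete every byte that is neither
# printable ASCII nor a newline. LC_ALL=C makes tr work byte by byte,
# so [:print:] means 0x20-0x7E only.
keep_printable() {
  LC_ALL=C tr -cd '[:print:]\n'
}

# \001 is a control byte, \377 a non-ASCII byte; both get dropped.
printf 'one\001\ntwo\377\n' | keep_printable
```

Add \t to the set if tabs should survive as well.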
If only the text content interests you, you can also use:
cat /bin/mkdir | strings
Do you know what encoding the file is currently using? If so, you can use iconv to convert it. It's a utility to convert from one character encoding to another. So if the original file is in UTF-8 and you want to convert to ASCII you can use the following:
iconv -f utf8 -t ascii <inputfile>
The file command on the input file might tell you the current encoding.
Interestingly, there's a command called enca which will do its best to determine the character encoding being used if you know the language of the contents of the file.
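Note that a plain conversion to ASCII stops with an error at the first character that has no ASCII equivalent. If dropping such characters is acceptable, the -c option may help. The sketch below assumes GNU iconv (other implementations may substitute a placeholder character instead); the name to_ascii is made up for illustration:

```shell
# to_ascii (hypothetical helper): convert UTF-8 on stdin to ASCII and
# discard characters that cannot be represented (-c).
# Assumes GNU iconv; it may still exit non-zero when something was dropped.
to_ascii() {
  iconv -c -f UTF-8 -t ASCII
}

# Usage: printf 'caf\303\251\n' | to_ascii
```

With GNU iconv the usage line prints caf, since the two-byte sequence for the accented e is removed.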
This other question might be the answer.
This doesn't seem to work with sed. Perhaps tr will do?
tr -d '\200-\377'
Or with the complement:
tr -cd '\000-\177'
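A quick way to check the complement form on known input. This is only a sketch: the function name strip_non_ascii is hypothetical, and LC_ALL=C is added on the assumption that tr should treat the input as raw bytes rather than multibyte characters:

```shell
# strip_non_ascii (hypothetical helper): keep bytes 0x00-0x7F, delete the rest.
strip_non_ascii() {
  LC_ALL=C tr -cd '\000-\177'
}

# \303\257 and \303\251 are the UTF-8 encodings of two accented letters.
printf 'na\303\257ve caf\303\251\n' | strip_non_ascii
```

The accented letters vanish entirely rather than being replaced by their unaccented forms, because tr deletes bytes and knows nothing about transliteration.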