How to remove non-ascii chars using sed

The solutions offered here did not work for me. Maybe my problem was different: I needed to strip ANSI color codes and other escape sequences from otherwise pure ASCII text.

The following worked for me, however:

Stripping Escape Codes from ASCII Text

sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g'

In context (Bash):

$ printf "\e[32;1mhello\e[0m\n"
hello

$ printf "\e[32;1mhello\e[0m\n" | cat -vet
^[[32;1mhello^[[0m$

$ printf "\e[32;1mhello\e[0m\n" | sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g' | cat -vet
hello$
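
Note that the pattern above only matches sequences with one or two numeric parameters. Sequences with more parameters (for example a 256-color code such as \e[38;5;208m) or with none at all (\e[m) are left in place. A slightly more general sketch follows. The helper name strip_sgr is made up for illustration, and the ESC byte is built with printf so the command does not rely on sed understanding \x1b:

```shell
# strip_sgr is a hypothetical helper name, not part of the original answer.
# It removes SGR (color/style) sequences: ESC [ <digits and semicolons> m
strip_sgr() {
  esc=$(printf '\033')            # literal ESC byte, built portably
  sed "s/${esc}\[[0-9;]*m//g"
}

printf '\033[1;38;5;208mhello\033[m\n' | strip_sgr | cat -v
```

This still only covers color/style sequences ending in m; other escape sequences such as cursor movement would need a broader pattern.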

Did you try this?

cat /bin/mkdir | tr -cd "[:print:]"

I think it solves the problem.
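
One thing to watch for: the [:print:] class does not include newlines or tabs, so the command above also joins all lines together. If line breaks should survive, a possible variant is to add \n to the set. This is only a sketch; keep_printable is a made-up name, and LC_ALL=C is an assumption to make tr work on single bytes:

```shell
# keep_printable is a hypothetical helper name used for illustration.
# LC_ALL=C makes tr treat the input as bytes, so only printable ASCII survives;
# adding \n to the set keeps line breaks (tabs are still removed).
keep_printable() {
  LC_ALL=C tr -cd '[:print:]\n'
}

printf 'caf\303\251\tbar\nbaz\n' | keep_printable
```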

If only the text content interests you, you can also use

cat /bin/mkdir | strings

Do you know what encoding the file is currently using? If so, you can use iconv, a utility that converts text from one character encoding to another. So if the original file is in UTF-8 and you want to convert it to ASCII, you can use the following:

iconv -f utf8 -t ascii <inputfile>
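
One caveat: a plain conversion to ASCII stops with an error at the first character that has no ASCII representation. If the goal is to drop such characters rather than convert them, the -c option of iconv omits them from the output. A minimal sketch, assuming the input really is UTF-8 (to_ascii is a made-up helper name, and behavior may differ between iconv implementations):

```shell
# to_ascii is a hypothetical helper; it assumes UTF-8 input.
# -c tells iconv to omit characters it cannot convert instead of aborting.
to_ascii() {
  iconv -c -f UTF-8 -t ASCII "$@"
}

# Usage: to_ascii inputfile > outputfile
#    or: some_command | to_ascii
```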

The file command on the input file might tell you the current encoding.

Interestingly, there's a command called enca which will do its best to determine the character encoding being used if you know the language of the contents of the file.

This other question might be the answer.


This doesn't seem to work with sed. Perhaps tr will do?

tr -d '\200-\377'

Or with the complement:

tr -cd '\000-\177'
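
A quick way to check both commands on a sample string. This is a sketch with made-up sample text; LC_ALL=C is set as a precaution so that tr works on single bytes:

```shell
# "é" is the two bytes \303\251 in UTF-8, both above \177,
# so each command should leave only the ASCII part of the line.
printf 'caf\303\251 ok\n' | LC_ALL=C tr -d '\200-\377'  | cat -v
printf 'caf\303\251 ok\n' | LC_ALL=C tr -cd '\000-\177' | cat -v
```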