How to remove non-ascii chars using sed

The solutions offered here did not work for me. Maybe my problem was different, but I needed to strip ANSI color codes and other escape sequences from otherwise pure ASCII text.

The following worked for me, however:

Stripping Escape Codes from ASCII Text

sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g'

In context (BASH):

$ printf "\e[32;1mhello\e[0m\n"
hello

$ printf "\e[32;1mhello\e[0m\n" | cat -vet
^[[32;1mhello^[[0m$

$ printf "\e[32;1mhello\e[0m\n" | sed -E 's/\x1b\[[0-9]*;?[0-9]+m//g' | cat -vet
hello$
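
One caveat with the pattern above: it accepts at most two numeric parameters, so a longer sequence such as `\e[1;38;5;82m` (a 256-color code) would be left in place. Here is a minimal sketch of a broader variant, assuming only color/style sequences (those ending in `m`) need to go. `strip_ansi` is a hypothetical helper name, and the escape byte is produced with `printf` because `\x1b` inside a sed pattern is a GNU extension:

```shell
# Hypothetical helper: remove color/style escape sequences that carry
# any number of semicolon-separated parameters.
strip_ansi() {
  esc=$(printf '\033')          # literal ESC byte, works with non-GNU sed too
  sed "s/${esc}\[[0-9;]*m//g"   # ESC [ <digits and semicolons> m
}

# Usage: a 256-color sequence with four parameters
printf '\033[1;38;5;82mhello\033[0m\n' | strip_ansi
# prints: hello
```

Other escape sequences (cursor movement, for instance) end in different letters and are not covered by this sketch.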

Did you try the following?

cat /bin/mkdir | tr -cd "[:print:]"

I think it solves the problem.
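
Be aware that `[:print:]` does not include the newline, so the command above also joins all lines into one. A small sketch that keeps line breaks, assuming the C locale is acceptable so that "printable" means printable ASCII; `keep_printable` is a hypothetical name:

```shell
# Hypothetical helper: keep printable ASCII plus newlines, drop everything else.
keep_printable() {
  LC_ALL=C tr -cd '[:print:]\n'
}

# Usage: the two UTF-8 bytes of "é" and the tab are removed, newlines survive
printf 'caf\303\251\tone\nline two\n' | keep_printable
# prints:
# cafone
# line two
```

Tabs are dropped as well because they are not in `[:print:]`; add `\t` to the set if they should stay.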

If only the text content interests you, you can also use:

cat /bin/mkdir | strings

Do you know what encoding the file is currently using? If so, you can use iconv to convert it. It's a utility to convert from one character encoding to another. So if the original file is in UTF-8 and you want to convert to ASCII you can use the following:

iconv -f utf8 -t ascii <inputfile>

The file command on the input file might tell you the current encoding.
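
Note that a plain UTF-8 to ASCII conversion stops with an error at the first character that has no ASCII equivalent. If the goal is simply to drop such characters, the `-c` option asks iconv to omit them from the output. A hedged sketch, where `to_ascii` is a hypothetical wrapper name and the input is assumed to be valid UTF-8:

```shell
# Hypothetical wrapper: convert UTF-8 to ASCII, discarding characters
# that cannot be represented (-c omits them instead of aborting).
to_ascii() {
  iconv -c -f UTF-8 -t ASCII
}

# Usage: "ï" (bytes 0xC3 0xAF) has no ASCII equivalent and is dropped.
# Some iconv builds may still exit non-zero when characters were discarded,
# so the exit status is ignored here.
printf 'na\303\257ve\n' | to_ascii || true
```

This deletes characters rather than approximating them, so "naïve" becomes "nave", not "naive".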

Interestingly, there's a command called enca which will do its best to determine the character encoding being used if you know the language of the contents of the file.

This other question might be the answer.

This doesn't seem to work with sed. Perhaps tr will do?

tr -d '\200-\377'

Or with the complement:

tr -cd '\000-\177'
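
To show the complement form in context, here is a minimal sketch. `strip_non_ascii` is a hypothetical name, and `LC_ALL=C` is set on the assumption that byte-wise handling is wanted regardless of the current locale:

```shell
# Hypothetical helper: delete every byte outside the 7-bit ASCII range.
strip_non_ascii() {
  LC_ALL=C tr -cd '\000-\177'
}

# Usage: the two bytes encoding "é" (0xC3 0xA9) are removed
printf 'caf\303\251 ok\n' | strip_non_ascii
# prints: caf ok
```

Because this works on bytes, every byte of a multi-byte UTF-8 character is removed, so no partial characters are left behind.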

Tags:

Linux

Unix

Regex

Sed

Non Ascii Characters
