How do I grep for all non-ASCII characters?
You can use the command:
grep --color='auto' -P -n "[\x80-\xFF]" file.xml
This will give you the line number, and will highlight non-ascii chars in red.
In some systems, depending on your settings, the above will not work, so you can grep by the inverse
grep --color='auto' -P -n "[^\x00-\x7F]" file.xml
Note also, that the important bit is the -P
flag which equates to --perl-regexp
: so it will interpret your pattern as a Perl regular expression. It also says that
this is highly experimental and grep -P may warn of unimplemented features.
Instead of making assumptions about the byte range of non-ASCII characters, as most of the above solutions do, it's slightly better IMO to be explicit about the actual byte range of ASCII characters instead.
So the first solution for instance would become:
grep --color='auto' -P -n '[^\x00-\x7F]' file.xml
(which basically greps for any character outside of the hexadecimal ASCII range: from \x00 up to \x7F)
On Mountain Lion that won't work (due to the lack of PCRE support in BSD grep), but with pcre
installed via Homebrew, the following will work just as well:
pcregrep --color='auto' -n '[^\x00-\x7F]' file.xml
Any pros or cons that anyone can think off?