Identify files with non-ASCII or non-printable characters in their names

Assuming that "foreign" means "not an ASCII character", you can use find with a pattern to list every file whose name contains at least one character that is not printable ASCII:

LC_ALL=C find . -name '*[! -~]*'

(The space is the first printable character listed on http://www.asciitable.com/, ~ is the last.)

Setting LC_ALL=C is required (strictly, LC_CTYPE=C and LC_COLLATE=C are the settings that matter); otherwise the character range is interpreted according to the current locale and matches the wrong names. See also the manual page glob(7). Because LC_ALL=C makes find treat file names as ASCII, it prints each byte of a multi-byte character (such as π) as a question mark when writing to a terminal. To see the real names, pipe the output to another program (e.g. cat) or redirect it to a file.

Instead of specifying a character range, you can use the [:print:] class to select "printable characters". Be sure to set the C locale here as well, or the behavior will seem quite arbitrary.

Example:

$ touch "$(printf '\u03c0')" "$(printf 'x\ty')"
$ ls -F
dir/  foo  foo.c  xrestop-0.4/  xrestop-0.4.tar.gz  π
$ find -name '*[! -~]*'       # this is broken (LC_COLLATE=en_US.UTF-8)
./x?y
./dir
./π
... (a lot more)
./foo.c
$ LC_ALL=C find . -name '*[! -~]*'
./x?y
./??
$ LC_ALL=C find . -name '*[! -~]*' | cat
./x y
./π
$ LC_ALL=C find . -name '*[![:print:]]*' | cat
./x y
./π
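
The session above can be reproduced without touching a real directory. The following is only a minimal sketch, assuming a POSIX shell with mktemp available; the helper name list_odd_names and the scratch directory are made up for illustration, and π is written as its UTF-8 bytes (\317\200) because \u escapes in printf are not available in every shell.

```shell
# list_odd_names DIR (hypothetical helper): print every path under DIR whose
# last component contains a byte outside printable ASCII (space through ~).
list_odd_names() {
    LC_ALL=C find "$1" -name '*[! -~]*'
}

# Build a scratch directory with one plain name and two unusual ones.
scratch=$(mktemp -d)
touch "$scratch/foo.c" \
      "$scratch/$(printf '\317\200')" \
      "$scratch/$(printf 'x\ty')"

# The output goes into a pipe rather than a terminal, so od -c can be used
# to inspect the raw bytes of the names that were matched.
list_odd_names "$scratch" | od -c

# Clean up the scratch directory.
rm -rf "$scratch"
```

Only the π and tab names should be listed; foo.c consists entirely of printable ASCII and is skipped.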

If you pass each file name through tr -d '\200-\377' and compare the result with the original name, any file name that contains non-ASCII bytes will differ from its stripped version.

(The above assumes that by "foreign" you mean non-ASCII.)
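
A minimal sketch of that comparison, assuming a POSIX shell; has_non_ascii is a hypothetical helper name, and the loop only looks at the current directory. It detects bytes in the range 0x80-0xFF only, so ASCII control characters such as tab go unnoticed.

```shell
# has_non_ascii NAME (hypothetical helper): succeed if NAME contains at
# least one byte in the range 0x80-0xFF.
has_non_ascii() {
    # Append a sentinel "x" so that command substitution does not silently
    # drop trailing newlines from the name, then remove it again.
    stripped=$(printf '%sx' "$1" | LC_ALL=C tr -d '\200-\377')
    stripped=${stripped%x}
    [ "$1" != "$stripped" ]
}

# Report every entry of the current directory whose name changed.
for name in ./*; do
    if has_non_ascii "$name"; then
        printf '%s\n' "$name"
    fi
done
```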


You can use tr to delete any foreign character from a filename and compare the result with the original filename to see if it contained foreign characters.

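find . -type f > filenames
while IFS= read -r filename; do
      # Keep only ASCII letters, digits, whitespace and punctuation.
      stripped="$(printf '%s\n' "$filename" | LC_ALL=C tr -cd '[:alnum:][:space:][:punct:]')"
      # A name that changed contained at least one other character.
      test "$filename" = "$stripped" || printf '%s\n' "$filename"
done < filenames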
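
The loop above reads one name per line, so a file name that itself contains a newline is split in two. One possible variant, sketched here under the assumption of a POSIX find and sh, hands the names to a small inline script via -exec instead of a temporary file; list_foreign is a made-up name.

```shell
# list_foreign DIR (hypothetical helper): print every regular file under DIR
# whose path contains a byte that is not an ASCII letter, digit, whitespace
# or punctuation character.
list_foreign() {
    find "$1" -type f -exec sh -c '
        for path in "$@"; do
            # The trailing "x" protects newlines at the end of the name
            # from being stripped by command substitution.
            stripped=$(printf "%sx" "$path" |
                       LC_ALL=C tr -cd "[:alnum:][:space:][:punct:]")
            [ "$path" = "${stripped%x}" ] || printf "%s\n" "$path"
        done' sh {} +
}

list_foreign .
```

As with the original loop, whitespace (including tab) counts as an ordinary character here, so only non-ASCII bytes and other control characters cause a name to be reported.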