bulk rename (or correctly display) files with special characters
I guess you see this � invalid character because the name contains a byte sequence that isn't valid UTF-8. File names on typical Unix filesystems (including yours) are byte strings, and it's up to applications to decide which encoding to use. Nowadays the trend is to use UTF-8, but it's not universal, especially in locales that could never get by with plain ASCII and have been using other encodings since before UTF-8 existed.
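To make the "byte strings" point concrete, here is a small sketch that creates a file whose name is "café" encoded in Latin-1 and dumps the raw bytes of the name. It assumes a filesystem that accepts arbitrary bytes in names (ext4 and most Linux filesystems do; some others reject such names), and the scratch directory is only for illustration.

```shell
# Work in a scratch directory so nothing real is touched
dir=$(mktemp -d)

# Create a file named "café" in Latin-1: the é is the single byte 0xE9 (octal 351)
touch "$dir/$(printf 'caf\351')"

# Dump the raw bytes of the name in hex. A lone e9 byte is not a valid
# UTF-8 sequence, which is why a UTF-8 terminal shows a replacement character.
hex=$(cd "$dir" && printf '%s' * | od -An -tx1 | tr -d ' \n')
echo "$hex"   # prints 636166e9

rm -r "$dir"
```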
Try LC_CTYPE=en_US.iso88591 ls to see if the file name makes sense in ISO-8859-1 (Latin-1). If it doesn't, try other locales. Note that only the LC_CTYPE locale setting matters here.
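If you'd rather not switch locales one at a time, a rough alternative is to decode one suspect name under several candidate encodings with iconv and see which result looks right. The helper name try_encodings and the list of encodings below are my own assumptions for illustration; adjust the list to whatever is plausible for your files.

```shell
# Hypothetical helper: show how one raw file name decodes under several
# candidate encodings. The candidate list is an assumption, not exhaustive.
try_encodings () {
    name=$1
    for enc in ISO-8859-1 ISO-8859-15 CP1252 KOI8-R; do
        # iconv fails if the bytes are invalid in the source encoding,
        # or if this iconv build does not know the encoding at all
        if decoded=$(printf '%s' "$name" | iconv -f "$enc" -t UTF-8 2>/dev/null); then
            printf '%s\t%s\n' "$enc" "$decoded"
        else
            printf '%s\t(could not decode)\n' "$enc"
        fi
    done
}

# Example: the bytes "caf" followed by 0xE9 read as "café" in Latin-1
try_encodings "$(printf 'caf\351')"
```

This only tells you which encodings are possible, not which one is right: many single-byte encodings accept any byte, so you still have to judge the output by eye.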
In a UTF-8 locale, the following command will show you all files whose name is not valid UTF-8:
grep-invalid-utf8 () {
perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print'
}
find | grep-invalid-utf8
You can check if they make more sense in another locale with recode or iconv:
find | grep-invalid-utf8 | recode latin1..utf8
find | grep-invalid-utf8 | iconv -f latin1 -t utf8
Once you've determined that a bunch of file names are in a certain encoding (e.g. latin1), one way to rename them is
find | grep-invalid-utf8 |
rename 'BEGIN {binmode STDIN, ":encoding(latin1)"; use Encode;}
$_=encode("utf8", $_)'
This uses the perl rename command available on Debian and Ubuntu. You can pass it -n to show what it would be doing without actually renaming the files.
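On systems without the Perl rename, something similar can probably be done with find, iconv and mv. The following is only a sketch under a few assumptions: the function name latin1_names_to_utf8 is made up, the source encoding is hard-coded to Latin-1, the validity check relies on iconv -f UTF-8 -t UTF-8 failing on invalid input, and names ending in a newline are not handled. Passing -n as the second argument mimics the dry run.

```shell
# Hypothetical helper: rename entries under DIR whose names are not valid
# UTF-8, assuming those names are Latin-1.
# usage: latin1_names_to_utf8 DIR [-n]     (-n = dry run, nothing is renamed)
latin1_names_to_utf8 () {
    dir=$1
    dry_run=${2:-}
    # -depth visits children before their parent directory, so renaming a
    # directory never invalidates paths that are still to be processed
    find "$dir" -depth -exec sh -c '
        dry_run=$1; shift
        for path do
            base=${path##*/}
            parent=${path%/*}
            # skip names that are already valid UTF-8
            if printf "%s" "$base" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
                continue
            fi
            # decode the last path component as Latin-1 and re-encode as UTF-8
            new=$(printf "%s" "$base" | iconv -f ISO-8859-1 -t UTF-8) || continue
            if [ -n "$dry_run" ]; then
                printf "would rename %s -> %s\n" "$path" "$parent/$new"
            else
                # -i asks before overwriting an existing target
                mv -i -- "$path" "$parent/$new"
            fi
        done
    ' sh "$dry_run" {} +
}

# Typical use (commented out so that pasting the block renames nothing):
#   latin1_names_to_utf8 . -n    # preview
#   latin1_names_to_utf8 .       # rename for real
```

Only the last path component is converted each time, which is why the depth-first order matters: by the time a directory is renamed, everything inside it has already been handled.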