How to remove invalid characters from filenames?
Solution 1:
One way would be with sed:
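mv -- 'file' "$(printf '%s\n' 'file' | sed -e 's/[^A-Za-z0-9._-]/_/g')"
Replace file with your filename, of course. This replaces every character that isn't a letter, digit, period, underscore, or dash with an underscore. The -- stops mv from treating a name that starts with a dash as an option, and printf avoids the cases where echo interprets its argument. You can add or remove characters in the bracket expression to change what is kept, change the replacement character to anything else, or leave the replacement empty to delete the characters.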
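To apply the same substitution to every file in a directory, the command can be wrapped in a loop. The following is only a minimal sketch: the names sanitize_name, rename_all and DRY_RUN are made up for illustration, it handles the current directory only, and it assumes an mv that supports -n (GNU and BSD mv do).

```shell
#!/bin/sh
# sanitize_name (hypothetical helper): print the argument with every byte
# outside A-Za-z0-9._- replaced by an underscore. LC_ALL=C makes sed work
# byte by byte, so a multi-byte UTF-8 character becomes several underscores.
sanitize_name() {
    printf '%s\n' "$1" | LC_ALL=C sed -e 's/[^A-Za-z0-9._-]/_/g'
}

# rename_all (hypothetical helper): rename every entry in the current
# directory whose sanitized name differs from its current name.
# With DRY_RUN unset or set to 1 it only prints what it would do.
# Names containing newlines are not handled by this sketch.
rename_all() {
    for f in ./*; do
        [ -e "$f" ] || continue          # glob matched nothing
        old=${f#./}
        new=$(sanitize_name "$old")
        [ "$old" = "$new" ] && continue  # already clean
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'would rename: %s -> %s\n' "$old" "$new"
        else
            mv -n -- "$old" "$new"       # -n: never overwrite an existing file
        fi
    done
}
```

Running rename_all on its own previews the changes; DRY_RUN=0 rename_all performs them.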
Solution 2:
I assume you are on a Linux box and the files were created on a Windows box. Linux stores filenames as raw bytes, which most current systems interpret as UTF-8, while names coming from Windows are often in a legacy code page such as Windows-1252. I think this mismatch is the cause of the problem.
I would use "convmv". This is a tool that can convert filenames from one character encoding to another. For Western Europe one of these normally works:
convmv -r -f windows-1252 -t UTF-8 .
convmv -r -f ISO-8859-1 -t UTF-8 .
convmv -r -f cp-850 -t UTF-8 .
If you need to install it on a Debian-based Linux you can do so by running:
sudo apt-get install convmv
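It works for me every time and it does recover the original filename. Note that convmv only prints what it would do by default; add --notest to the command to actually rename the files.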
Source: LeaseWebLabs
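When it is not obvious which source encoding applies, it can help to preview how one broken name decodes under each candidate before converting a whole tree. This is a rough sketch using iconv rather than convmv itself; the function name preview_decodings is invented here, and the encoding names are the spellings glibc's iconv accepts (other iconv implementations may differ).

```shell
#!/bin/sh
# preview_decodings (hypothetical helper): print the given raw filename
# decoded to UTF-8 from each candidate encoding, one candidate per line.
preview_decodings() {
    for enc in WINDOWS-1252 ISO-8859-1 CP850; do
        # iconv exits non-zero if the encoding is unknown or a byte is
        # not valid in the source encoding
        if decoded=$(printf '%s' "$1" | iconv -f "$enc" -t UTF-8 2>/dev/null); then
            printf '%-13s %s\n' "$enc" "$decoded"
        else
            printf '%-13s (could not decode)\n' "$enc"
        fi
    done
}
```

For example, preview_decodings "$(printf 'caf\351.txt')" passes a name containing the single byte 0xE9; whichever line shows the name you expect tells you which -f value to give convmv.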
Solution 3:
I had some Japanese files with broken filenames recovered from a broken USB stick, and the solutions above didn't work for me.
I recommend the detox package:
The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It'll also translate or cleanup Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI escaped characters.
Example usage:
detox -r -v /path/to/your/files
-r  Recurse into subdirectories
-v  Be verbose about which files are being renamed
-n  Can be used for a dry run (only show what would be changed)
Solution 4:
I assume you mean you want to traverse the filesystem and fix all such files?
Here's the way I'd do it:
find /path/to/files -type f -print0 | \
perl -n0e 'chomp; $new = $_; if($new =~ s/[^[:ascii:]]/_/g) {
print("Renaming $_ to $new\n"); rename($_, $new) or warn("rename failed: $!\n");
}'
That would find all files with non-ASCII characters in their names and replace those characters with underscores (_). The chomp strips the NUL separator that -0 leaves at the end of each record. Use caution though: if a file with the new name already exists, it will be overwritten. The script can be modified to check for such a case, but I didn't put that in, to keep it simple.
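The collision check mentioned above could look roughly like this. It is a sketch in plain sh rather than Perl, the function name rename_non_ascii is hypothetical, and it differs from the one-liner in that it only rewrites the last path component, so directory names are left alone. Names ending in a newline are not handled.

```shell
#!/bin/sh
# rename_non_ascii (hypothetical helper): under the directory given as $1,
# replace every non-ASCII byte in regular-file names with "_", and skip any
# rename whose target name already exists.
rename_non_ascii() {
    find "$1" -type f -exec sh -c '
        for path do
            dir=${path%/*}
            base=${path##*/}
            # In the C locale tr works on bytes; octal 200-377 covers
            # every byte outside ASCII. "[_*]" pads the replacement set.
            new=$(printf "%s" "$base" | LC_ALL=C tr "\200-\377" "[_*]")
            [ "$base" = "$new" ] && continue
            if [ -e "$dir/$new" ]; then
                printf "Skipping %s: %s already exists\n" "$path" "$dir/$new" >&2
            else
                printf "Renaming %s to %s\n" "$path" "$dir/$new"
                mv -- "$path" "$dir/$new"
            fi
        done
    ' sh {} +
}
```

Call it as rename_non_ascii /path/to/files. The existence test and the mv are separate steps, so this is not safe against another process creating the target in between.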
Solution 5:
Following the answers at https://stackoverflow.com/questions/2124010/grep-regex-to-match-non-ascii-characters, you can use:
rename 's/[^\x00-\x7F]//g' *
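where * matches the files you want to rename. This assumes the Perl-based rename (the one packaged as rename on Debian and Ubuntu); the util-linux rename takes different arguments. If you want to do it over multiple directories, you can do something like the following. -depth handles a directory's contents before the directory itself, and -execdir (available in GNU and BSD find) runs rename from each file's parent directory, so only the last path component is changed and renaming a directory does not invalidate the paths beneath it:
find . -depth -execdir rename 's/[^\x00-\x7F]//g' "{}" \;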
You can use the -n argument to rename to do a dry run and see what would be changed, without changing anything.