How do I remove non-ASCII characters from filenames?
I believe this will work...
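# Find items whose names contain characters outside the range space (0x20) to DEL (0x7F)
$Files = gci | where {$_.Name -match "[^\u0020-\u007F]"}
$Files | ForEach-Object {
    $OldName = $_.Name
    # Replace each offending character with an underscore
    $NewName = $OldName -replace "[^\u0020-\u007F]", "_"
    # -LiteralPath keeps [ and ] in a name from being treated as wildcards
    Rename-Item -LiteralPath $_.FullName -NewName $NewName
}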
I don't have any filenames with non-ASCII characters to test against, though.
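Without real files at hand, the pattern can at least be checked on plain strings. A minimal sketch; the sample name is made up, and the non-ASCII character is built from its code point so the snippet does not depend on how the script file is encoded:

```powershell
# Same pattern as in the rename loop above
$pattern = "[^\u0020-\u007F]"

# Hypothetical sample name: "caf" + e-acute (U+00E9) + ".txt"
$sample = "caf" + [char]0x00E9 + ".txt"

$sample -match $pattern            # True: the name contains a non-ASCII character
$fixed = $sample -replace $pattern, "_"
$fixed                             # caf_.txt
```

Adding -WhatIf to the Rename-Item call is another way to see what would be renamed without touching any files.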
I found a similar topic here on Stack Overflow.
With the following code, most characters are translated to their "closest character", although I couldn't get the ’ translated. (Maybe it does; I can't create a filename with it at the prompt ;) The ß also does not get translated.
function Remove-Diacritics {
    param ([String]$src = [String]::Empty)
    # FormD splits accented letters into a base letter plus combining marks
    $normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
    $sb = new-object Text.StringBuilder
    $normalized.ToCharArray() | % {
        # Keep everything except the combining (non-spacing) marks
        if( [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$sb.Append($_)
        }
    }
    $sb.ToString()
}
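$files = gci -recurse | where {$_.Name -match "[^\u0020-\u007F]"}
$files | ForEach-Object {
    $newname = Remove-Diacritics $_.Name
    if ($_.Name -ne $newname) {
        $num = 1
        # Build the target from the parent folder so that only the leaf name changes
        # (a plain string replace on FullName would also hit a folder with the same name)
        $parent = Split-Path -Parent $_.FullName
        $nextname = Join-Path $parent $newname
        # If the target already exists, append " (1)", " (2)", ... before the extension
        while (Test-Path -LiteralPath $nextname) {
            $next = ([io.fileinfo]$newname).BaseName + " ($num)" + ([io.fileinfo]$newname).Extension
            $nextname = Join-Path $parent $next
            $num += 1
        }
        echo $nextname
        # Rename-Item expects a bare name for -NewName, not a full path
        Rename-Item -LiteralPath $_.FullName -NewName (Split-Path -Leaf $nextname)
    }
}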
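As to why ß and ’ survive Remove-Diacritics: neither character has a canonical decomposition, so FormD leaves them as a single character and there is no combining mark to strip. A small sketch to illustrate this; Get-DecomposedLength is a hypothetical helper written for this example, not part of the code above:

```powershell
# Hypothetical helper: length in UTF-16 code units after canonical decomposition
function Get-DecomposedLength {
    param([string]$s)
    $s.Normalize([Text.NormalizationForm]::FormD).Length
}

$eAcute = [string][char]0x00E9   # e with acute accent
$eszett = [string][char]0x00DF   # German sharp s
$quote  = [string][char]0x2019   # right single quotation mark

Get-DecomposedLength $eAcute   # 2: 'e' plus combining acute accent (U+0301)
Get-DecomposedLength $eszett   # 1: nothing to split off
Get-DecomposedLength $quote    # 1: nothing to split off
```

Characters like these need an explicit mapping (for example a lookup table) or a fallback replacement such as _.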
Edit:
I added some code to check whether a filename already exists and append (1), (2), etc. if it does. (It's not smart enough to detect an already existing (1) in the filename to be renamed, so in that case you would get (1) (1). But as always... everything is programmable ;)
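For what it's worth, the (1) (1) case could be handled roughly like this. This is only a sketch: Get-NextFreeName and its parameters are hypothetical names, and it works on a list of names already taken instead of calling Test-Path, so it can be tried without touching the disk:

```powershell
# Hypothetical helper: returns the first free name, continuing an existing " (n)" suffix
function Get-NextFreeName {
    param(
        [string]$Name,        # desired leaf name, e.g. "report (1).txt"
        [string[]]$Existing   # names already taken in the target folder
    )
    if ($Existing -notcontains $Name) { return $Name }

    $base = [IO.Path]::GetFileNameWithoutExtension($Name)
    $ext  = [IO.Path]::GetExtension($Name)
    $num  = 1
    # If the base name already ends in " (n)", strip it and continue from n + 1
    if ($base -match '^(.*) \((\d+)\)$') {
        $base = $Matches[1]
        $num  = [int]$Matches[2] + 1
    }
    do {
        $candidate = "$base ($num)$ext"
        $num += 1
    } while ($Existing -contains $candidate)
    $candidate
}

Get-NextFreeName 'a.txt' @('a.txt')           # a (1).txt
Get-NextFreeName 'a (1).txt' @('a (1).txt')   # a (2).txt instead of a (1) (1).txt
```

In the rename loop, the list of existing names would come from the target folder, for example from gci on the parent directory.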
Edit 2:
Here is the last one for tonight...
This one has a different function for replacing the characters. I also added a line to change unknown characters, like ß and ┤ for example, to _.
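function Convert-ToLatinCharacters {
    param([string]$inputString)
    # Encode with the Cyrillic code page, then read the bytes back as ASCII;
    # characters that cannot be mapped come out as '?'
    [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($inputString))
}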
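$files = gci -recurse | where {$_.Name -match "[^\u0020-\u007F]"}
$files | ForEach-Object {
    $newname = Convert-ToLatinCharacters $_.Name
    # '?' marks a character that could not be converted; it is not valid in a Windows filename anyway
    $newname = $newname.replace('?','_')
    if ($_.Name -ne $newname) {
        $num = 1
        # Build the target from the parent folder so that only the leaf name changes
        $parent = Split-Path -Parent $_.FullName
        $nextname = Join-Path $parent $newname
        # If the target already exists, append " (1)", " (2)", ... before the extension
        while (Test-Path -LiteralPath $nextname) {
            $next = ([io.fileinfo]$newname).BaseName + " ($num)" + ([io.fileinfo]$newname).Extension
            $nextname = Join-Path $parent $next
            $num += 1
        }
        echo $nextname
        # Rename-Item expects a bare name for -NewName, not a full path
        Rename-Item -LiteralPath $_.FullName -NewName (Split-Path -Leaf $nextname)
    }
}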