Replacing umlauts in JS
You first need to figure out which character codes you're trying to replace. Depending on the character encoding, the characters could arrive as ISO-8859-1, UTF-8 or something else. They could also be HTML entities such as "&auml;".
Rather than guessing, print them out.
And beware that your incoming data may not use the same character set/character encoding consistently--you need to check on where the data is coming from.
So look at the incoming data by using string.charCodeAt. Check the character code before the toLowerCase call to ensure that it is not changing things on you. You'll need to debug step by step.
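A minimal sketch of that inspection step, assuming the input is already a JavaScript string. `dumpCodes` is a hypothetical helper name, not part of any library. It lists each UTF-16 code unit in hex, so you can see whether an "ä" arrived as the single code point U+00E4 or as a plain "a" followed by the combining diaeresis U+0308.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as hex.
function dumpCodes(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0');
    codes.push('U+' + hex);
  }
  return codes.join(' ');
}

console.log(dumpCodes('\u00e4'));               // U+00E4 (precomposed ä)
console.log(dumpCodes('a\u0308'));              // U+0061 U+0308 (decomposed ä)
console.log(dumpCodes('\u00c4'.toLowerCase())); // U+00E4 (Ä lowercased to ä)
```

If the dump shows something other than the codes you expected, the problem is in the data or its decoding, not in your replace call.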
Finally, check the character set settings in your editor to ensure that your typed ä is what it should be. You may want to specify it via its Unicode escape (for example \u00e4) rather than typing ä, ö, etc.
Either ensure that your script's encoding is correctly specified (in the <script> tag, or in the page's header/meta if it's embedded), or specify symbols with the \uNNNN syntax, which always resolves unambiguously to a specific Unicode code point.
For example:
str.replace(/\u00e4/g, "ae")
This will always replace ä (U+00E4) with ae, no matter what encoding is set for your page/script, even if that setting is incorrect.
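One case the escape alone does not cover is input that holds the umlaut in decomposed form ("a" plus the combining diaeresis U+0308). A hedged sketch, assuming an environment that provides String.prototype.normalize (ES2015 or later); `replaceAe` is a made-up name for illustration only.

```javascript
// Hypothetical helper: replace ä with "ae" whichever way the input encodes it.
function replaceAe(str) {
  // NFC composes "a" + U+0308 into the single code point U+00E4,
  // so the escaped regex below sees one consistent form.
  return str.normalize('NFC').replace(/\u00e4/g, 'ae');
}

console.log(replaceAe('M\u00e4dchen'));  // Maedchen
console.log(replaceAe('Ma\u0308dchen')); // Maedchen
```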
Here are the codes needed for German:
// Ü, ü \u00dc, \u00fc
// Ä, ä \u00c4, \u00e4
// Ö, ö \u00d6, \u00f6
// ß \u00df
Here's a function that replaces the most common characters to produce a search-engine-friendly URL slug:
function deUmlaut(value) {
  value = value.toLowerCase();
  value = value.replace(/ä/g, 'ae');
  value = value.replace(/ö/g, 'oe');
  value = value.replace(/ü/g, 'ue');
  value = value.replace(/ß/g, 'ss');
  value = value.replace(/ /g, '-');
  value = value.replace(/\./g, '');
  value = value.replace(/,/g, '');
  value = value.replace(/\(/g, '');
  value = value.replace(/\)/g, '');
  return value;
}
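The same steps can be sketched more compactly with a lookup table and \uNNNN escapes, so the result does not depend on the source file's encoding. This is only an illustrative variant: `slugMap` and `slugify` are hypothetical names, and it handles exactly the characters the function above handles.

```javascript
// Hypothetical compact variant: one lookup table, escaped code points.
const slugMap = {
  '\u00e4': 'ae', // ä
  '\u00f6': 'oe', // ö
  '\u00fc': 'ue', // ü
  '\u00df': 'ss', // ß
};

function slugify(value) {
  return value
    .toLowerCase()
    .replace(/[\u00e4\u00f6\u00fc\u00df]/g, (ch) => slugMap[ch]) // transliterate
    .replace(/[.,()]/g, '')                                      // drop punctuation
    .replace(/ /g, '-');                                         // spaces to hyphens
}

console.log(slugify('Gr\u00f6\u00dfe (\u00dcbersicht), Teil 1.'));
// groesse-uebersicht-teil-1
```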
If you want to replace the German umlauts while respecting the case of the surrounding letters, use this (open source, happy to share, all by me):
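const umlautMap = {
  '\u00dc': 'UE',
  '\u00c4': 'AE',
  '\u00d6': 'OE',
  '\u00fc': 'ue',
  '\u00e4': 'ae',
  '\u00f6': 'oe',
  '\u00df': 'ss',
};
function replaceUmlaute(str) {
  return str
    // Capital umlaut followed by a lowercase letter: capitalize only the first letter (Ü -> Ue).
    .replace(/[\u00dc\u00c4\u00d6][a-z]/g, (a) => {
      const big = umlautMap[a.slice(0, 1)];
      return big.charAt(0) + big.charAt(1).toLowerCase() + a.slice(1);
    })
    // All remaining umlauts map straight through the table.
    // No '|' inside the character class: there it would match a literal pipe.
    .replace(new RegExp('[' + Object.keys(umlautMap).join('') + ']', 'g'),
      (a) => umlautMap[a]
    );
}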
const test = ['Übung', 'ÜBUNG', 'üben', 'einüben', 'EINÜBEN', 'Öde ätzende scheiß Übung']
test.forEach((str) => console.log(str + " -> " + replaceUmlaute(str)))
It produces:
- Übung -> Uebung
- ÜBUNG -> UEBUNG
- üben -> ueben
- einüben -> einueben
- EINÜBEN -> EINUEBEN
- and the same for Ä, Ö
- and simple ß -> ss