Easy way to remove accents from a Unicode string?
I guess the only difference is that I use a + quantifier and not a [] character class, compared to the solution. I think both work, but it's better to have it here as well.
String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
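To check the claim that the + form and the [] form behave the same, here is a minimal, self-contained sketch. The class name AccentPatternDemo and both method names are made up for illustration; only Normalizer.normalize and String.replaceAll come from the snippets in this thread. The sample strings use \u escapes so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;

public class AccentPatternDemo {

    // Variant with the + quantifier: removes each run of combining marks in one match.
    static String withQuantifier(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
        return normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Variant with the [] character class: removes combining marks one at a time.
    static String withCharacterClass(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
        return normalized.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    public static void main(String[] args) {
        String sample = "Cr\u00e8me Br\u00fbl\u00e9e"; // "Crème Brûlée"
        System.out.println(withQuantifier(sample));     // prints "Creme Brulee"
        System.out.println(withCharacterClass(sample)); // prints "Creme Brulee"
    }
}
```

Both variants end up deleting exactly the same characters; the + only changes how many marks a single match consumes.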
Finally, I've solved it by using the Normalizer class.
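import java.text.Normalizer;

public static String stripAccents(String s)
{
    // Decompose each accented character into its base letter plus combining marks.
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    // Remove the combining marks, leaving only the base letters.
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}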
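For anyone who wants to try the method above, here is a small runnable sketch that wraps it in a class. The class name StripAccentsDemo and the sample inputs are my own choices, not part of the answer. It also shows one thing worth knowing about this approach: a letter such as ø (U+00F8) has no canonical decomposition, so NFD leaves it as a single character and the regex has nothing to remove.

```java
import java.text.Normalizer;

public class StripAccentsDemo {

    // Same two steps as the method above: decompose, then drop the combining marks.
    public static String stripAccents(String s) {
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
        return s;
    }

    public static void main(String[] args) {
        // "à" decomposes into "a" + U+0300, so the mark is stripped.
        System.out.println(stripAccents("\u00e0"));          // prints "a"
        // "ø" does not decompose, so it comes back unchanged.
        System.out.println(stripAccents("\u00f8").equals("\u00f8")); // prints "true"
    }
}
```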
Maybe the easiest and safest way is to use StringUtils from Apache Commons Lang:
StringUtils.stripAccents(String input)
Removes diacritics (~= accents) from a string. The case will not be altered. For instance, 'à' will be replaced by 'a'. Note that ligatures will be left as is.