The correct regex for replacing an em dash with a basic "-" in Java

It works fine for me. My guess is that you're not actually using an em dash. Try copying and pasting the em dash character from the Character Map instead of from Word.


Based on what you posted, the problem may lie not with your code but with the dash you assume you have. What you have looks like an en dash (about the width of a capital N) rather than an em dash (about the width of a capital M). The Unicode code point for the en dash is U+2013; try using that instead and see whether the replacement works.
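If you're not sure which dash is actually in your string, one way to check is to print its code points. This is only a sketch: the class name DashInspector and the helper codePoints are made up for illustration, and it assumes Java 8 or later for String.codePoints().

```java
public class DashInspector {
    // Hypothetical helper: lists every code point of the input as "U+XXXX",
    // separated by single spaces, so look-alike dashes can be told apart.
    static String codePoints(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // En dash between the letters: prints "U+0061 U+2013 U+0062"
        System.out.println(codePoints("a\u2013b"));
        // Em dash between the letters: prints "U+0061 U+2014 U+0062"
        System.out.println(codePoints("a\u2014b"));
    }
}
```

If the output shows U+2013 where you expected U+2014, the pattern is fine and the input simply contains an en dash.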


String.replaceAll takes a regex as its first parameter. If you just want to replace all occurrences of a single char with another char, consider using String.replace(char, char):

String s = "asd \u2014 asd";      // \u2014 is the em dash, written as an escape to avoid look-alikes
s = s.replace('\u2014', '-');    // s is now "asd - asd"
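Since String.replace(char, char) handles one character per call, covering both the en dash and the em dash means chaining two calls. A minimal sketch, assuming those are the only two dashes you care about (DashReplace and toHyphen are illustrative names, not part of any library):

```java
public class DashReplace {
    // Hypothetical helper: maps the en dash (U+2013) and the em dash (U+2014)
    // to a plain hyphen-minus. No regex is involved.
    static String toHyphen(String text) {
        return text
                .replace('\u2013', '-')
                .replace('\u2014', '-');
    }

    public static void main(String[] args) {
        // prints "asd - asd - asd"
        System.out.println(toHyphen("asd \u2013 asd \u2014 asd"));
    }
}
```

Writing the dashes as escapes in the source also sidesteps any doubt about which character the editor or clipboard actually inserted.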

Minor edit after question edit:

You might not be using an em dash at all. If you're not sure what you have, a nice solution is to simply find and replace all dashes, em or otherwise. Take a look at this answer: you can use the Unicode dash punctuation property, which matches every kind of dash. In a Java string literal it is written \\p{Pd}:

String s = "asd – asd";
s = s.replaceAll("\\p{Pd}", "-");

Working example replacing both an em dash and a regular dash with the above code.
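For reference, here is a self-contained version of the \\p{Pd} approach. It is a sketch rather than the linked example: DashNormalizer and normalizeDashes are made-up names, and precompiling the Pattern is just one reasonable choice when the replacement runs many times.

```java
import java.util.regex.Pattern;

public class DashNormalizer {
    // Matches any character in the Unicode "Punctuation, Dash" category (Pd),
    // which includes the en dash, the em dash and the plain hyphen-minus.
    private static final Pattern ANY_DASH = Pattern.compile("\\p{Pd}");

    // Hypothetical helper: replaces every Pd character with a plain '-'.
    static String normalizeDashes(String text) {
        return ANY_DASH.matcher(text).replaceAll("-");
    }

    public static void main(String[] args) {
        // En dash, em dash and hyphen-minus all come out as '-':
        // prints "a-b-c-d"
        System.out.println(normalizeDashes("a\u2013b\u2014c-d"));
    }
}
```

Note that the minus sign U+2212 is a math symbol (category Sm), not dash punctuation, so this pattern leaves it alone.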

References:
public String replaceAll(String regex, String replacement)
Unicode Regular Expressions
