Regex Help - Escaping Characters And Matcher Method
The simplest solution is to use \W as your entire expression; it matches any non-word character. Unfortunately, this solution would also match any whitespace characters, and it would ignore the underscore character (_), which counts as a word character.
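To make that trade-off concrete, here is a minimal sketch in plain Java rather than Apex, on the assumption that Apex's regex support behaves like java.util.regex for these patterns. The class and method names (NonWordDemo, stripNonWord, stripPunctuation) are made up for illustration.

```java
class NonWordDemo {
    // \W alone: removes whitespace as well, but leaves underscores in place.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // (_|[^\w\s]): removes underscores and punctuation, keeps whitespace.
    static String stripPunctuation(String s) {
        return s.replaceAll("(_|[^\\w\\s])", "");
    }

    public static void main(String[] args) {
        String input = "This_is a test!";
        System.out.println(stripNonWord(input));     // This_isatest
        System.out.println(stripPunctuation(input)); // Thisis a test
    }
}
```

Note that Java string literals use double quotes where Apex uses single quotes; the doubled backslashes are the same in both.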
Here's the expression I would use:
(_|[^\w\s])
What does it mean?
1st Capturing Group (_|[^\w\s])
Matches either alternative:
- 1st Alternative: _
  - _ matches the character _ literally (case sensitive)
- 2nd Alternative: [^\w\s]
  - Matches a single character not present in the list below:
    - \w matches any word character (equal to [a-zA-Z0-9_])
    - \s matches any whitespace character (equal to [\r\n\t\f\v ])
Here are some examples:
String expression = '(_|[^\\w\\s])';
String allPunctuation = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';
String input1 = 'This is a test!', output1 = 'This is a test';
String input2 = 'This is a test...', output2 = 'This is a test';
String input3 = '([{This_is_a_test}])', output3 = 'Thisisatest';
system.assertEquals('', allPunctuation.replaceAll(expression, ''));
system.assertEquals(output1, input1.replaceAll(expression, ''));
system.assertEquals(output2, input2.replaceAll(expression, ''));
system.assertEquals(output3, input3.replaceAll(expression, ''));
Given example 3, you may want to change things up and replace underscores with space characters instead. Then you could simplify somewhat:
String sanitize(String name)
{
    if (name == null) return name;
    return name.replaceAll('[^\\w\\s]', '')
        .replaceAll('_', ' ').trim();
}
String allPunctuation = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';
String input1 = 'This is a test! ', output1 = 'This is a test';
String input2 = 'This is a test... ', output2 = 'This is a test';
String input3 = '([{This_is_a_test}])', output3 = 'This is a test';
system.assertEquals('', sanitize(allPunctuation));
system.assertEquals(output1, sanitize(input1));
system.assertEquals(output2, sanitize(input2));
system.assertEquals(output3, sanitize(input3));
You can match all punctuation using \\p{Punct}, as mentioned in the Pattern class documentation, which matches:
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
For example, the following code results in an empty String:
String s = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';
System.debug(s.replaceAll('\\p{Punct}',''));
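The same idea can be sketched with an explicit Pattern and Matcher. This example is in plain Java rather than Apex, on the assumption that \p{Punct} behaves the same way in both; PunctDemo, stripPunct and countPunct are hypothetical names used only here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctDemo {
    // Compile once and reuse; \p{Punct} is the POSIX ASCII punctuation class.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    // Remove every punctuation character from the input.
    static String stripPunct(String s) {
        return PUNCT.matcher(s).replaceAll("");
    }

    // Count punctuation characters by walking the matches one at a time.
    static int countPunct(String s) {
        Matcher m = PUNCT.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(stripPunct("([{This_is_a_test}])")); // Thisisatest
        System.out.println(countPunct("a.b,c!"));                // 3
    }
}
```

Unlike [^\w\s], this class includes the underscore, so no separate alternative is needed for it.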
Note that the "escapes" are not disappearing; they're being processed when the string literal is compiled. If you want a literal backslash escape, you have to escape it twice:
String s = '~!@#$%^*()_+|}{":?><`=;/.,][-\\\'\\\\';
Here, \\\' results in the pattern/matcher/regexp engine seeing \', and \\\\ results in the engine seeing \\.
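The two layers of escaping can be checked directly. The sketch below is in plain Java, whose string literals unescape backslashes the same way, and it assumes Apex follows the same rules; EscapeDemo and its methods are illustrative names only.

```java
class EscapeDemo {
    // Four backslashes in source become two characters after the compiler unescapes the literal.
    static int literalLength() {
        return "\\\\".length();
    }

    // The regex engine then reads those two characters as one escaped, literal backslash.
    static String stripBackslashes(String s) {
        return s.replaceAll("\\\\", "");
    }

    public static void main(String[] args) {
        System.out.println(literalLength());           // 2
        System.out.println(stripBackslashes("a\\b"));  // ab
    }
}
```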
Adrian's solution also works, but I think that \p{Punct} is a bit more explicit about the intent of your code (to match any punctuation).