How to remove special characters from a string?

As described here http://developer.android.com/reference/java/util/regex/Pattern.html

Patterns are compiled regular expressions. In many cases, convenience methods such as String.matches, String.replaceAll and String.split will be preferable, but if you need to do a lot of work with the same regular expression, it may be more efficient to compile it once and reuse it. The Pattern class and its companion, Matcher, also offer more functionality than the small amount exposed by String.

public class RegularExpressionTest {

public static void main(String[] args) {
    System.out.println("String is = "+getOnlyStrings("!&(*^*(^(+one(&(^()(*)(*&^%$#@!#$%^&*()("));
    System.out.println("Number is = "+getOnlyDigits("&(*^*(^(+91-&*9hi-639-0097(&(^("));
}

 public static String getOnlyDigits(String s) {
    Pattern pattern = Pattern.compile("[^0-9]");
    Matcher matcher = pattern.matcher(s);
    String number = matcher.replaceAll("");
    return number;
 }
 public static String getOnlyStrings(String s) {
    Pattern pattern = Pattern.compile("[^a-z A-Z]");
    Matcher matcher = pattern.matcher(s);
    String number = matcher.replaceAll("");
    return number;
 }
}

Result

String is = one
Number is = 9196390097

That depends on what you define as special characters, but try replaceAll(...):

String result = yourString.replaceAll("[-+.^:,]","");

Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".

Another note: the - character needs to be the first or last one on the list, otherwise you'd have to escape it or it would define a range ( e.g. :-, would mean "all characters in the range : to ,).

So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):

String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");


If you want to get rid of all punctuation and symbols, try this regex: \p{P}\p{S} (keep in mind that in Java strings you'd have to escape back slashes: "\\p{P}\\p{S}").

A third way could be something like this, if you can exactly define what should be left in your string:

String  result = yourString.replaceAll("[^\\w\\s]","");

This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.

Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.

Here's less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:

String  result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");

The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which almost matches everything, since letters are not whitespace and vice versa.

Additional information on Unicode

Some unicode characters seem to cause problems due to different possible ways to encode them (as a single code point or a combination of code points). Please refer to regular-expressions.info for more information.


This will replace all the characters except alphanumeric

replaceAll("[^A-Za-z0-9]","");

Tags:

Java

Regex