How to convert an arbitrary string to a Java identifier?

This simple method will convert any input string into a valid Java identifier:

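// requires: import java.nio.charset.StandardCharsets;
public static String getIdentifier(String str) {
    StringBuilder sb = new StringBuilder("_");
    for (byte b : str.getBytes(StandardCharsets.UTF_8)) {
        // Mask to 0..255: Java bytes are signed, and dropping the minus sign
        // of e.g. -61 (0xC3) would make it collide with '=' (61)
        sb.append(b & 0xFF).append('_');
    }
    return sb.toString();
}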

Example:

"%^&*\n()" --> "_37_94_38_42_10_40_41_"

Any input characters whatsoever will work - foreign language chars, linefeeds, anything!
In addition, this algorithm is:

  • reproducible
  • unique - i.e. two inputs produce the same result if and only if str1.equals(str2)
  • reversible
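
To illustrate the "reversible" point, here is a minimal sketch of a decoder. It assumes every number in the identifier is an unsigned byte value (0-255) of the UTF-8 encoding; the class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class IdentifierDecoder {
    // Turns "_72_105_" back into the original string by reading each
    // decimal number as one byte of UTF-8 data.
    static String decode(String identifier) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : identifier.split("_")) {
            if (!part.isEmpty()) {
                out.write(Integer.parseInt(part)); // stores the low 8 bits
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("_72_105_")); // prints "Hi"
    }
}
```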

Thanks to Joachim Sauer for the UTF-8 suggestion


If collisions are OK (i.e. two different input strings may produce the same result), this code produces more readable output:

public static String getIdentifier(String str) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (i == 0 ? Character.isJavaIdentifierStart(c) : Character.isJavaIdentifierPart(c)) {
            sb.append(c);
        } else {
            // A decimal code in first position would make the result start with a digit
            if (i == 0) {
                sb.append('_');
            }
            sb.append((int) c);
        }
    }
    return sb.toString();
}

It preserves characters that are valid in an identifier and converts only the invalid ones to their decimal char values. That is where the collisions come from: "a%" and "a37" both become "a37".


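If you are doing this for autogenerated code (i.e. don't care much about readability), one of my favorites is just to Base64 it. No need to play language lawyer over what characters are valid in what encodings, and it's a pretty common way of "protecting" arbitrary byte data. One caveat: the standard Base64 alphabet contains +, / and =, and the output may start with a digit, so those characters need replacing and the result needs a safe prefix.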
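
A minimal sketch of the Base64 idea, assuming the URL-safe alphabet without padding, with '-' mapped to '$' and a leading '_' so the result contains only identifier characters and never starts with a digit. The class and method names are hypothetical, and the '$' mapping is just one possible choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Identifier {
    // Encode: UTF-8 bytes -> URL-safe Base64 (A-Z a-z 0-9 - _), no '=' padding.
    static String encode(String str) {
        String b64 = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(str.getBytes(StandardCharsets.UTF_8));
        // '-' is not legal in identifiers; '$' is, and Base64 never emits it.
        return "_" + b64.replace('-', '$');
    }

    // Decode: strip the prefix, undo the '$' mapping, then Base64-decode.
    static String decode(String identifier) {
        String b64 = identifier.substring(1).replace('$', '-');
        return new String(Base64.getUrlDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String id = encode("a b");
        System.out.println(id);         // prints "_YSBi"
        System.out.println(decode(id)); // prints "a b"
    }
}
```

Unlike the character-preserving approach, this stays collision-free and reversible, at the cost of unreadable names.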


I don't know of a tool for that purpose, but one can easily be written using the Character class.

Did you know that string€with_special_characters___ is a legal Java identifier?

import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

public class Conv {
    public static void main(String[] args) {
        String[] idents = { "string with spaces", "100stringsstartswithnumber",
                "string€with%special†characters/\\!", "" };
        for (String ident : idents) {
            System.out.println(convert(ident));
        }
    }

    private static String convert(String ident) {
        if (ident.length() == 0) {
            return "_";
        }
        CharacterIterator ci = new StringCharacterIterator(ident);
        StringBuilder sb = new StringBuilder();
        for (char c = ci.first(); c != CharacterIterator.DONE; c = ci.next()) {
            if (c == ' ')
                c = '_';
            if (sb.length() == 0) {
                if (Character.isJavaIdentifierStart(c)) {
                    sb.append(c);
                    continue;
                } else
                    sb.append('_');
            }
            if (Character.isJavaIdentifierPart(c)) {
                sb.append(c);
            } else {
                sb.append('_');
            }
        }
        return sb.toString();
    }
}

Prints

string_with_spaces
_100stringsstartswithnumber
string€with_special_characters___
_
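
One case the character-based conversions above do not cover is reserved words: an input such as "class" passes every Character check but is still not usable as an identifier (and since Java 9 a lone "_" is reserved as well). A minimal sketch of a post-processing step, using SourceVersion.isKeyword from javax.lang.model; the class name, method name and the choice of appending an underscore are assumptions made for this example.

```java
import javax.lang.model.SourceVersion;

class KeywordGuard {
    // Appends '_' when the candidate is a Java keyword or literal
    // (e.g. "class", "true", "null"); other strings pass through unchanged.
    static String escapeKeyword(String identifier) {
        return SourceVersion.isKeyword(identifier) ? identifier + "_" : identifier;
    }

    public static void main(String[] args) {
        System.out.println(escapeKeyword("class")); // prints "class_"
        System.out.println(escapeKeyword("value")); // prints "value"
    }
}
```

Note that this reintroduces a possible collision, since the inputs "class" and "class_" now map to the same name.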