Split string into key-value pairs

Using the Guava library, it's a one-liner:

String test = "pet:cat::car:honda::location:Japan::food:sushi";
Map<String, String> map = Splitter.on( "::" ).withKeyValueSeparator( ':' ).split( test );
System.out.println(map);

The output:

{pet=cat, car=honda, location=Japan, food=sushi}

This might also be faster than the JDK's String.split, since it does not compile a regular expression for "::".
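For comparison, here is a rough plain-JDK sketch of what splitting on a literal separator without a regex can look like. The names `LiteralPairs` and `parse` are made up for illustration, and this is not how Guava implements `Splitter`; it only scans for the separator with `indexOf` and, as an assumption of this sketch, splits each piece at its first ':'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Guava or the JDK.
final class LiteralPairs {

    // Splits input on the literal pairSep, then splits each piece at its first kvSep.
    static Map<String, String> parse(String input, String pairSep, char kvSep) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order
        int start = 0;
        while (start <= input.length()) {
            int end = input.indexOf(pairSep, start);
            if (end < 0) {
                end = input.length(); // last piece runs to the end of the string
            }
            String piece = input.substring(start, end);
            int sep = piece.indexOf(kvSep);
            if (sep < 0) {
                throw new IllegalArgumentException("No key-value separator in: '" + piece + "'");
            }
            map.put(piece.substring(0, sep), piece.substring(sep + 1));
            start = end + pairSep.length();
        }
        return map;
    }

    public static void main(String[] args) {
        String test = "pet:cat::car:honda::location:Japan::food:sushi";
        System.out.println(parse(test, "::", ':'));
        // prints {pet=cat, car=honda, location=Japan, food=sushi}
    }
}
```

No `Pattern` is compiled anywhere in this sketch; whether that matters in practice depends on how often the parsing runs.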

Update: it even correctly handles the corner case from the comments:

String test = "pet:cat::car:honda::location:Japan::food:sushi:::cool";
Map<String, String> map = Splitter.on( "::" ).withKeyValueSeparator( ':' ).split( test );
System.out.println(map);

The output is:

{pet=cat, car=honda, location=Japan, food=sushi, =cool}

Your solution is indeed somewhat inefficient.

The person who gave you the string to parse is also somewhat of a clown. There are industry-standard serialization formats, like JSON or XML, for which fast, efficient parsers exist. Inventing the square wheel is never a good idea.

First question: do you care? Is it slow enough to hinder the performance of your application? It likely isn't, but there is only one way to find out: benchmark your code.
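If you just want a first impression, a crude timing loop along these lines may be enough. `CrudeBenchmark` and `timeMillis` are hypothetical names, and the numbers from a loop like this are only indicative; a dedicated harness such as JMH gives far more trustworthy results.

```java
import java.util.function.Supplier;

// Hypothetical micro-benchmark helper; treat its numbers as a rough indication only.
final class CrudeBenchmark {

    // Runs the task `repetitions` times after an equally long warm-up
    // and returns the elapsed time of the measured run in milliseconds.
    static long timeMillis(Supplier<Object> task, int repetitions) {
        int nulls = 0; // consume every result so the work is not trivially discarded
        for (int i = 0; i < repetitions; i++) { // warm-up, gives the JIT a chance to compile
            if (task.get() == null) nulls++;
        }
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) { // measured run
            if (task.get() == null) nulls++;
        }
        long elapsedNanos = System.nanoTime() - start;
        if (nulls > 0) {
            System.out.println("task returned null " + nulls + " times");
        }
        return elapsedNanos / 1_000_000;
    }

    public static void main(String[] args) {
        String test = "pet:cat::car:honda::location:Japan::food:sushi";
        long ms = timeMillis(() -> test.split("::?"), 100_000);
        System.out.println("split: " + ms + " ms for 100,000 repetitions");
    }
}
```

Swap the lambda for whichever parsing code you want to measure, and run it several times before trusting any single figure.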

That said, more efficient solutions exist. Below is an example:

import java.util.HashMap;
import java.util.Map;

public class Main // class name is arbitrary
{
    public static void main (String[] args)
    {
        String test = "pet:cat::car:honda::location:Japan::food:sushi";
        boolean stateiskey = true; // true while scanning a key, false while scanning a value

        Map<String, String> map = new HashMap<>();
        int keystart = 0;
        int keyend = 0;
        int valuestart = 0;
        int valueend = 0;

        for(int i = 0; i < test.length(); i++){
            char nextchar = test.charAt(i);
            if (stateiskey) {
                // a single ':' ends the key and starts the value
                if (nextchar == ':') {
                    keyend = i;
                    stateiskey = false;
                    valuestart = i + 1;
                }
            } else {
                // a value ends at "::" or at the end of the string
                if (i == test.length() - 1 || (nextchar == ':' && test.charAt(i + 1) == ':')) {
                    valueend = i;
                    if (i + 1 == test.length()) valueend += 1; //compensate one for the end of the string
                    String key = test.substring(keystart, keyend);
                    String value = test.substring(valuestart, valueend);
                    keystart = i + 2;
                    map.put(key, value);
                    i++; // skip the second ':' of the separator
                    stateiskey = true;
                }
            }
        }

        System.out.println(map);
    }
}

This solution is a finite state machine with only two states. It looks at each character essentially twice: once when it tests it for a boundary, and once when it copies it to the new string in your map. That is about the minimum.

It doesn't create objects that are not needed, like StringBuilders, intermediate strings or arrays, which keeps garbage-collection pressure low.

It maintains good locality. The next character is almost always in cache, so the lookup is cheap.

It comes at a grave cost that is probably not worth it though:

  • It's far more complicated and less obvious
  • There are all sorts of moving parts
  • It's harder to debug when your string is in an unexpected format
  • Your coworkers will hate you
  • You will hate yourself when you have to debug something
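To make the debugging point concrete, here is the same two-state loop wrapped in a method (`FsmPitfallDemo` and `parse` are hypothetical names, used only for this illustration). Fed a string whose last entry has no value, it quietly returns a smaller map instead of complaining:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around the same two-state loop, for illustration only.
final class FsmPitfallDemo {

    static Map<String, String> parse(String test) {
        Map<String, String> map = new HashMap<>();
        boolean stateIsKey = true;
        int keyStart = 0;
        int keyEnd = 0;
        int valueStart = 0;
        for (int i = 0; i < test.length(); i++) {
            char c = test.charAt(i);
            if (stateIsKey) {
                if (c == ':') { // a single ':' ends the key
                    keyEnd = i;
                    valueStart = i + 1;
                    stateIsKey = false;
                }
            } else if (i == test.length() - 1 || (c == ':' && test.charAt(i + 1) == ':')) {
                // the value ends at "::" or at the end of the string
                int valueEnd = (i == test.length() - 1) ? i + 1 : i;
                map.put(test.substring(keyStart, keyEnd), test.substring(valueStart, valueEnd));
                keyStart = i + 2;
                i++; // skip the second ':' of the separator
                stateIsKey = true;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // The trailing "car" never sees a ':' and is dropped without any error.
        System.out.println(parse("pet:cat::car")); // prints {pet=cat}
    }
}
```

Nothing throws and nothing is logged, so a malformed input only shows up later as a missing key.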

Worth it? Maybe. How fast do you need that string parsed exactly?

A quick and dirty benchmark at https://ideone.com/8T7twy tells me that for this string, this method is approximately 4 times faster. For longer strings the difference is likely somewhat greater.

But your version still takes only 415 milliseconds for 100,000 repetitions, whereas this one takes 99 milliseconds.


You could do a single call to split() and a single pass over the resulting array using the following code. Of course, it assumes the String is valid in the first place:

    Map<String, String> map = new HashMap<String, String>();
    String test = "pet:cat::car:honda::location:Japan::food:sushi";

    // split on ':' and on '::'
    String[] parts = test.split("::?");

    for (int i = 0; i < parts.length; i += 2) {
        map.put(parts[i], parts[i + 1]);
    }

    for (String s : map.keySet()) {
        System.out.println(s + " is " + map.get(s));
    }

The above is probably a little bit more efficient than your solution, but if you find your code clearer, then keep it, because there is almost zero chance such an optimization has a significant impact on performance, unless you do that millions of times. Anyway, if it's so important, then you should measure and compare.

EDIT:

For those who wonder what ::? means in the above code: String.split() takes a regular expression as its argument. A separator is any substring that matches the regular expression. ::? is a regular expression which means: one colon, followed by zero or one more colon. It thus treats both :: and : as separators.
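A small runnable sketch of that split, with one extra guard that the code above does not have: since a valid string must yield an even number of tokens, an odd count is rejected up front. `RegexSplitDemo` and `parse` are hypothetical names, and the choice of `LinkedHashMap` (to keep insertion order when printing) is mine.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; only String.split and standard collections are used.
final class RegexSplitDemo {

    static Map<String, String> parse(String input) {
        // "::?" matches one colon optionally followed by a second one
        String[] parts = input.split("::?");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("Odd number of tokens: " + Arrays.toString(parts));
        }
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order
        for (int i = 0; i < parts.length; i += 2) {
            map.put(parts[i], parts[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("pet:cat::car:honda".split("::?")));
        // prints [pet, cat, car, honda]
        System.out.println(parse("pet:cat::car:honda"));
        // prints {pet=cat, car=honda}
    }
}
```

The guard only catches strings with a missing key or value; it cannot tell whether a single colon was meant as a key-value separator or as a mistyped pair separator.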
