Similarity Score - Levenshtein

You can download Apache Commons Lang and look at (and maybe use) the Levenshtein distance implementation in its StringUtils class.


The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. (Wikipedia)

  • A Levenshtein distance of 0 means both strings are equal
  • The maximum Levenshtein distance (all characters differ) is max(string1.length, string2.length)

So if you need a percentage, you have to use these two points to scale the distance. For example:

"Hallo", "Hello" -> Levenshtein distance 1. The maximum Levenshtein distance for these two strings is 5, so 1/5 = 20% of the characters do not match.

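String s1 = "Hallo";
String s2 = "Hello";
// calculateLevenshteinDistance is a placeholder for whichever implementation you use
int lfd = calculateLevenshteinDistance(s1, s2);
// length() is a method on String; the ratio is 0.2 here (NaN if both strings are empty)
double ratio = ((double) lfd) / (Math.max(s1.length(), s2.length()));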
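
If you would rather not pull in a library, here is a minimal sketch of the distance calculation and the scaling step. It uses the usual two-row dynamic-programming approach, not the Apache Commons code. The class name LevenshteinSketch and the methods distance and mismatchRatio are made up for this example, and treating two empty strings as a ratio of 0 is my own assumption.

```java
class LevenshteinSketch {

    // Levenshtein distance via dynamic programming, keeping only two rows.
    // (Hypothetical helper for illustration, not a library API.)
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // The row just computed becomes the previous row for the next character.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fraction of characters that do not match, in the range 0.0 to 1.0.
    // Assumption: two empty strings count as identical (ratio 0) instead of NaN.
    static double mismatchRatio(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 0.0;
        }
        return (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("Hallo", "Hello"));      // prints 1
        System.out.println(mismatchRatio("Hallo", "Hello")); // prints 0.2
    }
}
```

If you want a similarity score instead of a mismatch ratio, use 1.0 - mismatchRatio(s1, s2), which gives 0.8 for the example above.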