Finding if two strings are almost similar

You can use difflib.SequenceMatcher if you want something from the stdlib:

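from difflib import SequenceMatcher

s_1 = 'Mohan Mehta'
s_2 = 'Mohan Mehte'
# ratio() returns a similarity score from 0.0 (nothing in common) to 1.0 (identical)
print(SequenceMatcher(a=s_1, b=s_2).ratio())
# 0.9090909090909091 on Python 3 (Python 2 prints 0.909090909091)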
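
To turn that score into a yes/no answer you need a cutoff. Here is a minimal sketch of that; the helper name is_similar, the 0.8 threshold and the lowercasing step are my own assumptions, not part of difflib, so tune them against your data:

```python
from difflib import SequenceMatcher


def is_similar(first, second, threshold=0.8):
    """Return True when the two strings are 'almost' the same.

    The threshold of 0.8 is an arbitrary starting point (an assumption),
    and lowercasing makes the comparison case-insensitive.
    """
    score = SequenceMatcher(a=first.lower(), b=second.lower()).ratio()
    return score >= threshold


print(is_similar('Mohan Mehta', 'Mohan Mehte'))  # one differing character
print(is_similar('Mohan Mehta', 'Umesh Gupta'))  # unrelated names
```

A higher threshold gives fewer false matches but misses more typos, so it is worth checking a handful of real pairs before settling on a value.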

fuzzywuzzy is one of many libraries you can install; it builds on the difflib module and can use python-Levenshtein to speed things up. You should also check out the Wikipedia page on approximate string matching.


Another approach is to use a "phonetic algorithm":

A phonetic algorithm is an algorithm for indexing of words by their pronunciation.

For example, using the Soundex algorithm:

>>> import soundex
>>> s = soundex.getInstance()
>>> s.soundex("Umesh Gupta")
'U5213'
>>> s.soundex("Umash Gupte")
'U5213'
>>> s.soundex("Umesh Gupta") == s.soundex("Umash Gupte")
True
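
If you would rather not install anything, the idea is easy to sketch by hand. The version below is the classic four-character American Soundex applied to a single word; it is not the variant the soundex package above uses (that one returned a five-character code for a two-word string), and the function name simple_soundex is made up for this example:

```python
def simple_soundex(word):
    """Classic four-character American Soundex for a single word."""
    # Consonants that sound alike share a digit; vowels and Y get no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                  # the first letter is kept as-is
    previous = codes.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue                     # H and W do not separate equal codes
        digit = codes.get(c, "")
        if digit and digit != previous:
            result += digit              # skip runs of the same code
        previous = digit                 # a vowel resets the run
    return (result + "000")[:4]          # pad or truncate to four characters


# Compare names word by word, since this sketch handles one word at a time.
first = [simple_soundex(w) for w in "Umesh Gupta".split()]
second = [simple_soundex(w) for w in "Umash Gupte".split()]
print(first == second)
```

Because it only keys on consonant classes, Soundex catches vowel swaps like the one above but will also match names that merely sound alike, so it works best as a coarse first filter.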

You might want to look at NLTK (the Natural Language Toolkit), specifically the nltk.metrics package, which implements various string distance algorithms, including the Levenshtein distance mentioned already (as nltk.metrics.edit_distance).


What you want is a string distance. There are many flavors, but I would recommend starting with the Levenshtein distance.
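
The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A rough pure-Python sketch of the usual dynamic-programming approach follows; the function name levenshtein is just for illustration, and a library implementation will be faster on long strings:

```python
def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein('Mohan Mehta', 'Mohan Mehte'))  # 1
print(levenshtein('kitten', 'sitting'))           # 3
```

A small distance relative to the string lengths means the strings are almost the same; where to draw the line (for example, at most one or two edits) depends on your data.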