What is a simple fuzzy string matching algorithm in Python?

Take a look at this python library, which SeatGeek open-sourced yesterday. Obviously most of these kinds of problems are very context dependent, but it might help you.

from fuzzywuzzy import fuzz

s1 = "the quick brown fox"
s2 = "the quick brown fox jumped over the lazy dog"
s3 = "the fast fox jumped over the hard-working dog"

fuzz.partial_ratio(s1, s2)
> 100

fuzz.token_set_ratio(s2, s3)
> 73

SeatGeek website

and Github repo

If all you want to do is to test whether or not all the words in a string match another string, that's a one liner:

if not [word for word in b.split(' ') if word not in a.split(' ')]:
    print 'Match!'

If you want to score them instead of a binary test, why not just do something like:

((# of matching words) / (# of words in bigger string)) * ((# of words in smaller string) / (# of words in bigger string))

If you wanted to, you could get fancier and do fuzzy match on each string.

I like Drew's answer.

You can use difflib to find the longest match:

>>> a = 'The quick brown fox.'
>>> b = 'The quick brown fox jumped over the lazy dog.'
>>> import difflib
>>> s = difflib.SequenceMatcher(None, a, b)
>>> s.find_longest_match(0,len(a),0,len(b))
Match(a=0, b=0, size=19) # returns NamedTuple (new in v2.6)

Or pick some minimum matching threshold. Example:

>>> difflib.SequenceMatcher(None, a, b).ratio()
0.61538461538461542

What is a simple fuzzy string matching algorithm in Python?

Tags:

Python

Related

Recent Posts