Pandas fuzzy merge/match name column, with duplicates
Here's a somewhat more Pythonic (in my view) version without explicit loops; it works on your example:
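```python
import pandas as pd
from fuzzywuzzy import fuzz  # or: from thefuzz import fuzz

def get_donors(row):
    # Doubled name similarity when the emails match, otherwise a flat 1 (never >= 75)
    d = donors.apply(lambda x: fuzz.ratio(x['name'], row['name']) * 2 if row['Email'] == x['Email'] else 1, axis=1)
    # Keep only donors scoring at least 75
    d = d[d >= 75]
    if len(d) == 0:
        v = [''] * 3
    else:
        # .loc replaces the removed .ix indexer; idxmax() returns a row label
        v = donors.loc[d.idxmax(), ['name', 'Email', 'Date']].values
    return pd.Series(v, index=['donor name', 'donor email', 'donor date'])

pd.concat((fundraisers, fundraisers.apply(get_donors, axis=1)), axis=1)
```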
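The scoring rule inside `get_donors` can be pulled out into plain functions to see exactly when a donor qualifies. This is a minimal sketch, not the code above: it assumes `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` (fuzzywuzzy's pure-Python fallback is built on the same matcher, so scores should be close, but they may differ when python-Levenshtein is installed), and `ratio`, `score` and `best_match` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # Stand-in for fuzz.ratio (assumption): 0-100 similarity via difflib
    return round(100 * SequenceMatcher(None, a, b).ratio())


def score(fund_name, fund_email, donor_name, donor_email):
    # Same rule as the lambda in get_donors: doubled name similarity
    # when the emails match, otherwise a flat 1
    return ratio(fund_name, donor_name) * 2 if fund_email == donor_email else 1


def best_match(fund_name, fund_email, donors, threshold=75):
    """Return the (name, email, date) donor with the top score >= threshold, or None."""
    kept = [(score(fund_name, fund_email, n, e), (n, e, d))
            for n, e, d in donors
            if score(fund_name, fund_email, n, e) >= threshold]
    if not kept:
        return None
    # max() returns the first of equal scores, like Series.idxmax()
    return max(kept, key=lambda pair: pair[0])[1]


# Hypothetical sample data
donors = [("John Doe", "jd@example.com", "2013-03-01 10:39:00"),
          ("Kat test", "kt@example.com", "2013-03-27 10:39:00")]
print(best_match("Kathy test", "kt@example.com", donors))
# -> ('Kat test', 'kt@example.com', '2013-03-27 10:39:00')
```

Because the score is doubled when the emails match, a name ratio of 38 or more already clears the 75 cut-off, while a donor with a different email scores 1 and can never match.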
Output:
```
                  Date              Email        name  donor name        donor email           donor date
0  2013-03-27 10:00:00  [email protected]    John Doe    John Doe  [email protected]  2013-03-01 10:39:00
1  2013-03-01 10:39:00  [email protected]    John Doe    John Doe  [email protected]  2013-03-01 10:39:00
2  2013-03-02 10:39:00  [email protected]  Kathy test    Kat test  [email protected]  2013-03-27 10:39:00
3  2013-03-03 10:39:00  [email protected]   Tes Ester
4  2013-03-04 10:39:00  [email protected]    Jane Doe    Jane Doe  [email protected]  2013-03-04 10:39:00
```
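Since a donor with a different email always scores 1, only same-email pairs can ever reach the threshold. The sketch below exploits that by doing an inner merge on `Email` first and scoring only those candidate pairs, instead of scoring every fundraiser against every donor. It assumes the column names from the question (`name`, `Email`, `Date`), again uses `difflib` as a stand-in for `fuzz.ratio`, and `match_donors` is a hypothetical name.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a: str, b: str) -> int:
    # Stand-in for fuzz.ratio (assumption): 0-100 similarity via difflib
    return round(100 * SequenceMatcher(None, a, b).ratio())


def match_donors(fundraisers: pd.DataFrame, donors: pd.DataFrame,
                 threshold: int = 75) -> pd.DataFrame:
    # Build only the fundraiser/donor pairs that share an email,
    # remembering each fundraiser's original row label in fund_idx
    pairs = fundraisers.rename_axis("fund_idx").reset_index().merge(
        donors.rename(columns={"name": "donor name", "Date": "donor date"}),
        on="Email",
    )
    # Same score as get_donors gives matching emails: doubled name ratio
    pairs["score"] = [ratio(a, b) * 2
                      for a, b in zip(pairs["name"], pairs["donor name"])]
    pairs = pairs[pairs["score"] >= threshold]
    # Highest score per fundraiser; the stable sort keeps the first donor
    # on ties, like Series.idxmax()
    best = (pairs.sort_values("score", ascending=False, kind="stable")
                 .drop_duplicates("fund_idx")
                 .set_index("fund_idx")
                 .rename(columns={"Email": "donor email"}))
    # Left join keeps every fundraiser; unmatched rows get NaN
    return fundraisers.join(best[["donor name", "donor email", "donor date"]])
```

On the question's frames, `match_donors(fundraisers, donors)` should give the same columns as the `pd.concat` version, except that unmatched rows hold NaN rather than empty strings.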