Text difference algorithm

I recommend taking a look at Neil Fraser's code and articles:

google-diff-match-patch

Currently available in Java, JavaScript, C++ and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.

Neil Fraser: Diff Strategies - for theory and implementation notes


In Python, there is difflib, as others have also suggested.
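
If what you want is a readable line-by-line diff rather than a score, difflib can produce one directly. A minimal sketch, assuming the inputs are multi-line strings; the helper name line_diff and the "old"/"new" labels are just placeholders for illustration:

```python
import difflib

def line_diff(old_text, new_text):
    """Return a unified diff of two multi-line strings as a single string."""
    # unified_diff works on sequences of lines; keepends=True preserves the
    # newlines so the pieces can simply be joined back together.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(old_lines, new_lines, fromfile="old", tofile="new")
    return "".join(diff)

print(line_diff("one\ntwo\nthree\n", "one\n2\nthree\n"))
```

Removed lines are prefixed with "-", added lines with "+", and unchanged context lines with a space.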

difflib offers the SequenceMatcher class, whose ratio() method returns a similarity score between 0.0 and 1.0 (1.0 means the two sequences are identical). Example function:

import difflib

def text_compare(text1, text2, isjunk=None):
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()
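
Besides the ratio, SequenceMatcher can also tell you what actually changed, via get_opcodes(). A rough sketch of how that might be used; the helper name describe_changes is made up for this example, and it compares character by character because plain strings are passed in:

```python
import difflib

def describe_changes(text1, text2):
    """Return (ratio, edits); edits lists every non-matching opcode."""
    matcher = difflib.SequenceMatcher(None, text1, text2)
    # Each opcode is (tag, i1, i2, j1, j2): text1[i1:i2] maps to text2[j1:j2].
    # The tag is one of 'equal', 'replace', 'delete' or 'insert'.
    edits = [
        (tag, text1[i1:i2], text2[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), edits

ratio, edits = describe_changes("abcd", "bcde")
print(ratio)   # 0.75
print(edits)   # [('delete', 'a', ''), ('insert', '', 'e')]
```

To compare word by word or line by line instead, pass lists (for example text.split() or text.splitlines()) rather than strings.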

Tags: Python, C#, Diff