Where can I find the patience diff implemented?

It's perhaps not as ideal as I'd like it, but the solution is perfectly good from a practical perspective (and thats a damn good perspective to have).

git diff --no-index --patience file1 file2 does the job. (thanks @StevenPenny)

$P4DIFF variable defines the external diff... we just stuff git diff --patience --no-index into that.


For a javascript implementation, with enhancements to PatienceDiff to determine the lines that likely moved (dubbed "PatienceDiffPlus"), see https://github.com/jonTrent/PatienceDiff.


I took the liberty of porting patience to a somewhat standalone library, it's in C#. It's still early-days for the library. It is mostly a line-by-line port; so it hopefully has most of the stability of the Python one.

Remember that patience only finds the longest common subsequences (in diff terms that means parts of the file that have not changed). You will need to determine the additions and removals yourself.

Also remember that within the Bazaar repository there are also implementations in Python and in C (again, the implementations only solve the LCS problem):

  • C version: seems to value performance over clarity, you won't easily be able to understand the algorithm by reading this. There is also a lot of code-overhead for Python interop.
  • Python version: the reference implementation of the algorithm. Seems to mostly value clarity over performance.

If you need to write your own implementation I would recommend porting the Python version first, then looking at the C implementation for tips on how to speed it up.

There should also be an implementation in the Git repository, but I haven't searched for it.


Cohen's own Python implementation needs only minor tweaks (below) to run standalone. It's in two files, copies of which I snagged by googling "difflib patience":

http://stuff.mit.edu/afs/athena/system/i386_deb50/os/usr/share/pyshared/bzrlib/patiencediff.py and http://stuff.mit.edu/afs/athena/system/i386_deb50/os/usr/share/pyshared/bzrlib/_patiencediff_py.py

The first file can be run from the command line roughly like diff. The second is the Python implementation of the inner loops. (Single file?? Exercise for reader!) In bzrlib there's also a C implementation of the inner loops.

Here (with the help of the program itself) are my patches to make them run standalone:

Sandy$ patiencediff.py --patience orig/patiencediff.py patiencediff.py
--- orig/patiencediff.py
+++ patiencediff.py
@@ -15,14 +15,20 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

+try:
+    from bzrlib.lazy_import import lazy_import
+    lazy_import(globals(), """
+    import os
+    import sys
+    import time
+    import difflib
+    """)
+except:
+    import os
+    import sys
+    import time
+    import difflib

-from bzrlib.lazy_import import lazy_import
-lazy_import(globals(), """
-import os
-import sys
-import time
-import difflib
-""")


 __all__ = ['PatienceSequenceMatcher', 'unified_diff', 'unified_diff_files']
@@ -135,11 +141,18 @@
         PatienceSequenceMatcher_c as PatienceSequenceMatcher
         )
 except ImportError:
-    from bzrlib._patiencediff_py import (
-        unique_lcs_py as unique_lcs,
-        recurse_matches_py as recurse_matches,
-        PatienceSequenceMatcher_py as PatienceSequenceMatcher
-        )
+    try:
+        from bzrlib._patiencediff_py import (
+            unique_lcs_py as unique_lcs,
+            recurse_matches_py as recurse_matches,
+            PatienceSequenceMatcher_py as PatienceSequenceMatcher
+            )
+    except ImportError:
+        from _patiencediff_py import (
+            unique_lcs_py as unique_lcs,
+            recurse_matches_py as recurse_matches,
+            PatienceSequenceMatcher_py as PatienceSequenceMatcher
+            )


 def main(args):
Sandy$ patiencediff.py --patience orig/_patiencediff_py.py _patiencediff_py.py
--- orig/_patiencediff_py.py
+++ _patiencediff_py.py
@@ -15,11 +15,16 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

-
+from __future__ import print_function
 from bisect import bisect
 import difflib

-from bzrlib.trace import mutter
+try:
+    from bzrlib.trace import mutter
+except:
+    import sys
+    def mutter(msg):
+        print (msg, file=sys.stderr)


 __all__ = ['PatienceSequenceMatcher', 'unified_diff', 'unified_diff_files']
Sandy$