Quantifying the amount of change in a git diff?

git diff --word-diff works in the latest stable version of git (available at git-scm.com).

There are a few options that let you choose the output format. The default is quite readable, but you might want --word-diff=porcelain if you're feeding the output into a script.
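If you do go the script route, here is a minimal Python sketch of counting words from the porcelain output. In that format each run of words sits on its own line prefixed with a space (unchanged), + (added) or - (removed), and a line containing only ~ marks a newline in the original text. The function name count_words and the SAMPLE text are made up for illustration, and the header filter is just the usual "skip +++ and ---" heuristic, so treat the result as approximate.

```python
def count_words(porcelain_text):
    """Count (added, removed) words in `git diff --word-diff=porcelain` output."""
    added = removed = 0
    for line in porcelain_text.splitlines():
        # Heuristic: '+++' and '---' are file headers, not changed text.
        if len(line) > 1 and line[0] == "+" and line[1] != "+":
            added += len(line[1:].split())
        elif len(line) > 1 and line[0] == "-" and line[1] != "-":
            removed += len(line[1:].split())
    return added, removed


# Hypothetical porcelain output, hand-written for this example.
SAMPLE = """\
diff --git a/notes.txt b/notes.txt
index 1111111..2222222 100644
--- a/notes.txt
+++ b/notes.txt
@@ -1 +1 @@
 The quick
-brown
+red and white
 fox
~
"""

print(count_words(SAMPLE))  # (3, 1)
```

To run it on a real repository you could feed it the output of subprocess.run(["git", "diff", "--word-diff=porcelain"], capture_output=True, text=True).stdout.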


wdiff does word-by-word comparison. Git can be configured to use an external program to do the diffing. Based on those two facts and this blog post, the following should do roughly what you want.

Create a script that ignores most of the arguments git-diff supplies and passes only the old and new file paths to wdiff. Save the following as ~/wdiff.py or something similar and make it executable.

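#!/usr/bin/env python3

import subprocess
import sys

# git calls this as: path old-file old-hex old-mode new-file new-hex new-mode,
# so argv[2] is the old file and argv[5] is the new one.
# -s prints statistics, -3 hides words common to both files.
# wdiff's exit status is ignored, so the script itself exits 0.
subprocess.call(["wdiff", "-s3", sys.argv[2], sys.argv[5]])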

Tell git to use it.

git config --global diff.external ~/wdiff.py
git diff filename
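In case the argument indices look arbitrary: git invokes an external diff program with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), which is where positions 2 and 5 come from. A small sketch that makes the layout explicit is below. The helper name old_and_new and the example argument values are invented for illustration.

```python
def old_and_new(argv):
    """Return (old_file, new_file) from the argv git passes to an external diff."""
    # argv[0] is the script itself, followed by:
    # path old-file old-hex old-mode new-file new-hex new-mode
    (_script, _path, old_file, _old_hex, _old_mode,
     new_file, _new_hex, _new_mode) = argv[:8]
    return old_file, new_file


# Hypothetical invocation, shaped like what git would pass.
example_argv = [
    "wdiff.py", "notes.txt",
    "/tmp/old_notes.txt", "1111111", "100644",
    "notes.txt", "2222222", "100644",
]
print(old_and_new(example_argv))
```

Unpacking by name raises a ValueError if git passes fewer arguments than expected, which is arguably easier to debug than an IndexError on sys.argv[5].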

Building on James' and cornmacrelf's input, I added arithmetic expansion and came up with a few reusable aliases for counting the words added, deleted, and duplicated in a git diff:

alias gitwa='git diff --word-diff=porcelain origin/master | grep -e "^+[^+]" | wc -w | xargs'
alias gitwd='git diff --word-diff=porcelain origin/master | grep -e "^-[^-]" | wc -w | xargs'
alias gitwdd='git diff --word-diff=porcelain origin/master | grep -e "^+[^+]" -e "^-[^-]" | sed -e "s/.//" | sort | uniq -d | wc -w | xargs'

alias gitw='echo $(($(gitwa) - $(gitwd)))'

The output from gitwa and gitwd is trimmed of surrounding whitespace using the xargs trick.

The duplicated-words count (gitwdd) was added from Miles' answer.
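For anyone who would rather not chain sed, sort and uniq, here is a rough Python equivalent of the duplicated-words pipeline: it drops the leading +/- marker, finds word runs that occur more than once (for example, text removed in one place and added back in another), and counts the words in one copy of each. The function name and sample input are invented for illustration, and it assumes the same porcelain format as the aliases.

```python
from collections import Counter


def duplicated_words(porcelain_text):
    """Count words in changed runs that appear more than once (like `uniq -d`)."""
    runs = [
        line[1:]  # drop the '+' or '-' marker, like sed 's/.//'
        for line in porcelain_text.splitlines()
        # Keep added/removed runs, skip the '+++' and '---' file headers.
        if len(line) > 1 and line[0] in "+-" and line[1] != line[0]
    ]
    counts = Counter(runs)
    # One copy of each repeated run is counted, as `uniq -d` would print it.
    return sum(len(run.split()) for run, n in counts.items() if n > 1)


# Hypothetical porcelain lines: one run was moved, one is new.
sample = "--- a/x\n+++ b/x\n-moved sentence here\n+moved sentence here\n+brand new\n"
print(duplicated_words(sample))  # 3
```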


I figured out a way to get concrete numbers by building on top of the other answers here. The result is an approximation, but it should be close enough to serve as a useful indicator of the number of characters that were added or removed. Here's an example with my current branch compared to origin/master:

$ git diff --word-diff=porcelain origin/master | grep -e '^+[^+]' | wc -m
38741
$ git diff --word-diff=porcelain origin/master | grep -e '^-[^-]' | wc -m
46664

The difference between the removed characters (46664) and the added characters (38741) shows that my current branch has removed approximately 7923 characters net. The individual added and removed counts are inflated by the diff's +/- markers and by indentation characters. In most cases, though, taking the difference should cancel out a significant portion of that inflation.
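If the inflation bothers you, a small Python sketch can drop the per-line overhead before counting. With wc -m every matched line contributes its +/- marker and its trailing newline, so this version leaves both out. Indentation is still counted. The function name char_delta and the sample input are made up for illustration.

```python
def char_delta(porcelain_text):
    """Return (added, removed, net) character counts, excluding markers and newlines."""
    added = removed = 0
    for line in porcelain_text.splitlines():
        # Same filter as grep '^+[^+]' and '^-[^-]': skip '+++' / '---' headers.
        if len(line) > 1 and line[0] == "+" and line[1] != "+":
            added += len(line) - 1  # drop the '+' marker
        elif len(line) > 1 and line[0] == "-" and line[1] != "-":
            removed += len(line) - 1  # drop the '-' marker
    return added, removed, added - removed


# Hypothetical porcelain lines for illustration.
sample = "--- a/x\n+++ b/x\n-hello world\n+hi\n"
print(char_delta(sample))  # (2, 11, -9)

# The net figure from the numbers quoted above:
print(38741 - 46664)  # -7923
```

A negative net value means the branch removed more characters than it added.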
