Quantifying the amount of change in a git diff?
git diff --word-diff works in recent stable versions of git (available from git-scm.com).
There are a few modes that control the output format: color, plain, porcelain and none. The default, plain, is quite readable, but you might want --word-diff=porcelain if you're feeding the output into a script.
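To see what a script has to deal with in porcelain mode, here is a minimal Python sketch that classifies each line of `git diff --word-diff=porcelain` output. It assumes the format as the git-diff documentation describes it: a leading `+` marks an added run, `-` a removed run, a space an unchanged run, and a line containing only `~` stands for a newline in the input. The per-file header lines (`diff --git`, `index`, `---`, `+++`) are skipped up to the first `@@` hunk header. The name `parse_porcelain` is hypothetical.

```python
def parse_porcelain(text):
    """Yield (kind, segment) pairs from `git diff --word-diff=porcelain` output.

    kind is one of "added", "removed", "common" or "newline".
    """
    kinds = {"+": "added", "-": "removed", " ": "common"}
    in_header = False
    for line in text.splitlines():
        if line.startswith("diff --git"):
            in_header = True   # file header lasts until the first hunk
        elif line.startswith("@@"):
            in_header = False  # hunk header: segments follow
        elif in_header:
            continue           # index, ---, +++ and mode lines
        elif line == "~":
            yield ("newline", "")
        elif line[:1] in kinds:
            yield (kinds[line[0]], line[1:])
```

Tracking the header state avoids mistaking a removed segment that happens to begin with `--` for a `---` file header.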
wdiff does word-by-word comparison. Git can be configured to use an external program to do the diffing. Based on those two facts and this blog post, the following should do roughly what you want.
Create a script that discards the extra arguments git diff passes to an external diff program and hands only the old and new file paths to wdiff. Save the following as ~/wdiff.py (or something similar) and make it executable.
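#!/usr/bin/env python3
import subprocess
import sys
# git passes: path old-file old-hex old-mode new-file new-hex new-mode.
# -s prints statistics, -3 hides words common to both files. wdiff's exit
# status (1 when the files differ) is ignored so git doesn't abort the diff.
subprocess.call(["wdiff", "-s3", sys.argv[2], sys.argv[5]])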
Tell git to use it:
git config --global diff.external ~/wdiff.py
git diff filename
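If wdiff isn't installed, Python's standard difflib can give a rough stand-in for the statistics that `wdiff -s` prints. This is a swapped-in technique, not what wdiff does internally, and `word_stats` is a hypothetical helper: it splits both versions on whitespace and tallies words deleted, inserted and kept according to the `SequenceMatcher` opcodes.

```python
import difflib


def word_stats(old_text, new_text):
    """Count words deleted, inserted and kept between two texts."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    stats = {"deleted": 0, "inserted": 0, "common": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            stats["common"] += i2 - i1
        else:
            # "replace" counts as a deletion plus an insertion;
            # for "delete" j2 - j1 is 0, for "insert" i2 - i1 is 0
            stats["deleted"] += i2 - i1
            stats["inserted"] += j2 - j1
    return stats
```

Called as `word_stats(open(old).read(), open(new).read())` from a script like ~/wdiff.py, it would report the same kind of totals without the external dependency, though its alignment can differ from wdiff's on heavily rearranged text.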
Building on James' and cornmacrelf's input, I've added arithmetic expansion and come up with a few reusable alias commands for counting words added, deleted, and duplicated in a git diff:
alias gitwa='git diff --word-diff=porcelain origin/master | grep -e "^+[^+]" | wc -w | xargs'
alias gitwd='git diff --word-diff=porcelain origin/master | grep -e "^-[^-]" | wc -w | xargs'
alias gitwdd='git diff --word-diff=porcelain origin/master | grep -e "^+[^+]" -e "^-[^-]" | sed -e "s/.//" | sort | uniq -d | wc -w | xargs'
alias gitw='echo $(($(gitwa) - $(gitwd)))'
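The output of gitwa and gitwd is trimmed using the xargs trick: with no command, xargs runs echo, which drops the leading spaces some versions of wc pad their counts with.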
The duplicated-words count (gitwdd) comes from Miles' answer.
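For readers who would rather script this than rely on shell aliases, the four numbers can be sketched in Python from saved porcelain text (for example `git diff --word-diff=porcelain origin/master > diff.txt`). The helper name `alias_counts` is hypothetical. It mirrors the grep patterns `^+[^+]` and `^-[^-]`, counts words the way `wc -w` does by splitting each matching line on whitespace (so the marker stays attached to the first word, as in the pipeline), and treats a segment as duplicated when it occurs more than once after the marker is stripped, like `sed 's/.//' | sort | uniq -d`.

```python
import re
from collections import Counter

ADDED = re.compile(r"\+[^+]")   # same as grep -e "^+[^+]"
REMOVED = re.compile(r"-[^-]")  # same as grep -e "^-[^-]"


def alias_counts(porcelain):
    """Return (gitwa, gitwd, gitwdd, gitw) computed from porcelain text."""
    lines = porcelain.splitlines()
    added = [line for line in lines if ADDED.match(line)]
    removed = [line for line in lines if REMOVED.match(line)]
    # wc -w: whitespace-separated tokens, marker included
    wa = sum(len(line.split()) for line in added)
    wd = sum(len(line.split()) for line in removed)
    # uniq -d prints one copy of each repeated segment, so count it once
    segments = Counter(line[1:] for line in added + removed)
    wdd = sum(len(seg.split()) for seg, n in segments.items() if n > 1)
    return wa, wd, wdd, wa - wd
```

Like the aliases, the `+++`/`---` file headers are excluded only because their second character repeats the marker.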
I figured out a way to get concrete numbers by building on top of the other answers here. The result is an approximation, but it should be close enough to serve as a useful indicator of the number of characters that were added or removed. Here's an example with my current branch compared to origin/master:
$ git diff --word-diff=porcelain origin/master | grep -e '^+[^+]' | wc -m
38741
$ git diff --word-diff=porcelain origin/master | grep -e '^-[^-]' | wc -m
46664
The difference between the removed characters (46664) and the added characters (38741) shows that my current branch has removed approximately 7923 characters. The individual added/removed counts are inflated by the diff's + and - markers, by indentation characters, and by the newline that wc -m counts at the end of each line; however, taking the difference should cancel out a significant portion of that inflation in most cases.
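To see how much of that inflation the subtraction actually cancels, here is a small Python sketch. `char_counts` is a hypothetical helper: it imitates `wc -m` on the grep output (every character of each matching line plus its newline) and also reports the counts with the +/- marker and newline removed. It counts Unicode code points, which should match `wc -m` in a UTF-8 locale but may not in others, and it does not attempt to discount indentation.

```python
import re

PATTERNS = (r"\+[^+]", r"-[^-]")  # same patterns as the grep commands above


def char_counts(porcelain):
    """Return ((raw_added, raw_removed), (bare_added, bare_removed)).

    raw mimics `wc -m`; bare drops the leading marker and the newline.
    """
    raw = [0, 0]
    bare = [0, 0]
    for line in porcelain.splitlines():
        for i, pattern in enumerate(PATTERNS):
            if re.match(pattern, line):
                raw[i] += len(line) + 1   # wc -m also counts the newline
                bare[i] += len(line) - 1  # marker removed, newline not counted
    return tuple(raw), tuple(bare)
```

The overhead is two characters per matching line, so it cancels exactly only when both sides have the same number of segments. For `"+abc\n-de\n-fgh\n"` the raw counts give a net removal of 4 characters while the marker-free counts give 2.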