How to see the file size history of a single file in a git repository?

You can use git ls-tree -r -l <revision> <path> to get the blob size at a given revision, e.g.

$ git ls-tree -r -l v1.6.0 gitweb/README
100644 blob 825162a0b6dce8c354de67a30abfbad94d29fdde   16067    gitweb/README

From the docs:

 -r                    recurse into subtrees
 -l, --long            include object size

The blob size in this example is 16067 bytes. The disadvantage of this solution is that git ls-tree can process only one revision at a time.
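
One way around the one-revision-at-a-time limitation is to loop over the revisions yourself. The following is only a sketch: the function name file_size_per_rev is made up for illustration, and it assumes the file was never renamed and that you run it from the directory the path is relative to. It starts one git ls-tree process per revision, so it will be slow on long histories.

```shell
# Hypothetical helper: print "<revision> <size-in-bytes>" for each revision touching a path.
# Usage: file_size_per_rev <path> [<starting-point>]
file_size_per_rev() {
    path=$1
    start=${2:-HEAD}
    git rev-list "$start" -- "$path" | while read -r rev; do
        # The fourth field of `git ls-tree -l` output is the blob size.
        # It is empty if the file does not exist in that revision (e.g. it was deleted there).
        size=$(git ls-tree -r -l "$rev" "$path" | awk '{print $4}')
        printf '%s %s\n' "$rev" "$size"
    done
}
```

Called as file_size_per_rev gitweb/README v1.6.0, it should list the newest revision first, the same order git rev-list uses.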

Alternatively, you can use git cat-file --batch-check < <list-of-objects>, feeding it blob identifiers. If the location of the file didn't change through history (the file was not moved), you can use git rev-list <starting-point> -- <path> to get the list of revisions touching the given path, translate them into blob names using the <revision>:<path> extended SHA-1 syntax (see the git-rev-parse manpage), and feed the result to git cat-file. Example:

$ git rev-list -5 v1.6.0 -- gitweb/README | 
  sed -e 's/$/:gitweb\/README/g' |
  git cat-file --batch-check
825162a0b6dce8c354de67a30abfbad94d29fdde blob 16067
6908036402ffe56c8b0cdcebdfb3dfacf84fb6f1 blob 16011
356ab7b327eb0df99c0773d68375e155dbcea0be blob 14248
8f7ea367bae72ea3ce25b10b968554f9b842fffe blob 13853
8dfe335f73c223fa0da8cd21db6227283adb95ba blob 13801
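
The output above shows blob IDs, not the commits they came from. If you want each size next to its commit, a custom --batch-check format with the %(rest) placeholder can carry the commit ID through. This is a sketch under a few assumptions: the function name blob_sizes_batch is hypothetical, your git is new enough to support --batch-check=<format> (1.8.5 or later, as far as I know), and the path contains no whitespace or backslashes.

```shell
# Hypothetical helper: print "<commit> <size-in-bytes>" using a single cat-file process.
# Usage: blob_sizes_batch <path> [<starting-point>]
blob_sizes_batch() {
    path=$1
    start=${2:-HEAD}
    git rev-list "$start" -- "$path" |
        # Emit "<commit>:<path> <commit>"; everything after the first space
        # is made available to the format as %(rest).
        awk -v p="$path" '{print $1 ":" p " " $1}' |
        git cat-file --batch-check='%(rest) %(objectsize)'
}
```

For a revision in which the file does not exist, git cat-file prints a line ending in "missing" instead of a size.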

Create a file called .gitattributes and add the following line:

main.js -diff

This makes git treat main.js as binary, which turns off line-based diffs for it. Now run the following command:

git log --stat main.js

The log will include lines like

main.js | Bin 4316 -> 4360 bytes

After you're done, you should probably delete .gitattributes. I don't know what other changes in git's behavior may be caused by the -diff attribute.

Tested with git versions 1.7.12.4 and 1.7.9.5.

Source: ewall's answer and https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html#_marking_files_as_binary


Here is a Bash function that will report the size over time in the following format.

 LoC  Date                       Commit ID   Subject
 942  2019-08-31 18:09:34 +0200  35fc67c122  Declare some XML namespaces in replacement of OGCPrefixMapper, which has been removed from Apache SIS. https://issues.apache.org/jira/browse/SIS-126
 943  2019-08-09 16:52:29 +0200  e8438ab869  fix(GML): fix relative path resolving inside a jar
 934  2019-08-05 15:37:46 +0200  1e0c0b03c4  fix(GML): fix all test cases
 932  2019-07-30 15:54:53 +0200  fddea5db24  feat(GML): work on fallback for non-xsd Feature store
 932  2019-07-23 16:40:23 +0200  8d9a6a7dd0  feat(GML): improve support for custom XML mappings
 932  2019-06-26 15:18:43 +0200  43ea6e0bd7  feat(GML): add concurrency support for read/write operations
 932  2019-06-21 09:27:41 +0200  07a9993b4b  feat(GML): support group reference min/max occurs attributes
 932  2019-06-21 09:27:41 +0200  352a9104ae  feat(GML): fix resolving local files xsd paths
 919  2018-06-08 15:41:26 +0200  01ac7538e7  Merge branch 'master' into sis-migration
 919  2018-05-16 16:40:04 +0200  16fe7590c5  fix(JAXP): various fix for  WFS 2.0.0
 912  2018-04-11 10:09:22 +0200  bf3a38bdc4  chore(*): update JTS version 1.15.0
 912  2017-11-09 20:15:23 +0100  bc14dc4be1  fix(Client): fix minor problems on WFS querying
 901  2017-10-20 11:41:43 +0200  f686d7ff15  feat(Storage): add support for GML 2.1.2
 882  2017-05-16 23:07:31 +0200  f20c34c1e2  refactor(Feature): renamed the Geotk flavor of org.apache.sis.feature package as org.geotoolkit.feature.

Here is the function:

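git-log-size() {
    # List every commit that touched the file, newest first.
    git rev-list HEAD -- "$1" | while read -r cid; do
        # Count the lines of the file as it existed in that commit.
        git cat-file blob "$cid:$1" | wc -l | tr -d '\n'
        printf '\t'
        # Append the commit date, abbreviated hash, and subject, tab-separated.
        git log -1 --pretty='%ci%x09%h%x09%s' "$cid"
    done | column -t -s$'\t'
}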

It is not particularly efficient, but does the job. It uses some utilities which are pretty common (wc, tr, column).

The size is reported as lines of code (LoC) since this is the common metric in software development. Change the -l option of wc if you prefer something else (for example, wc -c counts bytes).

Here is how to call it:

git-log-size <path>
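
If you want bytes rather than lines, a variant of the function above can ask git for the object size directly instead of piping the content through wc. This is just a sketch: the name git_log_bytes is made up, and it prints a "-" for commits in which the file does not exist (for example the commit that deleted it).

```shell
# Hypothetical variant: report the size in bytes for every commit touching a file.
# Usage: git_log_bytes <path>
git_log_bytes() {
    git rev-list HEAD -- "$1" | while read -r cid; do
        # `git cat-file -s` prints the object size in bytes without dumping its content.
        size=$(git cat-file -s "$cid:$1" 2>/dev/null) || size='-'
        printf '%s\t' "$size"
        # Same tab-separated date, abbreviated hash, and subject as in git-log-size.
        git log -1 --pretty='%ci%x09%h%x09%s' "$cid"
    done
}
```

The output is tab-separated, so you can still pipe it through column -t if you want aligned columns.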

You could create a script that uses the output from git show --pretty=raw <commit> to obtain the tree, then uses git ls-tree -r -l to obtain the blob you are looking for, including the file size.
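
A rough sketch of that idea in shell, for a single commit. The function name size_via_tree is hypothetical, and it assumes the argument names a commit and that the path is relative to the repository root, where you run it. As far as I know git ls-tree also accepts a commit directly, so the tree lookup is shown mainly to follow the steps described above.

```shell
# Hypothetical helper: print the size in bytes of <path> as of <commit>.
# Usage: size_via_tree <commit> <path>
size_via_tree() {
    commit=$1
    path=$2
    # The raw commit header contains a "tree <id>" line; take the first one.
    tree=$(git show -s --pretty=raw "$commit" | awk '$1 == "tree" {print $2; exit}')
    # The fourth field of `git ls-tree -l` output is the blob size.
    git ls-tree -r -l "$tree" "$path" | awk '{print $4}'
}
```

Combined with git rev-list <starting-point> -- <path>, this could be called once per commit to build the full history.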

In case you have ruby and the grit gem installed, here's a little script I threw together:

require 'grit'

if ARGV.size < 1
  puts 'usage: file-size FILE'
  puts 'run from within the git repo root'
  exit
end

filename = ARGV[0].to_s

repo = Grit::Repo.new('.')
commits = repo.log('master', filename)
commits.each do |commit|
  blob = commit.tree/filename
  puts "#{commit} #{blob.size} bytes"
end

Example usage (with the script saved as file-size.rb), showing the history for somedir/somefile:

myproject$ ruby file-size.rb somedir/somefile

Tags:

Git