Understanding git gc --auto
One of the main points of gc --auto is that it should be very quick, so other commands can frequently call it “just in case”. To achieve that, the object count is only guessed. As git help config says under gc.auto:
When there are approximately more than this many loose objects in the repository […]
Looking at the code (too_many_loose_objects() in builtin/gc.c), here’s what happens:
- The value of gc.auto is divided by 256 and rounded up
- The folder that contains all the objects that start with 17 is opened
- It is checked whether the folder contains more objects than the result of the first step
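The steps above can be sketched in Python. This is a rough approximation and not a port of the C code: the function name simply mirrors the one in builtin/gc.c, and the 38-character file-name filter (a SHA-1 hash minus its two-digit folder prefix) and the handling of a non-positive gc.auto are assumptions about how Git behaves.

```python
import os
import re

# Assumption: a loose object's file name is its SHA-1 hash minus the first
# two hex digits, i.e. 38 lowercase hex characters.
_LOOSE_NAME = re.compile(r"^[0-9a-f]{38}$")


def too_many_loose_objects(git_dir, gc_auto=6700):
    """Estimate whether a repository has more than roughly gc_auto loose objects.

    Only the objects/17 folder is inspected, so the result is a guess.
    """
    if gc_auto <= 0:
        # Assumption: a non-positive gc.auto disables the check.
        return False
    # Step 1: divide gc.auto by 256 and round up (integer arithmetic).
    threshold = (gc_auto + 255) // 256
    # Step 2: open the folder holding all objects whose hash starts with 17.
    bucket = os.path.join(git_dir, "objects", "17")
    try:
        names = os.listdir(bucket)
    except FileNotFoundError:
        return False
    # Step 3: compare the number of objects in that folder with the threshold.
    count = sum(1 for name in names if _LOOSE_NAME.match(name))
    return count > threshold
```

Calling `too_many_loose_objects(".git", gc_auto=250)` from the top of a working tree would apply the same estimate with the low setting discussed here.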
This works fine, since SHA-1 is evenly distributed, so “all the objects that start with X” is representative of the whole set. But of course this only works for a really large number of objects. Too lazy to do the maths, I would guess at least >3000. With 6700 (the default value of gc.auto), this should already work quite reliably.
The core question for me is why you need such a low setting and whether it is important that this really runs at 250 objects. With a setting of 250, gc will run as soon as you have 2 loose objects that start with 17 (250 / 256 rounds up to 1, and the count has to exceed that). Assuming evenly distributed hashes, the chance that this happens is roughly 68% for 600 objects and roughly 82% for 800 objects.
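For the gc.auto = 250 case, the probability can be worked out directly. The sketch below assumes each loose object lands in the 17 folder independently with probability 1/256 (a binomial model); the function name is made up for illustration.

```python
def p_at_least_two_in_bucket(loose_objects, buckets=256):
    """Probability that at least 2 of `loose_objects` hashes start with 17.

    Assumes hashes are uniformly and independently spread over `buckets`
    two-digit prefixes, which is an idealisation.
    """
    p = 1 / buckets
    # Probability that no object falls into the sampled folder.
    none = (1 - p) ** loose_objects
    # Probability that exactly one object falls into the sampled folder.
    one = loose_objects * p * (1 - p) ** (loose_objects - 1)
    return 1 - none - one


for n in (250, 600, 800, 1000):
    print(n, round(p_at_least_two_in_bucket(n), 3))
```

The loop prints the estimated chance of gc starting at a few repository sizes, which makes it easy to see how far past 250 objects a repository usually gets before gc actually runs.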
Update: Couldn’t help it – had to do the math :). I was wondering how well that estimation system would work. Here’s a plot of the results. For any given gc.auto, how high is the probability that gc will start when there are gc.auto (red) / gc.auto * 1.1 (green) / gc.auto * 1.2 (orange) / gc.auto * 1.5 (blue) / gc.auto * 2 (purple) loose objects in the repo?
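The same question can be explored numerically. The following is a sketch under a binomial assumption (each object falls into the sampled folder with probability 1/256). It is not the calculation behind the original plot, and the function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def gc_start_probability(loose_objects, gc_auto, buckets=256):
    """Probability that the sampled folder holds more than ceil(gc_auto / buckets) objects."""
    threshold = (gc_auto + buckets - 1) // buckets
    p = 1 / buckets
    n = loose_objects
    # Sum the binomial probabilities of seeing 0..threshold objects in the
    # folder, working in log space so large binomial coefficients do not overflow.
    at_most = 0.0
    for k in range(min(threshold, n) + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log1p(-p))
        at_most += exp(log_pmf)
    return max(0.0, 1.0 - at_most)


def probabilities_by_factor(gc_auto, factors=(1, 1.1, 1.2, 1.5, 2)):
    """Map each multiple of gc_auto to the probability that gc starts."""
    return {f: gc_start_probability(round(gc_auto * f), gc_auto) for f in factors}


for factor, prob in probabilities_by_factor(6700).items():
    print(f"gc.auto * {factor}: {prob:.3f}")
```

Running it for several gc.auto values gives one number per factor, corresponding to the five curves described above.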
Note that gc --auto is more robust as of Git 2.12.2 (released March 2017).
See commit a831c06 (10 Feb 2017) by David Turner (csusbdt).
Helped-by: Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit d30ec1b, 21 Mar 2017)
gc: ignore old gc.log files
A server can end up in a state where there are lots of unreferenced loose objects (say, because many users are doing a bunch of rebasing and pushing their rebased branches).
Running "git gc --auto" in this state would cause a gc.log file to be created, preventing future auto gcs, causing pack files to pile up.
Since many git operations are O(n) in the number of pack files, this would lead to poor performance.
Git should never get itself into a state where it refuses to do any maintenance, just because at some point some piece of the maintenance didn't make progress.
Teach Git to ignore gc.log files which are older than (by default) one day old, which can be tweaked via the gc.logExpiry configuration variable.
That way, these pack files will get cleaned up, if necessary, at least once per day. And operators who find a need for more-frequent gcs can adjust gc.logExpiry to meet their needs.
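The rule described in that commit message can be illustrated with a small Python sketch. It only models the age check. The function name, the `expiry_seconds` parameter, and the assumption that an empty gc.log never blocks anything are illustrative and not taken from Git's source.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # stands in for the default gc.logExpiry of one day


def gc_log_blocks_auto_gc(git_dir, expiry_seconds=ONE_DAY, now=None):
    """Return True if a recent, non-empty gc.log would stop an automatic gc."""
    now = time.time() if now is None else now
    log_path = os.path.join(git_dir, "gc.log")
    try:
        info = os.stat(log_path)
    except FileNotFoundError:
        return False  # no log, nothing to block on
    if info.st_size == 0:
        return False  # assumption: an empty log carries no error to report
    # A log older than the expiry is ignored, so gc may run again.
    return now - info.st_mtime <= expiry_seconds
```

With the default expiry, a gc.log written by a failed run stops further automatic runs for at most one day. Passing a smaller `expiry_seconds` models an operator lowering gc.logExpiry.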
Note: since Git 2.17 (Q2 2018), git gc --auto will run on each git commit too.
See "List of all commands that cause git gc --auto".
And there is a pre-auto-gc hook associated with that command too.