Limiting file size in git repository
If you are using gitolite, you can also try a VREF. One VREF is already provided by default (the code is in gitolite/src/VREF/MAX_NEWBIN_SIZE). It is called MAX_NEWBIN_SIZE and works like this:
repo name
    RW+ = username
    - VREF/MAX_NEWBIN_SIZE/1000 = usernames
Here 1000 is an example threshold in bytes.
This VREF works like an update hook: it rejects your push if any file you are pushing is larger than the threshold.
I struggled with this for a while, even with the description, and since it seems relevant to others too, here is an implementation of what J-16 SDiZ described.
So, my take on a server-side update hook that prevents overly large files from being pushed:
#!/bin/bash
# Script to limit the size of a push to git repository.
# Git repo has issues with big pushes, and we shouldn't have a real need for those
#
# eis/02.02.2012
# --- Safety check, should not be run from command line
if [ -z "$GIT_DIR" ]; then
    echo "Don't run this script from the command line." >&2
    echo " (if you want, you could supply GIT_DIR then run" >&2
    echo " $0 <ref> <oldrev> <newrev>)" >&2
    exit 1
fi
# Test that tab replacement works, issue in some Solaris envs at least
testvariable=`echo -e "\t" | sed 's/\s//'`
if [ "$testvariable" != "" ]; then
    echo "Environment check failed - please contact git hosting." >&2
    exit 1
fi
# File size limit is meant to be configured through 'hooks.filesizelimit' setting
filesizelimit=$(git config hooks.filesizelimit)
# If we haven't configured a file size limit, use default value of about 100M
if [ -z "$filesizelimit" ]; then
    filesizelimit=100000000
fi
# The update hook is invoked as: <refname> <oldrev> <newrev>,
# so the incoming commit can be found at $3
newrev=$3
# Nothing to check when a ref is being deleted (newrev is all zeros)
case $newrev in *[!0]*) ;; *) exit 0 ;; esac
# With this command, we can find information about the file coming in that has biggest size
# We also normalize the line for excess whitespace
biggest_checkin_normalized=$(git ls-tree --full-tree -r -l "$newrev" | sort -k 4 -n -r | head -1 | sed 's/^ *//;s/ *$//;s/\s\{1,\}/ /g' )
# Based on that, we can find what we are interested about
filesize=$(echo "$biggest_checkin_normalized" | cut -d ' ' -f4)
# Actual comparison
# To cancel a push, we exit with status code 1
# It is also a good idea to print out some info about the cause of rejection
# (an empty tree yields no size at all, hence the -n guard)
if [ -n "$filesize" ] && [ "$filesize" -gt "$filesizelimit" ]; then
    # To be more user-friendly, we also look up the name of the offending file
    # (-f5- keeps the whole name even if it contains spaces)
    filename=$(echo "$biggest_checkin_normalized" | cut -d ' ' -f5-)
    echo "Error: Too large push attempted." >&2
    echo >&2
    echo "File size limit is $filesizelimit, and you tried to push file named $filename of size $filesize." >&2
    echo "Contact configuration team if you really need to do this." >&2
    exit 1
fi
exit 0
Note that, as commenters have pointed out, this code only checks the latest commit. It would need to be tweaked to iterate over the commits between $2 and $3 and run the check on each of them.
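One way to make that tweak, as a rough sketch only: walk every commit in the pushed range with git rev-list and run the same git ls-tree size check on each one. The function name find_big_files and its output format are made up for illustration. Like the hook above, it scans each commit's full tree, so it is slow on large repositories and will also flag oversized files that were already in the history.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original hook): print every blob above
# <limit> bytes found in any commit of the pushed range.
# Usage: find_big_files <oldrev> <newrev> <limit-in-bytes>
# Returns 1 if at least one oversized file was found, 0 otherwise.
find_big_files() {
    local oldrev=$1 newrev=$2 limit=$3
    local range commit found=0

    # Ref deletion (newrev is all zeros): nothing to check.
    case $newrev in *[!0]*) ;; *) return 0 ;; esac

    # New ref (oldrev is all zeros): walk only the commits that no existing
    # ref can reach yet. Otherwise walk oldrev..newrev.
    case $oldrev in
        *[!0]*) range="$oldrev..$newrev" ;;
        *)      range="$newrev --not --all" ;;
    esac

    # $range is left unquoted on purpose so it splits into separate arguments.
    for commit in $(git rev-list $range); do
        # ls-tree -l prints: <mode> <type> <sha> <size>TAB<path>
        if git ls-tree --full-tree -r -l "$commit" |
            awk -v limit="$limit" -v commit="$commit" '
                $2 == "blob" && $4 + 0 > limit + 0 {
                    path = substr($0, index($0, "\t") + 1)
                    print substr(commit, 1, 7), $4, path
                    hit = 1
                }
                END { exit (hit ? 0 : 1) }'
        then
            found=1
        fi
    done
    return "$found"
}
```

In the update hook above, the single ls-tree check could then be replaced by something like: find_big_files "$2" "$3" "$filesizelimit" >&2 || exit 1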
This one is pretty good:
#!/bin/bash -u
#
# git-max-filesize
#
# git pre-receive hook to reject large files that should be committed
# via git-lfs (large file support) instead.
#
# Author: Christoph Hack <[email protected]>
# Copyright (c) 2017 mgIT GmbH. All rights reserved.
# Distributed under the Apache License. See LICENSE for details.
#
set -o pipefail
readonly DEFAULT_MAXSIZE="5242880" # 5MB
readonly CONFIG_NAME="hooks.maxfilesize"
readonly NULLSHA="0000000000000000000000000000000000000000"
readonly EXIT_SUCCESS="0"
readonly EXIT_FAILURE="1"
# main entry point
function main() {
  local status="$EXIT_SUCCESS"
  # get maximum filesize (from repository-specific config)
  local maxsize
  maxsize="$(get_maxsize)"
  if [[ "$?" != 0 ]]; then
    echo "failed to get ${CONFIG_NAME} from config"
    exit "$EXIT_FAILURE"
  fi
  # skip this hook entirely if maxsize is 0.
  if [[ "$maxsize" == 0 ]]; then
    cat > /dev/null
    exit "$EXIT_SUCCESS"
  fi
  # read lines from stdin (format: "<oldref> <newref> <refname>\n")
  local oldref
  local newref
  local refname
  while read oldref newref refname; do
    # skip branch deletions
    if [[ "$newref" == "$NULLSHA" ]]; then
      continue
    fi
    # find large objects
    # check all objects from $oldref (possible $NULLSHA) to $newref, but
    # skip all objects that have already been accepted (i.e. are referenced by
    # another branch or tag).
    local target
    if [[ "$oldref" == "$NULLSHA" ]]; then
      target="$newref"
    else
      target="${oldref}..${newref}"
    fi
    local large_files
    large_files="$(git rev-list --objects "$target" --not --branches=\* --tags=\* | \
      git cat-file $'--batch-check=%(objectname)\t%(objecttype)\t%(objectsize)\t%(rest)' | \
      awk -F '\t' -v maxbytes="$maxsize" '$3 > maxbytes' | cut -f 4-)"
    if [[ "$?" != 0 ]]; then
      echo "failed to check for large files in ref ${refname}"
      continue
    fi
    IFS=$'\n'
    for file in $large_files; do
      if [[ "$status" == 0 ]]; then
        echo ""
        echo "-------------------------------------------------------------------------"
        echo "Your push was rejected because it contains files larger than $(numfmt --to=iec "$maxsize")."
        echo "Please use https://git-lfs.github.com/ to store larger files."
        echo "-------------------------------------------------------------------------"
        echo ""
        echo "Offending files:"
        status="$EXIT_FAILURE"
      fi
      echo " - ${file} (ref: ${refname})"
    done
    unset IFS
  done
  exit "$status"
}
# get the maximum filesize configured for this repository or the default
# value if no specific option has been set. Suffixes like 5k, 5m, 5g, etc.
# can be used (see git config --int).
function get_maxsize() {
  local value;
  value="$(git config --int "$CONFIG_NAME")"
  if [[ "$?" != 0 ]] || [[ -z "$value" ]]; then
    echo "$DEFAULT_MAXSIZE"
    return "$EXIT_SUCCESS"
  fi
  echo "$value"
  return "$EXIT_SUCCESS"
}
main
You can configure the size in the server-side config file by adding:
[hooks]
    maxfilesize = 1048576 # 1 MiB
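The same setting can also be written with git config instead of editing the file by hand. A minimal sketch, using a throwaway bare repository as a stand-in for the real server-side one (the repo variable and its path are assumptions for the example); since the hook reads the value with git config --int, a suffix such as 1m is expanded to bytes:

```shell
# Stand-in for the server-side bare repository (hypothetical location).
repo=$(mktemp -d)
git init -q --bare "$repo"

# Store the limit with a size suffix...
git -C "$repo" config hooks.maxfilesize 1m

# ...and read it back the way the hook does.
maxsize=$(git -C "$repo" config --int hooks.maxfilesize)
echo "$maxsize"   # prints 1048576
```

On a real server you would run the git config command inside the existing bare repository rather than creating a new one.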
The answers by eis and J-16 SDiZ suffer from a severe problem. They only check the state of the final commit, $3 or $newrev. In the update hook they also need to check what is being submitted in the other commits between $2 (or $oldrev) and $3 (or $newrev).
J-16 SDiZ is closer to the right answer.
The big flaw is that someone whose departmental server relies on this update hook for protection will find out the hard way that:
If a big file is accidentally committed and then removed with git rm, the current tree (the last commit) looks fine, so the push is accepted. The push still pulls in the entire chain of commits, including the big file that was deleted, creating a swollen, unhappy, fat history that nobody wants.
The solution is either to check each and every commit from $oldrev to $newrev, or to specify the entire range $oldrev..$newrev. Be darn sure you are not just checking $newrev alone, or the hook will let massive junk into your git history, where it gets pushed out to share with others and is then difficult or impossible to remove.
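The failure mode is easy to reproduce in a throwaway repository. The sketch below (file names, sizes, and variable names are all made up for the demonstration) commits a big file, removes it again with git rm, and then compares what a tip-only check sees with what the range $oldrev..$newrev actually contains:

```shell
#!/bin/bash
# Demonstration in a scratch repository; nothing here touches a real repo.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > readme.txt
git add readme.txt
git commit -q -m "initial commit"
oldrev=$(git rev-parse HEAD)

# Accidentally commit a large file, then remove it again.
head -c 2000000 /dev/zero > big.bin
git add big.bin
git commit -q -m "add big file by accident"
git rm -q big.bin
git commit -q -m "remove big file"
newrev=$(git rev-parse HEAD)

# Checking only the final tree finds nothing suspicious...
tip_hits=$(git ls-tree -r --name-only "$newrev" | grep -c 'big.bin')

# ...but the objects introduced by oldrev..newrev still include the blob.
range_hits=$(git rev-list --objects "$oldrev..$newrev" | grep -c 'big.bin')

echo "tip: $tip_hits, range: $range_hits"   # prints "tip: 0, range: 1"
```

A hook that only looks at $newrev sees the first number; the push itself transfers everything counted by the second.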