How to determine if Git handles a file as binary or as text?
git grep -I --name-only --untracked -e . -- ascii.dat binary.dat ...
will return the names of files that git interprets as text files.
The trick here is in these two git grep parameters:
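-I : Don't match the pattern in binary files.
-e . : Regular expression that matches any character in the file, so every text file with at least one non-newline character is listed (empty files are not).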
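You can use wildcards, e.g.:
git grep -I --name-only --untracked -e . -- '*.ps1'
Quoting the pattern lets git match it in every directory; unquoted, the shell may first expand it to matching files in the current directory only.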
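If you want a yes/no answer for a single path (e.g. in a script), the same git grep trick can be wrapped in a small function. This is a minimal sketch: is_git_text is a made-up name, and it assumes you run it inside a Git work tree. Like the command above, it reports empty and newline-only files as "not text".

```sh
# Hypothetical helper: exit status 0 if git grep treats "$1" as text
# and finds at least one character in it; non-zero for binary, empty
# or newline-only files. Must be run inside a Git work tree.
is_git_text() {
    # -I skips binary files, -q reports only through the exit status,
    # --untracked also searches files that are not tracked yet.
    git grep -I -q --untracked -e . -- "$1"
}

# Usage: is_git_text ascii.dat && echo text || echo "binary or empty"
```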
# files considered binary (or text containing a bare CR)
git ls-files --eol | grep -E '^(i/-text)'
# files without any line-ending characters (including empty files); probably not true binary files
git ls-files --eol | grep -E '^(i/none)'
# via experimentation
# ------------------------
# "-text" binary (or with bare CR) file : not auto-normalized
# "none" text file without any EOL : not auto-normalized
# "lf" text file with LF : is auto-normalized when gitattributes text=auto
# "crlf" text file with CRLF : is auto-normalized when gitattributes text=auto
# "mixed" text file with mixed line endings : is auto-normalized when gitattributes text=auto
# (LF or CRLF, but not bare CR)
Sources: https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt---eol and https://github.com/git/git/commit/a7630bd4274a0dff7cff8b92de3d3f064e321359
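To get just the paths instead of whole ls-files lines, you can split the --eol output on the TAB that precedes each path. A minimal sketch (list_index_binaries is a hypothetical helper); it assumes paths without special characters, since git ls-files quotes unusual paths unless you pass -z:

```sh
# Hypothetical helper: print the paths whose index copy Git classifies
# as "-text" (binary, or containing a bare CR).
# Each `git ls-files --eol` line looks like
#   i/<eolinfo> w/<eolinfo> attr/<eolattr> <TAB><path>
# so splitting on TAB puts the eol columns in $1 and the path in $2.
# Extra arguments are passed through to ls-files as pathspecs.
list_index_binaries() {
    git ls-files --eol "$@" | awk -F '\t' '$1 ~ /^i\/-text/ { print $2 }'
}
```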
Oh, by the way: be careful when setting the text attribute in .gitattributes, e.g. *.abc text. In that case all files matching *.abc will be normalized, even if they are binary (any CRLF sequence inside a binary file would be converted to LF). This is different from the text=auto behaviour, which only normalizes files that Git detects as text.
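One way to avoid that is to let Git auto-detect text and mark the extension as binary explicitly. The sketch below tries this in a throwaway repository created with mktemp (so nothing in your own repo is touched) and uses git check-attr to show which text value Git would apply; data.abc and notes.txt are just example names.

```sh
# Work in a scratch repository so no existing .gitattributes is overwritten.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Auto-detect text everywhere, but never treat *.abc as text.
cat > .gitattributes <<'EOF'
* text=auto
*.abc binary
EOF

# check-attr prints "<path>: <attribute>: <value>"; the paths need not exist.
git check-attr text -- data.abc notes.txt
# data.abc: text: unset
# notes.txt: text: auto
```

binary is a built-in macro attribute that expands to -diff -merge -text, so it also stops Git from showing textual diffs for those files.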
builtin_diff()[1] calls diff_filespec_is_binary(), which calls buffer_is_binary(), which checks for any occurrence of a zero byte (NUL "character") in the first 8000 bytes (or the entire length if shorter).
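If you want roughly the same check without Git, it is easy to approximate in the shell. This is only a sketch of the NUL-byte test described above; looks_binary is a hypothetical name, and unlike Git it knows nothing about .gitattributes or any other special cases Git may apply:

```sh
# Hypothetical helper: succeed if a NUL byte occurs in the first
# 8000 bytes of "$1", mirroring the buffer_is_binary() heuristic.
looks_binary() {
    # head -c keeps the first 8000 bytes, tr -dc '\000' deletes every
    # byte except NUL, and wc -c counts the NUL bytes that are left.
    nuls=$(head -c 8000 < "$1" | tr -dc '\000' | wc -c)
    [ "$nuls" -gt 0 ]
}

# Usage: looks_binary file-to-test && echo binary || echo not binary
```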
I do not see that this “is it binary?” test is explicitly exposed in any command though.
git merge-file directly uses buffer_is_binary(), so you may be able to make use of it:
git merge-file /dev/null /dev/null file-to-test
It seems to produce an error message like error: Cannot merge binary files: file-to-test and to exit with status 255 when given a binary file. I am not sure I would want to rely on this behavior, though.
Maybe git diff --numstat would be more reliable:
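isBinary() {
    # For a binary file, --numstat prints "-<TAB>-<TAB><path>"
    # instead of added/deleted line counts.
    p=$(printf '%s\t-\t' -)
    t=$(git diff --no-index --numstat /dev/null "$1")
    # Quoting "$p" makes the case pattern match it literally.
    case "$t" in "$p"*) return 0 ;; esac
    return 1
}
isBinary file-to-test && echo binary || echo not binary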
For binary files, the --numstat output should start with -, TAB, -, TAB, so we just test for that.
[1] builtin_diff() has strings like "Binary files %s and %s differ" that should be familiar.
I don't like this answer, but you can parse the output of git-diff-tree to see whether a file is binary. For example:
git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- MegaCli
diff --git a/megaraid/MegaCli b/megaraid/MegaCli
new file mode 100755
index 0000000..7f0e997
Binary files /dev/null and b/megaraid/MegaCli differ
as opposed to:
git diff-tree -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- megamgr
diff --git a/megaraid/megamgr b/megaraid/megamgr
new file mode 100755
index 0000000..50fd8a1
--- /dev/null
+++ b/megaraid/megamgr
@@ -0,0 +1,78 @@
+#!/bin/sh
[…]
Oh, and BTW, 4b825d… is a magic SHA that represents the empty tree: it is simply the hash of an empty tree object, but git knows about it specially, so you can use it even if no such object is stored in your repository.
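Rather than reading the patch text, you can combine the empty tree with the --numstat idea above to list all binary files in a commit at once. A minimal sketch: list_binaries_in_commit is a hypothetical name, it defaults to HEAD, and paths with unusual characters will come out quoted.

```sh
# Hypothetical helper: list files in a commit (default HEAD) that git's
# diff machinery treats as binary. Diffing from the empty tree makes
# every file an addition; binary ones get "-<TAB>-<TAB><path>".
list_binaries_in_commit() {
    # Compute the empty-tree id rather than hard-coding it, so this
    # should also work in repositories using SHA-256.
    empty=$(git hash-object -t tree /dev/null)
    git diff --numstat "$empty" "${1:-HEAD}" |
        awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}
```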