How to use git sparse-checkout in 2.27+
I believe I found the reason for this. Commit f56f31af0301 to Git changed the implementation of sparse-checkout so that, when you have an uninitialized working tree (as you would right after running git clone --no-checkout), running git sparse-checkout init will not check out any files into your working tree. In previous versions, the command would actually check out files, which could have unexpected effects given that you wouldn't have an active branch at that point.
The relevant commit, f56f31af0301, was included in Git 2.27, but not in 2.25. That accounts for why the behavior you see is not the behavior shown on the web page you're trying to follow. Basically, the behavior on the web page was a bug that nobody realized was a bug at the time, but with Git 2.27, it has been fixed.
This is explained very well, I think, in the message for commit b5bfc08a972d:
So...that brings us to the special case: a git clone performed with --no-checkout. As per the meaning of the flag, --no-checkout does not check out any branch, with the implication that you aren't on one and need to switch to one after the clone. Implementationally, HEAD is still set (so in some sense you are partially on a branch), but
- the index is "unborn" (non-existent)
- there are no files in the working tree (other than .git/)
- the next time git switch (or git checkout) is run it will run unpack_trees with initial_checkout flag set to true.
It is not until you run, e.g. git switch <somebranch> that the index will be written and files in the working tree populated.
With this special --no-checkout case, the traditional read-tree -mu HEAD behavior would have done the equivalent of acting like checkout -- switch to the default branch (HEAD), write out an index that matches HEAD, and update the working tree to match. This special case slipped through the avoid-making-changes checks in the original sparse-checkout command and thus continued there.
After update_sparsity() was introduced and used (see commit f56f31a ("sparse-checkout: use new update_sparsity() function", 2020-03-27)), the behavior for the --no-checkout case changed: Due to git's auto-vivification of an empty in-memory index (see do_read_index() and note that must_exist is false), and due to sparse-checkout's update_working_directory() code to always write out the index after it was done, we got a new bug. That made it so that sparse-checkout would switch the repository from a clone with an "unborn" index (i.e. still needing an initial_checkout), to one that had a recorded index with no entries. Thus, instead of all the files appearing deleted in git status being known to git as a special artifact of not yet being on a branch, our recording of an empty index made it suddenly look to git as though it was definitely on a branch with ALL files staged for deletion! A subsequent checkout or switch then had to contend with the fact that it wasn't on an initial_checkout but had a bunch of staged deletions.
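To see the sequence described above for yourself, here is a minimal sketch. The throwaway repository layout (README, docs/, src/) is made up for illustration, and it assumes a Git that already contains commit b5bfc08a972d, so that the later switch behaves as an initial checkout.

```shell
#!/bin/sh
# Sketch only: directory and file names below are hypothetical.
set -e
work=$(mktemp -d)

# Build a small throwaway "upstream" repository with two directories.
git init -q "$work/upstream"
mkdir "$work/upstream/docs" "$work/upstream/src"
echo "readme" > "$work/upstream/README"
echo "guide"  > "$work/upstream/docs/guide.txt"
echo "code"   > "$work/upstream/src/main.c"
git -C "$work/upstream" add .
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"
branch=$(git -C "$work/upstream" symbolic-ref --short HEAD)

# Clone without checking anything out: HEAD is set, the index is unborn.
git clone -q --no-checkout "$work/upstream" "$work/clone"

# Define the sparse-checkout; this step alone should not populate any files.
git -C "$work/clone" sparse-checkout init --cone
git -C "$work/clone" sparse-checkout set docs

# Files only appear once a branch is actually switched to.
git -C "$work/clone" switch -q "$branch"
ls "$work/clone"
```

In cone mode the files at the repository root are always included, so after the switch the clone should contain README and docs/ but not src/.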
With Git 2.35 (Q1 2022), the "init" and "set" subcommands in "git sparse-checkout" have been unified for a better user experience and performance.
See commit dfac9b6 (23 Dec 2021), and commit d359541, commit d30e2bb, commit ba2f3f5, commit 4e25673, commit f2e3a21, commit be61fd1, commit f85751a, commit 45c5e47, commit 0b624e0, commit 1530ff3 (14 Dec 2021) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 2dc94da, 03 Jan 2022)
sparse-checkout: enable set to initialize sparse-checkout mode
Reviewed-by: Derrick Stolee
Reviewed-by: Victoria Dye
Signed-off-by: Elijah Newren
The previously suggested workflow:
    git sparse-checkout init ...
    git sparse-checkout set ...
Suffered from three problems:
- It would delete nearly all files in the first step, then restore them in the second. That was poor performance and forced unnecessary rebuilds.
- The two-step process resulted in two progress bars, which was suboptimal from a UI point of view for wrappers that invoked both of these commands but only exposed a single command to their end users.
- With cone mode, the first step would delete nearly all ignored files everywhere, because everything was considered to be outside of the specified sparsity paths. (The user was not allowed to specify any sparsity paths in the init step.)
Avoid these problems by teaching set to understand the extra parameters that init takes and performing any necessary initialization if not already in a sparse checkout.
I mentioned before, in "Why do excluded files keep reappearing in my git sparse checkout?", how any skip-worktree file should not be modified or even looked at during a sparse checkout anymore with Git 2.27+.
But with the new sparseIndex option in Git 2.32 (Q2 2021), that changes again: Git 2.32 adds the sparse index.
See "Make your monorepo feel small with Git’s sparse index" from Derrick Stolee.
See commit 4589bca, commit 71f82d0, commit 5f11669 (12 Apr 2021), commit f5fed74, commit dc26b23, commit 0c18c05, commit 465a04a, commit f7ef64b, commit 3450a30, commit d425f65, commit 2508df0, commit a029120, commit e43e2a1, commit 299e2c4, commit 42f44e8, commit 46eb6e3, commit 2227ea1, commit 48b3c7d, commit cb8388d, commit 0f6d3ba, commit 1b850d3, commit 54beed2, commit 118a2e8, commit 95e0321, commit 847a9e5, commit 839a663 (01 Apr 2021), and commit c9e40ae, commit 9ad2d5e, commit 2de37c5, commit dcc5fd5, commit 122ba1f, commit 58300f4, commit 0938e6f, commit 13e1331, commit f442313, commit 6e77352, commit cd42415, commit 836e25c, commit 6863df3, commit 2782db3, commit e2df6c3, commit ecfc47c, commit 4300f84, commit 3964fc2, commit 4b3f765, commit 0b5fcb0, commit 0ad6090 (30 Mar 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 8e97852, 30 Apr 2021)
sparse-index: design doc and format update
Signed-off-by: Derrick Stolee
This begins a long effort to update the index format to allow sparse directory entries.
This should result in a significant improvement to Git commands when HEAD contains millions of files, but the user has selected many fewer files to keep in their sparse-checkout definition.
Currently, the index format is only updated in the presence of extensions.sparseIndex instead of increasing a file format version number.
This is temporary, and index v5 is part of the plan for future work in this area.
The design document details many of the reasons for embarking on this work, and also the plan for completing it safely.
technical/index-format now includes in its man page:
An index entry typically represents a file. However, if sparse-checkout is enabled in cone mode (core.sparseCheckoutCone is enabled) and the extensions.sparseIndex extension is enabled, then the index may contain entries for directories outside of the sparse-checkout definition. These entries have mode 040000, include the SKIP_WORKTREE bit, and the path ends in a directory separator.
technical/sparse-index now includes in its man page:
Git Sparse-Index Design Document
The sparse-checkout feature allows users to focus a working directory on a subset of the files at HEAD. The cone mode patterns, enabled by core.sparseCheckoutCone, allow for very fast pattern matching to discover which files at HEAD belong in the sparse-checkout cone.
Three important scale dimensions for a Git working directory are:
- HEAD: How many files are present at HEAD?
- Populated: How many files are within the sparse-checkout cone.
- Modified: How many files has the user modified in the working directory?
We will use big-O notation -- O(X) -- to denote how expensive certain operations are in terms of these dimensions.
These dimensions are ordered by their magnitude: users (typically) modify fewer files than are populated, and we can only populate files at HEAD.
Problems occur if there is an extreme imbalance in these dimensions. For example, if HEAD contains millions of paths but the populated set has only tens of thousands, then commands like git status and git add can be dominated by operations that require O(HEAD) operations instead of O(Populated). Primarily, the cost is in parsing and rewriting the index, which is filled primarily with files at HEAD that are marked with the SKIP_WORKTREE bit.
The sparse-index intends to take these commands that read and modify the index from O(HEAD) to O(Populated).
To do this, we need to modify the index format in a significant way: add "sparse directory" entries.
With cone mode patterns, it is possible to detect when an entire directory will have its contents outside of the sparse-checkout definition. Instead of listing all of the files it contains as individual entries, a sparse-index contains an entry with the directory name, referencing the object ID of the tree at HEAD and marked with the SKIP_WORKTREE bit. If we need to discover the details for paths within that directory, we can parse trees to find that list.
So you have a new option to git sparse-checkout init: --[no-]sparse-index
sparse-checkout: toggle sparse index from builtin
Signed-off-by: Derrick Stolee
The sparse index extension is used to signal that index writes should be in sparse mode.
This was only updated using GIT_TEST_SPARSE_INDEX=1.
Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies if the sparse index should be used.
It also updates the index to use the correct format, either way.
Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools.
'git sparse-checkout init' already sets extension.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools.
git sparse-checkout now includes in its man page:
Use the --[no-]sparse-index option to toggle the use of the sparse index format.
This reduces the size of the index to be more closely aligned with your sparse-checkout definition.
This can have significant performance advantages for commands such as git status or git add. This feature is still experimental. Some commands might be slower with a sparse index until they are properly integrated with the feature.
WARNING: Using a sparse index requires modifying the index in a way that is not completely understood by external tools. If you have trouble with this compatibility, then run git sparse-checkout init --no-sparse-index to rewrite your index to not be sparse.
Older versions of Git will not understand the sparse directory entries index extension and may fail to interact with your repository until it is disabled.
With Git 2.33 (Q3 2021), the "git status" codepath learned to work with a sparsely populated index without hydrating it fully.
See commit e5ca291, commit f8fe49e, commit fe0d576, commit d76723e, commit bf48e5a, commit 9eb00af, commit 69bdbdb, commit 523506d, commit bd6a3fd, commit cd807a5, commit 17a1bb5, commit bf26c06, commit e669ffb, commit 3d814b5, commit 4741077, commit fc6609d (14 Jul 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit b271a30, 28 Jul 2021)
status: skip sparse-checkout percentage with sparse-index
Reviewed-by: Elijah Newren
Signed-off-by: Derrick Stolee
'git status' began reporting a percentage of populated paths when sparse-checkout is enabled in 051df3c ("wt-status: show sparse checkout status as well", 2020-07-18, Git v2.28.0-rc0 -- merge listed in batch #7).
This percentage is incorrect when the index has sparse directories.
It would also be expensive to calculate as we would need to parse trees to count the total number of possible paths.
Avoid the expensive computation by simplifying the output to only report that a sparse checkout exists, without the percentage.
This change is the reason we use 'git status --porcelain=v2' in t1092-sparse-checkout-compatibility.sh.
We don't want to ensure that this message is equal across both modes, but instead just the important information about staged, modified, and untracked files are compared.
Warning: Recent sparse-index work broke safety against attempts to add paths with trailing slashes to the index, which has been corrected with Git 2.34 (Q4 2021).
See commit c8ad9d0, commit 2a1ae64, commit fc5e90b (07 Oct 2021) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit a86ed75, 18 Oct 2021)
read-cache: let verify_path() reject trailing dir separators again
Signed-off-by: René Scharfe
6e77352 ("sparse-index: convert from full to sparse", 2021-03-30, Git v2.32.0-rc0 -- merge listed in batch #13) made verify_path() accept trailing directory separators for directories, which is necessary for sparse directory entries.
This clemency causes "git stash" to stumble over sub-repositories, though, and there may be more unintended side-effects.
Avoid them by restoring the old verify_path() behavior and accepting trailing directory separators only in places that are supposed to handle sparse directory entries.
With Git 2.35 (Q1 2022), ensure that the sparseness of the in-core index matches the index.sparse configuration specified by the repository immediately after the on-disk index file is read.
See commit 7ca4fc8, commit b93fea0, commit 13f69f3, commit 336d82e (23 Nov 2021) by Victoria Dye (vdye).
(Merged by Junio C Hamano -- gitster -- in commit 5396d7b, 10 Dec 2021)
sparse-index: update do_read_index to ensure correct sparsity
Helped-by: Junio C Hamano
Co-authored-by: Derrick Stolee
Signed-off-by: Victoria Dye
Reviewed-by: Elijah Newren
Unless command_requires_full_index forces index expansion, ensure in-core index sparsity matches config settings on read by calling ensure_correct_sparsity.
This makes the behavior of the in-core index more consistent between different methods of updating sparsity: manually changing the index.sparse config setting vs. executing git sparse-checkout --[no-]sparse-index init
Although index sparsity is normally updated with git sparse-checkout init, ensuring correct sparsity after a manual index.sparse change has some practical benefits:
- It allows for command-by-command sparsity toggling with -c index.sparse=<true|false>, e.g. when troubleshooting issues with the sparse index.
- It prevents users from experiencing abnormal slowness after setting index.sparse to true due to use of a full index in all commands until the on-disk index is updated.
Warning: before Git 2.35 (Q1 2022), the sparse-index/sparse-checkout feature had a bug in its use of the matching code to determine which path is in or outside the sparse checkout patterns.
See commit 8c5de0d, commit 1b38efc (06 Dec 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e1d9288, 15 Dec 2021)
unpack-trees: use traverse_path instead of name
Reported-by: Gustave Granroth
Reported-by: Mike Marcelais
Signed-off-by: Derrick Stolee
The sparse_dir_matches_path() method compares a cache entry that is a sparse directory entry against a 'struct traverse_info *info' and a 'struct name_entry *p' to see if the cache entry has exactly the right name for those other inputs.
This method was introduced in 523506d ("unpack-trees: unpack sparse directory entries", 2021-07-14, Git v2.33.0-rc0 -- merge listed in batch #7), but included a significant mistake.
The path comparisons used 'info->name' instead of 'info->traverse_path'.
Since 'info->name' only stores a single tree entry name while 'info->traverse_path' stores the full path from root, this method does not work when 'info' is in a subdirectory of a directory.
Replacing the right strings and their corresponding lengths make the method work properly.
The previous change included a failing test that exposes this issue.
That test now passes.
The critical detail is that as we go deep into unpack_trees(), the logic for merging a sparse directory entry with a tree entry during 'git checkout' relies on this sparse_dir_matches_path() in order to avoid calling traverse_trees_recursive() during unpack_callback() in this hunk:
    if (!is_sparse_directory_entry(src[0], names, info) &&
        traverse_trees_recursive(n, dirmask, mask & ~dirmask,
                                 names, info) < 0) {
            return -1;
    }
For deep paths, the short-circuit never occurred and traverse_trees_recursive() was being called incorrectly and that was causing other strange issues.
Specifically, the error message from the now-passing test previously included this:
    error: Your local changes to the following files would be overwritten by checkout:
            deep/deeper1/deepest2/a
            deep/deeper1/deepest3/a
    Please commit your changes or stash them before you switch branches.
    Aborting
These messages occurred because the 'current' cache entry in twoway_merge() was showing as NULL because the index did not contain entries for the paths contained within the sparse directory entries.
We instead had 'oldtree' given as the entry at HEAD and 'newtree' as the entry in the target tree.
This led to reject_merge() listing these paths.
With Git 2.35 (Q1 2022), teach diff and blame to work well with sparse index.
See commit add4c86, commit 51ba65b, commit 338e2a9, commit 44c7e62, commit 27a443b, commit 0803f9c, commit e5b17bd (06 Dec 2021) by Lessley Dennington (ldennington).
See commit ea6ae41 (29 Nov 2021) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 8d2c373, 21 Dec 2021)
blame: enable and test the sparse index
Signed-off-by: Lessley Dennington
Reviewed-by: Elijah Newren
Enable the sparse index for the 'git blame' command.
The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves.
More specifically, these cases are:
- The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels.
- Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels.
We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory.
This is true in both sparse and full checkouts.
And:
diff: enable and test the sparse index
Co-authored-by: Derrick Stolee
Signed-off-by: Lessley Dennington
Reviewed-by: Elijah Newren
Enable the sparse index within the 'git diff' command.
Its implementation already safely integrates with the sparse index because it shares code with the 'git status' and 'git checkout' commands that were already integrated.
For more details see:
- d76723e ("status: use sparse-index throughout", 2021-07-14, Git v2.33.0-rc0 -- merge listed in batch #7)
- 1ba5f45 ("checkout: stop expanding sparse indexes", 2021-06-29, Git v2.33.0-rc1 -- merge)
The most interesting thing to do is to add tests that verify that 'git diff' behaves correctly when the sparse index is enabled.
These cases are:
1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse checkout, and sparse index repositories in the following partially-staged scenarios (i.e. the index, HEAD, and working directory differ at a given path):
   - Path is within sparse-checkout cone
   - Path is outside sparse-checkout cone
   - A merge conflict exists for paths outside sparse-checkout cone
Here is a solution that will populate only files in the root folder:
$ git clone --filter=blob:none --sparse https://github.com/derrickstolee/sparse-checkout-example
Then subsequent sparse-checkout calls work like a charm.
Still no idea why the tutorial is broken.
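If you want to try that clone without touching the network, here is a sketch against a local throwaway repository served over a file:// URL. The layout (README, docs/, src/) is invented, and the two uploadpack settings are my assumption about what a local "server" needs in order to honor --filter and to serve blobs on demand.

```shell
#!/bin/sh
# Sketch only: local stand-in for the GitHub repository; names are hypothetical.
set -e
work=$(mktemp -d)
clone="$work/clone"

git init -q "$work/upstream"
mkdir "$work/upstream/docs" "$work/upstream/src"
echo "readme" > "$work/upstream/README"
echo "guide"  > "$work/upstream/docs/guide.txt"
echo "code"   > "$work/upstream/src/main.c"
git -C "$work/upstream" add .
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# Let the local "server" honor partial-clone filters and on-demand blob fetches.
git -C "$work/upstream" config uploadpack.allowFilter true
git -C "$work/upstream" config uploadpack.allowAnySHA1InWant true

# Blobless, sparse clone: only files in the root folder are populated.
git clone -q --filter=blob:none --sparse "file://$work/upstream" "$clone"
ls "$clone"

# A subsequent sparse-checkout call then pulls in one more directory.
git -C "$clone" sparse-checkout add docs
ls "$clone"
```

I used add rather than set here because, depending on the Git version, clone --sparse may leave the repository in cone or non-cone mode, and add keeps the root files in both cases.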