What does --strip-components -C mean in tar?
The fragment of the manpage you included in your question comes from the
manpage for GNU tar. GNU is a software project that prefers info manuals
over manpages. In fact, the tar manpage was added to the GNU tar source
tree only in 2014, and it is still just a reference, not a full-blown
manual with examples. You can open the full info manual with info tar;
it is also available online. It contains several examples of
--strip-components usage; the relevant fragments are:
--strip-components=number
Strip given number of leading components from file names before extraction.
For example, if archive `archive.tar' contained `some/file/name', then running
tar --extract --file archive.tar --strip-components=2
would extract this file to file `name'.
and:
--strip-components=number
Strip given number of leading components from file names before extraction.
For example, suppose you have archived whole `/usr' hierarchy to a tar archive named `usr.tar'. Among other files, this archive contains `usr/include/stdlib.h', which you wish to extract to the current working directory. To do so, you type:
$ tar -xf usr.tar --strip=2 usr/include/stdlib.h
The option `--strip=2' instructs tar to strip the two leading components (`usr/' and `include/') off the file name.
That said, there are other implementations of tar out there; for example, the FreeBSD tar manpage explains this option differently:
--strip-components count
Remove the specified number of leading path elements. Pathnames with fewer elements will be silently skipped. Note that the pathname is edited after checking inclusion/exclusion patterns but before security checks.
In other words, you should understand a Unix path as a sequence of
elements separated by / (unless there is only one /).
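One way to picture this splitting is the following sketch in plain POSIX shell. The strip_components function is a hypothetical helper written for illustration, not part of tar, and it only models the counting of path elements; real tar implementations also deal with leading and repeated slashes.

```shell
#!/bin/sh
# Hypothetical helper (not part of tar): print the path given as $1 with its
# first $2 slash-separated elements removed. Prints nothing if the path has
# too few elements, mirroring the "skipped" behaviour described above.
strip_components() {
    path=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # drop everything up to the first slash
            *)   return 0 ;;          # too few elements: print nothing
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}

strip_components a/b/c/FILE 1   # prints b/c/FILE
strip_components a/b/c/FILE 3   # prints FILE
strip_components a/b/c/FILE 4   # prints nothing
```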
Here is my own example (other examples are available in the info manual I linked to above):
Let's create a new directory structure:
$ mkdir -p a/b/c
The path a/b/c is composed of 3 elements: a, b, and c.
Create an empty file in this directory and put it into a .tar archive:
$ touch a/b/c/FILE
$ tar -cf archive.tar a/b/c/FILE
FILE is the 4th element of the a/b/c/FILE path.
List the contents of archive.tar:
$ tar tf archive.tar
a/b/c/FILE
You can now extract archive.tar with --strip-components and an
argument that tells tar how many path elements to remove from
a/b/c/FILE when extracting. First remove the original a directory:
$ rm -r a
Extract with --strip-components=1; only a has not been recreated:
$ tar xf archive.tar --strip-components=1
$ ls -Al
total 16
-rw-r--r-- 1 ja users 10240 Mar 26 15:41 archive.tar
drwxr-xr-x 3 ja users 4096 Mar 26 15:43 b
$ tree b
b
└── c
    └── FILE
1 directory, 1 file
With --strip-components=2 you see that a/b (2 elements) has not been
recreated:
$ rm -r b
$ tar xf archive.tar --strip-components=2
$ ls -Al
total 16
-rw-r--r-- 1 ja users 10240 Mar 26 15:41 archive.tar
drwxr-xr-x 2 ja users 4096 Mar 26 15:46 c
$ tree c
c
└── FILE
0 directories, 1 file
With --strip-components=3, the 3 elements a/b/c have not been recreated,
and we get FILE directly in the directory in which we ran tar:
$ rm -r c
$ tar xf archive.tar --strip-components=3
$ ls -Al
total 12
-rw-r--r-- 1 ja users 0 Mar 26 15:39 FILE
-rw-r--r-- 1 ja users 10240 Mar 26 15:41 archive.tar
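Continuing the same example, stripping more elements than the path has leaves nothing to extract. The FreeBSD manpage quoted earlier says such pathnames are silently skipped, and GNU tar appears to behave the same way. The sketch below runs in a scratch directory; the out directory name is made up for the example.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing in the current directory is touched.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Same layout as in the walkthrough: one file, four path elements.
mkdir -p a/b/c out
touch a/b/c/FILE
tar -cf archive.tar a/b/c/FILE

# Ask tar to strip 4 elements from a path that only has 4: nothing is left,
# so the member is skipped and the out directory stays empty.
(cd out && tar -xf ../archive.tar --strip-components=4)
ls -A out
```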
The -C option tells tar to change to a given directory before running
the requested operation, whether extracting or archiving. In a comment
you asked:
Asking tar to do cd: why cd? I mean to ask, why it's not just mv?
Why do you think that mv is better? To what directory would you like
to extract the tar archive first:
/tmp - what if it's missing or full?
"$TMPDIR" - what if it's unset, missing or full?
current directory - what if the user has no w permission, just r and x?
Also, what if the temporary directory, whatever it is, already contained files with the same names as in the tar archive, and extracting would overwrite them?
And what if the temporary directory, whatever it is, didn't support Unix filesystem semantics, so that all information about ownership, executable bits etc. would be lost?
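To make the comparison concrete, here is a rough sketch of both approaches on a throwaway archive. The directory names (src, dest_c, dest_mv, scratch) are made up for the example, and the "extract, then mv" variant is only one guess at what such an approach might look like.

```shell
#!/bin/sh
# Work in a throwaway directory.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

mkdir src dest_c dest_mv
echo hello > src/file.txt
# -C also works when archiving: file.txt is stored without the src/ prefix.
tar -cf archive.tar -C src file.txt

# With -C, tar changes into dest_c and writes the file there directly.
tar -xf archive.tar -C dest_c

# A hypothetical "extract, then mv" approach needs an extra scratch directory
# that must exist, be writable, and have enough free space.
mkdir scratch
(cd scratch && tar -xf ../archive.tar)
mv scratch/file.txt dest_mv/
rmdir scratch
```

Both destinations end up with file.txt, but only the mv variant depends on an intermediate location.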
Also notice that -C is a common change-directory option in other
programs as well; Git and make are the first that come to my mind.
The --strip-components option is for modifying the filenames of extracted files. --strip-components <N> means "remove the first <N> components of the filename", where "components" refers to the parts of the path separated by /.
If you have a filename foo/bar/baz, then with --strip-components 1 the extracted file would be named bar/baz.
The -C option just means "change directory". If you use -C /some/other/place you are effectively asking tar to cd /some/other/place before extracting files. This generally means that files would be extracted relative to /some/other/place.
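Putting the two options together, here is a small sketch run in a scratch directory. It reuses the foo/bar/baz name from above; the dest directory is a made-up stand-in for /some/other/place.

```shell
#!/bin/sh
# Work in a throwaway directory.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Build an archive containing the single member foo/bar/baz.
mkdir -p foo/bar dest
echo data > foo/bar/baz
tar -cf archive.tar foo/bar/baz

# Strip the leading "foo" element and extract relative to dest.
tar -xf archive.tar --strip-components 1 -C dest

# The file should now be at dest/bar/baz.
ls dest/bar
```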
To supplement the good answer by larsks, and to help clear up the confusion around the -C option: the tar man page states for the -C option:
-C, --directory=DIR
Change to DIR before performing any operations. This option is
order-sensitive, i.e. it affects all options that follow.
So it is not like mv; it is literally telling tar to change to a different working directory. Also note that the order matters. So:
tar r -f archive.tar -C /path file1 file2
would append the files file1 and file2 from /path to archive.tar, whilst
tar r -f archive.tar file1 file2 -C /path file3 file4
would append file1 and file2 from the current working directory and then file3 and file4 from /path.
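A quick way to convince yourself of the ordering is a sketch like this one, using a scratch directory with made-up names (file1 in the starting directory, file3 in a subdirectory called other). It uses -c rather than r so that it does not depend on an existing archive; -C is handled the same way in both modes.

```shell
#!/bin/sh
# Work in a throwaway directory.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

mkdir other
echo 1 > file1
echo 3 > other/file3

# file1 is named before -C, so it is read from the starting directory;
# file3 is named after -C, so it is read from "$tmp/other".
tar -cf archive.tar file1 -C "$tmp/other" file3

# List the archive: both members appear without any directory prefix.
listing=$(tar -tf archive.tar)
printf '%s\n' "$listing"
```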
--strip-components and -C are independent and unrelated options to tar. The first affects the directory structure of the archived files once they are extracted, while -C sets the working directory that tar uses to resolve file names outside the archive.