What does --strip-components -C mean in tar?

The manpage fragment you included in your question comes from the manpage for GNU tar. GNU is a software project that prefers info manuals over manpages. In fact, a tar manpage was added to the GNU tar source tree only in 2014, and it is still just a reference, not a full-blown manual with examples. You can open the full info manual with info tar; it is also available online here. It contains several examples of --strip-components usage. The relevant fragments are:

--strip-components=number

Strip given number of leading components from file names before extraction.

For example, if archive `archive.tar' contained `some/file/name', then running

tar --extract --file archive.tar --strip-components=2

would extract this file to file `name'.

and:

--strip-components=number

Strip given number of leading components from file names before extraction.

For example, suppose you have archived whole `/usr' hierarchy to a tar archive named `usr.tar'. Among other files, this archive contains `usr/include/stdlib.h', which you wish to extract to the current working directory. To do so, you type:

$ tar -xf usr.tar --strip=2 usr/include/stdlib.h

The option `--strip=2' instructs tar to strip the two leading components (`usr/' and `include/') off the file name.

That said, there are other implementations of tar. For example, the FreeBSD tar manpage describes this option differently:

--strip-components count

Remove the specified number of leading path elements. Pathnames with fewer elements will be silently skipped. Note that the pathname is edited after checking inclusion/exclusion patterns but before security checks.

In other words, you should think of a Unix path as a sequence of elements separated by / (unless the whole path is just a single /).
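To make the counting concrete without involving tar at all, you can split a path on / in the shell. This is only an illustration of how the elements are counted; the helper name strip_components is made up for this sketch and is not part of tar.

```shell
#!/bin/sh
# Hypothetical helper (not part of tar): drop the first N elements of a
# relative path, mimicking how --strip-components counts elements.
strip_components() {
    path=$1
    n=$2
    # cut fields are 1-based, so keeping fields from N+1 onward drops N of them.
    printf '%s\n' "$path" | cut -d/ -f"$((n + 1))"-
}

strip_components a/b/c/FILE 1   # prints b/c/FILE
strip_components a/b/c/FILE 2   # prints c/FILE
strip_components a/b/c/FILE 3   # prints FILE
```

The sketch only models the name editing. What tar does with a member that has fewer elements than the count is up to the implementation, as the FreeBSD text above notes.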

Here is my own example (other examples are available in the info manual I linked to above):

Let's create a new directory structure:

mkdir -p a/b/c

Path a/b/c is composed of 3 elements: a, b, and c.

Create an empty file in this directory and put it into a .tar archive:

$ touch a/b/c/FILE
$ tar -cf archive.tar a/b/c/FILE

FILE is the 4th element of the path a/b/c/FILE.

List contents of archive.tar:

$ tar tf archive.tar
a/b/c/FILE

You can now extract archive.tar with --strip-components and an argument that tells tar how many leading path elements to remove from a/b/c/FILE during extraction. First remove the original a directory:

rm -r a

Extract with --strip-components=1 - only a has not been recreated:

$ tar xf archive.tar --strip-components=1
$ ls -Al
total 16
-rw-r--r-- 1 ja users 10240 Mar 26 15:41 archive.tar
drwxr-xr-x 3 ja users  4096 Mar 26 15:43 b
$ tree b
b
└── c
    └── FILE

1 directory, 1 file

With --strip-components=2 the first 2 elements, a/b, have not been recreated:

$ rm -r b
$ tar xf archive.tar --strip-components=2
$ ls -Al
total 16
-rw-r--r-- 1 ja users 10240 Mar 26 15:41 archive.tar
drwxr-xr-x 2 ja users  4096 Mar 26 15:46 c
$ tree c
c
└── FILE

0 directories, 1 file

With --strip-components=3 the 3 elements a/b/c have not been recreated, and FILE ends up directly in the directory in which we ran tar:

$ rm -r c
$ tar xf archive.tar --strip-components=3
$ ls -Al
total 12
-rw-r--r-- 1 ja users     0 Mar 26 15:39 FILE
-rw-r--r-- 1 ja users 10240 Mar 26 15:41 archive.tar
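If you want to repeat the walk-through above in one go, something like the following sketch should do. It works in a scratch directory created with mktemp so nothing in your current directory is touched; the names work and out-N are arbitrary choices of mine.

```shell
#!/bin/sh
# Sketch: repeat the walk-through in a scratch directory.
set -e
work=$(mktemp -d)
cd "$work"

# Same layout and archive as in the walk-through.
mkdir -p a/b/c
touch a/b/c/FILE
tar -cf archive.tar a/b/c/FILE

for n in 1 2 3; do
    mkdir "out-$n"
    echo "--strip-components=$n:"
    # Extract inside out-$n (in a subshell) and list the files that were created.
    (cd "out-$n" && tar -xf ../archive.tar --strip-components="$n" && find . -type f)
done
```

For the three values it lists ./b/c/FILE, ./c/FILE and ./FILE respectively, matching the listings above.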

The -C option tells tar to change to the given directory before performing the requested operation, whether that is extracting or archiving. In a comment you asked:

Asking tar to do cd: why cd? I mean to ask, why it's not just mv?

Why do you think that mv is better? Into which directory would you like tar to extract the archive first:

/tmp - what if it's missing or full?
"$TMPDIR" - what if it's unset, missing or full?
current directory - what if the user has no w permission, just r and x?
what if the temporary directory, whatever it is, already contains files with the same names as those in the tar archive, so that extracting would overwrite them?
what if the temporary directory, whatever it is, is on a filesystem that does not support Unix metadata, so that all information about ownership, executable bits, etc. would be lost?

Also notice that -C is a common "change directory" option in other programs as well; Git and make are the first that come to my mind.

The --strip-components option is for modifying the filenames of extracted files. --strip-components <N> means "remove the first <N> components of the filename", where "components" is referring to parts of the path separated by /.

If you have a filename foo/bar/baz, then with --strip-components 1 the extracted file would be named bar/baz.

The -C option just means "change directory". If you use -C /some/other/place you are effectively asking tar to cd /some/other/place before extracting files. This generally means that files would be extracted relative to /some/other/place.
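Since the question asks about both options together, here is a minimal sketch that uses them in one command. The names project-1.0, dest and demo.tar are made up for the example, and everything happens in a scratch directory from mktemp.

```shell
#!/bin/sh
# Sketch: extract into another directory while dropping the top-level element.
set -e
work=$(mktemp -d)
cd "$work"

# Build a small archive whose members all start with project-1.0/.
mkdir -p project-1.0/src
echo 'hello' > project-1.0/src/main.txt
tar -cf demo.tar project-1.0

# -C does not create the directory; it must exist before tar changes into it.
mkdir dest
tar -xf demo.tar -C dest --strip-components=1

find dest -type f    # dest/src/main.txt
```

The file ends up at dest/src/main.txt: -C decided where the extraction happened, and --strip-components=1 removed the leading project-1.0 element from each name.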

To supplement the good answer by larsks and help clear up the confusion around the -C option:

The tar man page states for the -C option:

-C, --directory=DIR
              Change  to DIR before performing any operations.  This option is
              order-sensitive, i.e. it affects all options that follow.

So it is not like mv - it literally tells tar to change to a different working directory. Also note that the order matters. So:

tar r -f archive.tar -C /path file1 file2

would append file1 and file2, both taken from /path, to archive.tar, whilst

tar r -f archive.tar file1 file2 -C /path file3 file4

would append file1 and file2 from the current working directory and then file3 and file4 from the /path location.
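The order sensitivity can be checked with a small experiment. This sketch uses c (create) instead of r (append) so that it works from an empty start; the directory names one and two and the file names are made up for the example.

```shell
#!/bin/sh
# Sketch: -C only affects the file names that come after it.
set -e
work=$(mktemp -d)
mkdir "$work/one" "$work/two"
touch "$work/one/file1" "$work/two/file3"

cd "$work/one"
# file1 is read from the current directory (one); file3 is read from two.
tar -cf "$work/archive.tar" file1 -C "$work/two" file3

# List the member names stored in the archive.
tar -tf "$work/archive.tar"
```

The listing shows file1 and file3 without any directory prefix, because each name was stored relative to the directory tar was working in at that point.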

--strip-components and -C are independent and unrelated options to tar. The first affects the directory structure of the archived files once they are extracted, while -C sets the working directory that tar uses to resolve file names outside the archive.
