How to tar files in sorted order?

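With GNU tar (1.28 or later), pass --sort=name when creating the archive, for example tar --sort=name -cf all.tar dir. From the manual: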

--sort=ORDER
 Specify the directory sorting order when reading directories.
 ORDER may be one of the following:

`none'
      No directory sorting is performed. This is the default.

`name'
      Sort the directory entries on name. The operating system may
      deliver directory entries in a more or less random order, and
      sorting them makes archive creation reproducible.

`inode'
      Sort the directory entries on inode number. Sorting
      directories on inode number may reduce the amount of disk
      seek operations when creating an archive for some file
      systems.

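You may also come across --preserve-order (also spelled --same-order or -s). Note that it does not sort anything when creating an archive: it applies to extraction, telling tar that the names to extract are listed in the same order as in the archive.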


With zsh, instead of:

pax -w dir

Use:

pax -dw dir dir/**/*(D)

You can do the same with bash 4.0 or later, started as bash -O globstar -O dotglob, with:

pax -dw dir/**

Or with recent versions of ksh93, started as FIGNORE='@(.|..)' ksh93 -o globstar, with:

pax -dw dir dir/**

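pax is the POSIX-standard command for creating tar archives; with -w and no -f it writes the archive to standard output. The -d option stops pax from descending into the directories it is given, so each file is archived once, in the order of the argument list. Shell globs expand sorted by name, which is what gives the sorted archive.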

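If you run into an "Argument list too long" error, you can change to the following, which avoids the limit because printf is a shell builtin and the names reach pax on its standard input: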

printf '%s\0' dir dir/**/*(D) | pax -0dw

(not all pax implementations support -0 though).


tar traditionally cannot do this itself, so you have to create the archive from a correctly ordered list. In principle you could then use tar's -T option, but by default it expects one file name per line, so any file names with newlines in them (which are allowed) will break it. GNU tar can read a NUL-terminated list with --null -T (add --no-recursion if the list contains directories), but not every tar implementation has an equivalent.

An alternative is to use cpio to create the archive, since GNU cpio accepts a NUL-terminated list of file names and can write tar-format archives.

If your tar command would be:

tar cvf /somedir/all.tar .

Then for this to be sorted by name you would have to do (assuming GNU find, sort and cpio):

find . -type f -print0 | sort -z | \
  cpio --create --null --format=ustar -O /somedir/all.tar

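This has the disadvantage that the contents of subdirectories end up in between the file names of their parent directory, and because of -type f only regular files are archived (no directory entries or symlinks). You can do tricks with find's -printf, emitting the directory and depth information with a \0 terminator and sorting with sort -z -n, but that also influences how files with numbers are sorted within a directory.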

If the above is not satisfactory, you could probably use a small Python program based on os.walk() to generate the ordering you want with full control (depth first, based on extension, etc.), but if you go that route you might as well drop cpio and write out the tar file with Python's tarfile module.
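
As a rough illustration of that last route, here is a minimal sketch that walks a tree with os.walk() and writes the archive with tarfile. The function names iter_sorted and write_sorted_tar are made up for this example, and the ordering chosen (each directory entry first, then its files by name, then its subdirectories by name) is only one possibility. Names are compared as plain Python strings, not by locale collation.

```python
import os
import tarfile


def iter_sorted(top):
    """Yield every path under `top` in a deterministic, depth-first order.

    Hypothetical helper: each directory comes first, followed by its
    non-directory entries sorted by name, then its subdirectories.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        # Sorting dirnames in place also fixes the order in which
        # os.walk() descends into the subdirectories.
        dirnames.sort()
        yield dirpath
        # os.walk() reports symlinks to directories in dirnames but does
        # not descend into them, so archive them alongside the files.
        links = [d for d in dirnames
                 if os.path.islink(os.path.join(dirpath, d))]
        for name in sorted(filenames + links):
            yield os.path.join(dirpath, name)


def write_sorted_tar(top, archive_path):
    """Write `top` to an uncompressed tar file, members in iter_sorted() order.

    Assumption: archive_path lies outside `top`, otherwise the archive
    may end up containing a partial copy of itself.
    """
    with tarfile.open(archive_path, "w") as tar:
        for path in iter_sorted(top):
            # recursive=False: add only this entry; the walk supplies the rest.
            tar.add(path, recursive=False)
```

Calling write_sorted_tar(".", "/somedir/all.tar") would roughly correspond to the tar command above. To sort by extension or any other criterion, change the key used in the two sort calls.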
