What are the differences between bsdtar and GNU tar?
The Ubuntu bsdtar is actually the tar implementation bundled with libarchive, and that should be differentiated from classical bsdtar. Some BSD variants do use libarchive for their tar implementation, e.g. FreeBSD.
GNU tar does support the other tar variants and automatic compression detection.
As visualication pasted the blurb from Ubuntu, there are a few things in there that are specific to libarchive:
- libarchive is by definition a library, and different from both classical bsdtar and GNU tar in that way.
- libarchive cannot read some older obscure GNU tar variations; the most notable was the encoding of some headers in base64, so that the tar file would be 7-bit clean ASCII (this was the case for 1.13.6-1.13.11 and changed in 1.13.12; that code was only officially in tar for 2 weeks).
- libarchive's bsdtar will read non-tar files (e.g. zip, iso9660, cpio), but classical bsdtar will not.
Now that we've gotten libarchive out of the way, it mostly comes down to what is supported in classical bsdtar.
You can see the manpages yourself here:
- GNU tar(1)
- FreeBSD tar(1) - libarchive-based
- NetBSD tar(1)
- OpenBSD tar(1)
- Standard/Schily tar(1) - the oldest free tar implementation, no heritage to any other
- busybox (1) - Mini tar implementation for BusyBox, common in embedded systems
In your original question, you asked what the advantages of the classical bsdtar are, and I'm not sure there really are any. The only time it really matters is if you're trying to write shell scripts that need to work on all systems; you need to make sure that what you pass to tar is actually valid in all variants.
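As a rough illustration of the portability point, a script can try to guess which tar it is talking to before choosing flags. This is only a sketch: the function name detect_tar and the labels it prints are made up for this example, and it assumes the implementation identifies itself in the output of `tar --version`, which not every tar supports (those fall through to "unknown").

```shell
#!/bin/sh
# Sketch: guess which tar implementation is installed so a script can
# choose flags it accepts. detect_tar and its labels are illustrative.
detect_tar() {
    # Some tars reject --version; keep whatever they print and carry on.
    version=$(tar --version 2>&1 || true)
    case "$version" in
        *"GNU tar"*)          echo gnu ;;
        *libarchive*)         echo libarchive ;;
        *BusyBox*|*busybox*)  echo busybox ;;
        *)                    echo unknown ;;
    esac
}

flavor=$(detect_tar)
echo "tar flavor: $flavor"
```

When the result is "unknown", the safer route is probably to stay with the most basic forms, such as `tar -cf archive.tar dir` and `tar -xf archive.tar`, and avoid implementation-specific options altogether.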
GNU tar, libarchive's bsdtar, classical bsdtar, star and BusyBox's tar are certainly the tar implementations that you'll run into most of the time, but I'm certain there are others out there (early QNX for example). libarchive/GNU tar/star are the most feature-packed, but in many ways they have long deviated from the original standards (possibly for the better).
BSDTAR vs TAR plus much more
Here is one benefit!!
I'm going to go into 5 topics here (and go way off topic, but it will cover what you want as well):
- bsdtar vs tar
- sparse files vs not
- thick and thin files/luns with btrfs
- thick and thin files/luns without btrfs
- diff between thick and thin and how it doesn't apply to just luns
bsdtar handles sparse files better than regular tar
- bsdtar will take all of the zeros and just record them as metadata
- tar will, by default, actually process every zero
*example: imagine a 20 TB sparse file (called biglun) with 10 MB of data scattered throughout it. Since this is a sparse file, it will only take up about 10 MB on the drive.
How to make a sparse file:
Sparse File - how to make it - detect it - everything
Sparse files are like "thin" luns (if you were to use one as a lun). "Thick" luns would be a different story.
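For a quick feel of what "sparse" means in practice, here is a small sketch that creates a sparse file and compares its apparent size with the space it really occupies. The file name, the 64M size and the temporary directory are only illustrative, and it assumes a filesystem that supports holes plus the usual truncate, dd and du tools.

```shell
#!/bin/sh
# Sketch: create a sparse file and compare apparent size with real usage.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 64 MiB of "hole": the size is recorded, but no data blocks are written.
truncate -s 64M biglun

# Write a few real bytes 1 MiB into the file; the rest stays a hole.
printf 'real data' | dd of=biglun bs=1 seek=1048576 conv=notrunc 2>/dev/null

apparent_bytes=$(wc -c < biglun)     # logical size in bytes
disk_kb=$(du -k biglun | cut -f1)    # space actually allocated, in KiB

echo "apparent size: $apparent_bytes bytes"
echo "on disk:       $disk_kb KiB"
```

A large gap between the two numbers is the usual sign that a file is sparse.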
*back to topic:
Tarring up biglun with default options will make tar go through all of the 10 MB of data along with the ~20 TB worth of zeros spread across the lun. It will take some time, I presume, and the tar file will be pretty big. Also, extracting it: I've never done an extract of a tar file of a sparse file, but it might not be pretty; I might be wrong here.
bsdtarring biglun will just process the 10 MB of data and make small metadata for the ~20 TB of zeros.
Benefit? Well lots of them; I just wrote some above.
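To see the size difference on a small scale, the sketch below archives the same sparse file twice with GNU tar, once with default options and once with its --sparse (-S) option, which makes GNU tar record holes as metadata too. It assumes the `tar` on the PATH is GNU tar; the 64M file is a stand-in for the 20 TB example, and all names are illustrative.

```shell
#!/bin/sh
# Sketch: archive a sparse file with and without GNU tar's --sparse.
# Assumes `tar` is GNU tar; other implementations may reject --sparse.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

truncate -s 64M biglun
printf 'real data' | dd of=biglun bs=1 seek=1048576 conv=notrunc 2>/dev/null

tar -cf plain.tar biglun             # holes are written out as zeros
tar --sparse -cf sparse.tar biglun   # holes are recorded as metadata

plain_bytes=$(wc -c < plain.tar)
sparse_bytes=$(wc -c < sparse.tar)
echo "plain.tar:  $plain_bytes bytes"
echo "sparse.tar: $sparse_bytes bytes"

# Extract the sparse archive and check that the content survived.
mkdir restored
tar -xf sparse.tar -C restored
cmp biglun restored/biglun && echo "round trip OK"
```

So the real difference is mostly in the defaults: GNU tar needs to be asked, while bsdtar, as described above, handles the zeros as metadata on its own.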
It's similar to rsync vs cp
- Also, if you rsync a giant sparse file, it will behave like tar
- If you cp a giant sparse file, it will automatically behave like bsdtar (you can change cp's behaviour to go over the zeros, or not go over the zeros)
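The cp behaviour mentioned above is controlled by the --sparse option of GNU cp, and rsync has a --sparse (-S) option of its own. A minimal sketch, assuming GNU coreutils and treating rsync as optional; the file names are illustrative:

```shell
#!/bin/sh
# Sketch: control how GNU cp (and optionally rsync) treat holes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
truncate -s 16M biglun

cp --sparse=always biglun thin-copy    # recreate holes in the copy
cp --sparse=never  biglun thick-copy   # write every zero out

du -k biglun thin-copy thick-copy

# rsync writes the zeros out unless asked to create holes at the
# destination. Skipped when rsync is not installed.
if command -v rsync >/dev/null 2>&1; then
    rsync --sparse biglun rsync-copy
    du -k rsync-copy
fi
```

Both copies have identical contents; only the amount of disk space they occupy should differ.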
Personally, I like to imagine sparse files like thin luns, and regular files like thick luns...
Next topic is BTRFS thin vs thick luns:
With filesystems like BTRFS, thin luns are sparse files (make them with truncate, like in the wiki doc).
truncate -s <size> filename
(the size is in bytes unless you add a suffix such as K, M or G)
tip: backup with bsdtar, copy with cp
Thick luns are regular files with the +C attribute (+C makes the file non-COW (copy on write), so that writes stay in the blocks that were allocated to the file and overwrites do not land in new locations; research COW and BTRFS). Instead of making the file with truncate, make it with "fallocate -l". Set +C while the file is still empty, since on BTRFS the flag is only reliable on new or empty files:
touch filename
chattr +C filename
fallocate -l <size> filename
tip: backup with bsdtar or tar, copy with rsync or cp
Next topic is EXT thin vs thick luns:
Thin luns are sparse files:
truncate -s <size> filename
tip: backup with bsdtar, copy with cp
Thick luns are regular, fully preallocated files. EXT filesystems are not copy on write, so the +C attribute used for BTRFS is not needed here. Instead of making the file with truncate, make it with "fallocate -l":
fallocate -l <size> filename
tip: backup with bsdtar or tar, copy with rsync or cp
What's a thick vs thin file?
- thin luns/files: fill up their data from 0 to the size allotted; metadata pretends where the 0s are. As you write data, the allocated space grows.
- thick luns/files: fill up their data at the start with 0s or whatever (lazy zero or eager zero); these set reservations (or, as ZFS likes to call them, refreservations)
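The difference can be seen side by side with a small sketch. It assumes a Linux system with truncate, fallocate and du; the file names and the 8M size are illustrative. The dd fallback is only there for filesystems where fallocate is not supported, and it behaves more like an "eager zero" fill. On filesystems that compress or deduplicate zeros, the numbers may look different.

```shell
#!/bin/sh
# Sketch: compare allocated space of a thin (sparse) and a thick
# (preallocated) file of the same apparent size.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

truncate -s 8M thin.img    # thin: size only, no blocks reserved

# Thick: reserve the blocks now. If fallocate is unsupported here,
# fall back to writing zeros explicitly.
fallocate -l 8M thick.img 2>/dev/null ||
    dd if=/dev/zero of=thick.img bs=1M count=8 2>/dev/null

thin_kb=$(du -k thin.img | cut -f1)
thick_kb=$(du -k thick.img | cut -f1)
echo "thin.img:  $thin_kb KiB allocated"
echo "thick.img: $thick_kb KiB allocated"
```

Both files report the same size in `ls -l`; only `du` shows that the thick one has its space reserved up front.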
VMWARE ARTICLE HERE describes lazy vs eager zero with thick luns/files: https://communities.vmware.com/message/2199576
tip
remember thick and thin doesn't just apply to luns, it can also be on files, zfs filesystems (shares/volumes/luns), and I'm sure other things (just look at zfs).
From the Ubuntu package description:
The bsdtar program has a number of advantages over previous tar implementations:
- Library. Since the core functionality is in a library, it can be used by other tools, such as pkg_add.
- Automatic format detection. Libarchive automatically detects the compression (none/gzip/bzip2) and format (old tar, ustar, gnutar, pax, cpio, iso9660, zip) when reading archives. It does this for any data source.
- Pax Interchange Format Support. This is a POSIX/SUSv3 extension to the old "ustar" tar format that adds arbitrary extended attributes to each entry. Does everything that GNU tar format does, only better.
- Handles file flags, ACLs, arbitrary pathnames, etc. Pax interchange format supports key/value attributes using an easily-extensible technique. Arbitrary pathnames, group names, user names, file sizes are part of the POSIX standard; libarchive extends this with support for file flags, ACLs, and arbitrary device numbers.
- GNU tar support. Libarchive reads most GNU tar archives. If there is demand, this can be improved further.
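The automatic detection described in that list is easy to try out. The sketch below creates a gzip-compressed archive and lists it without naming the compression; this works with GNU tar and with bsdtar. The bsdtar part, which writes and lists a zip file, is guarded because bsdtar may not be installed. File names are illustrative.

```shell
#!/bin/sh
# Sketch: rely on automatic compression/format detection when reading.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo hello > hello.txt

# Create a gzip-compressed archive, then list it without passing -z:
# the reader is expected to work out the compression by itself.
tar -czf demo.tar.gz hello.txt
listing=$(tar -tf demo.tar.gz)
echo "tar sees: $listing"

# libarchive's bsdtar also reads non-tar containers such as zip.
if command -v bsdtar >/dev/null 2>&1; then
    bsdtar --format zip -cf demo.zip hello.txt
    bsdtar -tf demo.zip
fi
```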