Concatenate multiple tar files in one command
This question is rather old, but I wish I had found the following information sooner. So if anyone else stumbles across this, enjoy:
What Jeff describes above is a known bug in GNU tar (reported in August 2008). Only the first archive (the one after the -f option) gets its EOF marker removed. If you try to concatenate more than two archives, the last archive(s) will be "hidden" behind end-of-file markers.
It is a bug in tar. It concatenates entire archives, including trailing zero blocks, so by default reading the resulting archive stops after the first concatenation.
Source: https://lists.gnu.org/archive/html/bug-tar/2008-08/msg00002.html (and following messages)
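To check whether your own tar build shows the behaviour described in that report, something like the following sketch may help. It assumes GNU tar, and every file name in it is made up for the demo. It concatenates three tiny archives and counts how many entries are listed with and without -i; if the first number is smaller than 3, the trailing archive(s) are hidden as described above.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar; all file names are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build three one-file archives.
for n in 1 2 3; do
    echo "payload $n" > "file$n.txt"
    tar -cf "archive$n.tar" "file$n.txt"
done

# Append archive2.tar and archive3.tar to archive1.tar.
tar --concatenate -f archive1.tar archive2.tar archive3.tar

# Count the entries a normal listing sees versus a listing with -i.
visible=$(tar -tf archive1.tar | wc -l)
with_ignore_zeros=$(tar -itf archive1.tar | wc -l)

echo "entries listed without -i: $visible"
echo "entries listed with -i:    $with_ignore_zeros"
```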
Considering the age of the bug I wonder if it will ever get fixed. I doubt there is a critical mass that is affected.
The best way to circumvent this bug could be to use the -i (--ignore-zeros) option, at least for .tar files on your file system.
As Jeff points out, tar --concatenate can take a long time to reach the EOF before it concatenates the next archive. So if you're going to be stuck with a "broken" archive that needs the tar -i option to untar, I suggest the following:
Instead of using
tar --concatenate -f archive1.tar archive2.tar archive3.tar
you will likely be better off running
cat archive2.tar archive3.tar >> archive1.tar
or piping to dd if you intend to write to a tape device.
Also note that this could lead to unexpected behaviour if the tapes were not zeroed before (over)writing new data onto them. For that reason, the approach I am going to take in my application is nested tars, as suggested in the comments below the question.
The above suggestion is based on the following very small sample benchmark:
time tar --concatenate -vf buffer.100025.tar buffer.100026.tar
real 65m33.524s
user 0m7.324s
sys 2m50.399s
time cat buffer.100027.tar >> buffer.100028.tar
real 46m34.101s
user 0m0.853s
sys 1m46.133s
The buffer.*.tar files are all 100 GB in size, and the system was pretty much idle apart from each of these calls. The time difference is significant enough that I personally consider this benchmark valid despite the small sample size, but you are free to judge for yourself, and you are probably best off running a benchmark like this on your own hardware.
This may not help you, but if you are willing to use the -i option when extracting from the final archive, then you can simply cat the tars together.
A tar file ends with a header full of nulls and more null padding till the end of the record. With --concatenate, tar must go through all the headers to find the exact position of the final header, in order to start overwriting there.
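The trailing nulls are easy to see for yourself. The following is a minimal sketch with made-up file names: it creates a one-file archive and checks that its last 1024 bytes (two 512-byte blocks) contain nothing but nulls. The total size it prints depends on your tar's record size.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > hello.txt
tar -cf one.tar hello.txt

# Total archive size in bytes.
size=$(wc -c < one.tar)

# Strip null bytes from the last 1024 bytes and count what remains.
nonzero_tail=$(tail -c 1024 one.tar | tr -d '\000' | wc -c)

echo "archive size: $size bytes"
echo "non-null bytes in the last 1024: $nonzero_tail"
```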
If you just cat the tars, you just have extra nulls between headers. The -i option asks tar to ignore these nulls between headers. So you can run:
cat receivedTar1.tar receivedTar2.tar ... >>alltars.tar
tar -itvf alltars.tar
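As a self-contained illustration of the two commands above, here is a small sketch that builds three tiny archives, joins them with cat, and compares a listing without -i against one with -i. It assumes GNU tar, and the part*.tar and file*.txt names are invented for the demo.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar; all file names are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build three one-file archives.
for n in 1 2 3; do
    echo "payload $n" > "file$n.txt"
    tar -cf "part$n.tar" "file$n.txt"
done

# Plain byte-level concatenation; the null blocks of each part stay in place.
cat part1.tar part2.tar part3.tar > alltars.tar

# Without -i, reading stops at the first end-of-archive marker.
plain_listing=$(tar -tf alltars.tar || true)

# With -i, the null blocks between the parts are skipped.
full_listing=$(tar -itf alltars.tar)

echo "without -i:"
echo "$plain_listing"
echo "with -i:"
echo "$full_listing"
```

The same -i flag is needed for extraction (tar -ixf alltars.tar), otherwise only the members of the first part are unpacked.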
Also, your tar --concatenate example ought to be working. However, if a file with the same name appears in several tar archives, that file will be overwritten several times when you extract everything from the resulting tar.
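The overwrite effect can be reproduced with a minimal sketch like the one below. It assumes GNU tar and joins the archives with cat plus -i rather than --concatenate, purely to keep the demo simple; the names note.txt, a.tar, b.tar and both.tar are made up. After extraction, only the content from the last archive survives.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar; all file names are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Two archives, each holding a different version of note.txt.
echo "first version" > note.txt
tar -cf a.tar note.txt
echo "second version" > note.txt
tar -cf b.tar note.txt
rm note.txt

cat a.tar b.tar > both.tar

# The combined archive lists note.txt twice.
listing=$(tar -itf both.tar)

# Extracting writes note.txt twice; the later member replaces the earlier one.
mkdir out
tar -ixf both.tar -C out
extracted=$(cat out/note.txt)

echo "$listing"
echo "content after extraction: $extracted"
```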