Decompress tar files using C#

While looking for a quick answer to the same question, I came across this thread and was not entirely satisfied with the current answers. They all rely on third-party libraries, most of them much larger than needed, just to extract a tar.gz file to disk.

While the gz format could be considered rather complicated, tar on the other hand is quite simple. At its core, it takes a bunch of files, prepends a header describing each one (500 bytes of fields, padded out to a 512-byte block), and writes them all into a single archive aligned on 512-byte boundaries. The archive ends with blocks of zeros. Tar itself does no compression; that is typically handled by compressing the resulting archive with gzip, which .NET conveniently has built in (GZipStream), so the framework takes care of all the hard parts.
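
To make the alignment concrete: each entry occupies one 512-byte header block plus its data rounded up to the next multiple of 512 bytes. A minimal sketch of that arithmetic (the `TarLayout` class and its method names are hypothetical, for illustration only):

```csharp
public static class TarLayout
{
    // Headers and data regions are both aligned to 512-byte blocks
    public const int BlockSize = 512;

    // Bytes an entry's data occupies once zero-padded to the next block boundary
    public static long PaddedDataSize(long size) =>
        (size + BlockSize - 1) / BlockSize * BlockSize;

    // Total bytes one entry takes in the archive: one header block plus the padded data
    public static long EntrySize(long size) => BlockSize + PaddedDataSize(size);
}
```

So a 1000-byte file is stored as 512 header bytes plus 1024 data bytes (1536 in total), while an empty file takes only its 512-byte header.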

Having looked at the spec for the tar format, there are really only two values (especially on Windows) we need to pick out of the header in order to extract a file from a stream: the name and the size. Using those two values, we need only seek to the appropriate position in the stream and copy the bytes to a file. Strictly speaking, the one-byte type flag is also worth checking, so that directory entries are not written out as empty files.
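
As a sketch of picking those two fields out of a single header block, assuming the standard ustar layout (name in bytes 0-99, size as octal ASCII in bytes 124-135); the `TarHeader` type and `Parse` method are hypothetical names, not part of any library:

```csharp
using System;
using System.Text;

public static class TarHeader
{
    // Extracts the entry name and data size from one 512-byte tar header block
    public static (string Name, long Size) Parse(byte[] block)
    {
        if (block == null || block.Length < 512)
            throw new ArgumentException("A tar header block is 512 bytes", nameof(block));

        // Name: bytes 0-99, ASCII, terminated or padded with NUL characters
        string name = Encoding.ASCII.GetString(block, 0, 100).Split('\0')[0];

        // Size: bytes 124-135, octal digits padded with NULs and/or spaces
        string sizeField = Encoding.ASCII.GetString(block, 124, 12).Trim('\0', ' ');
        long size = sizeField.Length == 0 ? 0 : Convert.ToInt64(sizeField, 8);

        return (name, size);
    }
}
```

For example, a block whose size field holds `00000001750` parses to a size of 1000 bytes, since octal 1750 is 512 + 448 + 40.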

I made a very rudimentary, down-and-dirty method to extract a tar archive to a directory, and added some helper functions for opening an archive from a stream or file name and for decompressing a .gz file first using the built-in classes.

The primary method is this:

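using System;
using System.IO;
using System.Text;

public static void ExtractTar(Stream stream, string outputDir)
{
    var header = new byte[512];
    while (true)
    {
        // Every entry starts with a 512-byte header; stop on a short read (end of stream)
        if (ReadFully(stream, header, header.Length) < header.Length)
            break;

        // Name: bytes 0-99, ASCII, NUL-terminated. An all-zero block ends the archive.
        var name = Encoding.ASCII.GetString(header, 0, 100).Split('\0')[0];
        if (String.IsNullOrWhiteSpace(name))
            break;

        // Size: bytes 124-135, octal ASCII padded with NULs or spaces
        var size = Convert.ToInt64(Encoding.ASCII.GetString(header, 124, 12).Trim('\0', ' '), 8);
        // Type flag: byte 156 ('0' or NUL = regular file, '5' = directory)
        var type = (char)header[156];

        var output = Path.Combine(outputDir, name);
        if (type == '0' || type == '\0')
        {
            Directory.CreateDirectory(Path.GetDirectoryName(output));
            // FileMode.Create truncates an existing file instead of leaving stale bytes behind
            using (var str = File.Open(output, FileMode.Create, FileAccess.Write))
            {
                var buf = new byte[size];
                if (ReadFully(stream, buf, buf.Length) < buf.Length)
                    throw new EndOfStreamException("Unexpected end of tar data");
                str.Write(buf, 0, buf.Length);
            }
        }
        else
        {
            if (type == '5')
                Directory.CreateDirectory(output);
            // Skip the data of directories and unsupported entries (links, long-name records, ...)
            stream.Seek(size, SeekOrigin.Current);
        }

        // File data is zero-padded up to the next 512-byte boundary
        var padding = (512 - size % 512) % 512;
        stream.Seek(padding, SeekOrigin.Current);
    }
}

private static int ReadFully(Stream stream, byte[] buffer, int count)
{
    // Stream.Read may return fewer bytes than requested, so loop until full or EOF
    var total = 0;
    while (total < count)
    {
        var read = stream.Read(buffer, total, count - total);
        if (read == 0)
            break;
        total += read;
    }
    return total;
}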

And here are a few helper functions for opening from a file and for automatically decompressing a tar.gz file or stream before extracting it.

using System.IO;
using System.IO.Compression;

public static void ExtractTarGz(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTarGz(stream, outputDir);
}

public static void ExtractTarGz(Stream stream, string outputDir)
{
    // A GZipStream is not seekable, so decompress it into a MemoryStream first
    using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
    using (var memStr = new MemoryStream())
    {
        // CopyTo keeps reading until the real end of the stream. A loop that stops as soon
        // as Read returns less than the buffer size can silently truncate the archive,
        // because GZipStream.Read may return short reads before the end.
        gzip.CopyTo(memStr);

        memStr.Seek(0, SeekOrigin.Begin);
        ExtractTar(memStr, outputDir);
    }
}

public static void ExtractTar(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTar(stream, outputDir);
}
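
Since the tar reader only ever moves forward, the MemoryStream copy could in principle be avoided by skipping bytes with reads instead of `Seek`, which also works on a non-seekable GZipStream and keeps memory use flat for large archives. A minimal sketch of such a helper follows; the `StreamUtil.Skip` name is hypothetical, and using it would also mean computing padding from the entry size rather than `stream.Position`, which GZipStream does not support:

```csharp
using System;
using System.IO;

public static class StreamUtil
{
    // Advances a stream by `count` bytes without requiring CanSeek,
    // by reading into a scratch buffer and discarding the data
    public static void Skip(Stream stream, long count)
    {
        if (stream.CanSeek)
        {
            stream.Seek(count, SeekOrigin.Current);
            return;
        }

        var scratch = new byte[4096];
        while (count > 0)
        {
            int read = stream.Read(scratch, 0, (int)Math.Min(scratch.Length, count));
            if (read == 0)
                throw new EndOfStreamException("Stream ended before the requested bytes were skipped");
            count -= read;
        }
    }
}
```

The trade-off is that a skipped region still has to be decompressed, so this saves memory rather than time.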

Here is a gist of the full file with some comments.


Tar-cs will do the job, but it is quite slow. I would recommend using SharpCompress, which is significantly faster. It also supports other compression formats and has been updated recently.

using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Readers;

private static String directoryPath = @"C:\Temp";

public static void unTAR(String tarFilePath)
{
    using (Stream stream = File.OpenRead(tarFilePath))
    using (var reader = ReaderFactory.Open(stream))
    {
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory)
            {
                ExtractionOptions opt = new ExtractionOptions {
                    ExtractFullPath = true,
                    Overwrite = true
                };
                reader.WriteEntryToDirectory(directoryPath, opt);
            }
        }
    }
}

See tar-cs

using (FileStream unarchFile = File.OpenRead(tarfile))
{
    TarReader reader = new TarReader(unarchFile);
    reader.ReadToEnd("out_dir");
}
