DataEncoding and compression in HDF5 format

The documentation is misleading here. On the one hand, the only export option listed is "Append", which can be found under the Options tab. On the other hand, the general documentation reads

(screenshot of the documentation, not reproduced here)

I really wonder why it is necessary to mark an option value as "Import only" when "DataEncoding" isn't an export option at all.

Anyway, I see the same behaviour on Mac OS X as you do: no data compression. Evaluating

ExportString[{{1, 2}, {3, 4}}, "HDF5", "DataEncoding" -> "GZIP"]

and then changing "GZIP" to None does change something in the output, but in neither case is the array compressed.
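One way to check this is to compare the size of the exported strings for an array that ought to compress well. This is only a sketch: the test array and the variable names are my own choices, and "DataEncoding" is passed exactly as in the call above, with no guarantee that your version honours it on export.

```mathematica
(* Illustrative test data: a constant array should compress very well *)
data = ConstantArray[0, {500, 500}];

(* Export to an in-memory HDF5 string with and without the encoding option *)
withGzip = ExportString[data, "HDF5", "DataEncoding" -> "GZIP"];
withNone = ExportString[data, "HDF5", "DataEncoding" -> None];

(* Similar lengths indicate that no real compression took place *)
{StringLength[withGzip], StringLength[withNone]}
```

If the two lengths are of the same order as the raw size of the array, the option is not compressing the dataset.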

Partial solution

One possible solution is to gzip the "HDF5" files. Mathematica seems to recognise gzipped files automatically on import. So either you gzip the files manually, or you use something along these lines to do everything in Mathematica:

Export["matrix.h5.gz", ExportString[datapourrie, "HDF5"], "GZIP"]

For your test data this runs in no time; anything larger probably needs some benchmarking and tweaking. To re-import your data you can simply do

Import["matrix.h5.gz", {"HDF5", "Datasets", "/Dataset1"}]
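For the benchmarking mentioned above, a rough sketch could time the gzip export and compare file sizes. The array below is only a stand-in for datapourrie, the file locations are arbitrary, and the Export call is the one shown earlier, so the same version caveats apply.

```mathematica
(* Stand-in for the asker's data; replace with the real array *)
datapourrie = RandomInteger[9, {1000, 1000}];

(* Hypothetical output locations in the temporary directory *)
plainFile = FileNameJoin[{$TemporaryDirectory, "matrix.h5"}];
gzFile = FileNameJoin[{$TemporaryDirectory, "matrix.h5.gz"}];

(* Plain HDF5 export, for reference *)
Export[plainFile, datapourrie, "HDF5"];

(* Time the gzip route: HDF5 string first, then gzip on export *)
tGz = First@AbsoluteTiming[
    Export[gzFile, ExportString[datapourrie, "HDF5"], "GZIP"]];

(* Seconds taken, followed by both file sizes in bytes *)
{tGz, FileByteCount[plainFile], FileByteCount[gzFile]}
```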

One workaround is to compress the HDF5 file after it has been exported from Mathematica, using the HDF5 command line tools.

Note: on OS X the command line tools can be installed easily with MacPorts, using port install h5utils.

The command to recompress the data is

h5repack -v -f GZIP=1 infile.h5 outfile.h5

This can indeed achieve a significant reduction in size.

For convenience you might want to invoke this from within Mathematica using Run.
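A minimal sketch of that, assuming h5repack is on the PATH of the shell that Run starts. The file names and the quote helper are hypothetical, and the array is only a stand-in for the real data.

```mathematica
(* Stand-in data; replace with the real array *)
data = RandomInteger[9, {1000, 1000}];

(* Hypothetical file names in the temporary directory *)
infile = FileNameJoin[{$TemporaryDirectory, "infile.h5"}];
outfile = FileNameJoin[{$TemporaryDirectory, "outfile.h5"}];

(* Write the uncompressed HDF5 file from Mathematica *)
Export[infile, data, "HDF5"];

(* Helper: wrap a path in double quotes for the shell *)
quote[path_String] := "\"" <> path <> "\"";

(* Same h5repack command as above; Run returns the exit code *)
exitCode = Run["h5repack -v -f GZIP=1 " <> quote[infile] <> " " <> quote[outfile]];

(* On success, compare the file sizes in bytes before and after *)
If[exitCode == 0,
  {FileByteCount[infile], FileByteCount[outfile]},
  $Failed]
```

A non-zero exit code usually means the tool was not found or could not read the input file, in which case the sketch returns $Failed.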


In version 7 halirutan's export method does not produce a file that is recognized by Import.

However, one can write:

Export["matrix2.h5.gz", datapourrie, {"GZIP", "HDF5"}]

And then:

d2 = Import["matrix2.h5.gz", {"Datasets", "/Dataset1"}];

datapourrie == d2

True