DataEncoding and compression in HDF5 format
The documentation is misleading here. On the one hand, the only export option listed is "Append",
which can be found under the Options tab. On the other hand, the general documentation reads
I really wonder why it is necessary to put "Import only" behind an option value when "DataEncoding"
isn't an export option at all.
Anyway, I see the same behaviour on Mac OS X as you do: no data compression. Using
ExportString[{{1, 2}, {3, 4}}, "HDF5", "DataEncoding" -> "GZIP"]
and changing "GZIP" to None does change something in the output, but it does not compress the array.
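To see what the option actually does, one can compare the sizes of the two exported strings. This is only a minimal sketch: the array data is a hypothetical, highly compressible stand-in (the 2x2 example is too small to show any effect), and it assumes that ExportString accepts "DataEncoding" the same way as in the call above.

```mathematica
(* hypothetical test array: constant data should compress very well *)
data = ConstantArray[1, {500, 500}];

withGzip = ExportString[data, "HDF5", "DataEncoding" -> "GZIP"];
withNone = ExportString[data, "HDF5", "DataEncoding" -> None];

(* lengths of both outputs; similar values mean no real compression *)
StringLength /@ {withGzip, withNone}

(* True would mean the option has no effect at all *)
withGzip === withNone
```

If both lengths come out close to the raw size of the array, the option is not compressing anything on export.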
Partial solution
One possible solution is to gzip the "HDF5" files. Mathematica seems to recognise gzipped files automatically. So either you gzip the files manually, or you use something along these lines to do everything in Mathematica:
Export["matrix.h5.gz", ExportString[datapourrie, "HDF5"], "GZIP"]
For your test data this runs in no time; anything larger probably needs benchmarking and tweaking. To re-import your data you can simply do
Import["matrix.h5.gz", {"HDF5", "Datasets", "/Dataset1"}]
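Whether the gzip step pays off can be checked by comparing file sizes and testing the round trip. The following is a rough sketch under a few assumptions: data is a hypothetical stand-in for datapourrie, plain.h5 is just an illustrative file name, and the dataset is assumed to end up under the default name /Dataset1 as in the import above.

```mathematica
(* hypothetical stand-in for datapourrie *)
data = RandomInteger[{0, 9}, {500, 500}];

(* uncompressed reference file and gzipped variant *)
Export["plain.h5", data, "HDF5"];
Export["matrix.h5.gz", ExportString[data, "HDF5"], "GZIP"];

(* sizes in bytes: {uncompressed, gzipped} *)
FileByteCount /@ {"plain.h5", "matrix.h5.gz"}

(* round-trip check; should give True if nothing was lost *)
Import["matrix.h5.gz", {"HDF5", "Datasets", "/Dataset1"}] == data
```

How much is gained depends entirely on the data, so it is worth running this on a representative sample before committing to the approach.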
One workaround is to compress the HDF5 file after it has been exported from Mathematica, using the HDF5 command line tools.
Note: on OS X the command line tools can be easily installed with MacPorts using port install h5utils.
The command to recompress the data is
h5repack -v -f GZIP=1 infile.h5 outfile.h5
This can indeed achieve a significant reduction in size.
For convenience you might want to invoke this from within Mathematica using Run.
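A minimal sketch of such a call might look as follows. It assumes that h5repack is on the PATH seen by the shell that Mathematica starts (with MacPorts this may require giving the full path to the binary), and the data and file names are hypothetical placeholders.

```mathematica
(* hypothetical test data and file names *)
data = RandomInteger[{0, 9}, {500, 500}];
infile = "matrix.h5";
outfile = "matrix_gz.h5";

Export[infile, data, "HDF5"];

(* Run returns the exit code of the external command; 0 means success *)
exit = Run["h5repack -v -f GZIP=1 " <> infile <> " " <> outfile];

(* on success, compare sizes in bytes: {original, repacked} *)
If[exit == 0, FileByteCount /@ {infile, outfile}, $Failed]
```

The repacked file is still a regular HDF5 file, so it should be importable without any extra decompression step.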
In version 7 halirutan's export method does not produce a file that is recognized by Import.
However, one can write:
Export["matrix2.h5.gz", datapourrie, {"GZIP", "HDF5"}]
And then:
d2 = Import["matrix2.h5.gz", {"Datasets", "/Dataset1"}];
datapourrie == d2
True