Where is HDF5Tools documentation?

I believe this package contains internal functions, used by Import when working with HDF5 files. These functions have no documentation, and aren't meant to be called from top level.

If you evaluate

Needs["HDF5Tools`"]; 
?HDF5Tools`*

and click any of the symbols, you'll see most have usage messages. So you may be able to figure out what they do, but the documented method is to call Import.
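To read the usage messages without clicking through the symbol browser, something like the following might work. It is only a sketch: it assumes the package loads under the context HDF5Tools` as above, and the helper name usageOf is made up for illustration.

```mathematica
Needs["HDF5Tools`"];

(* short names of all symbols in the package context *)
names = Last[StringSplit[#, "`"]] & /@ Names["HDF5Tools`*"];

(* usageOf is a hypothetical helper: the usage message of a symbol, given its short name *)
usageOf[name_String] := ToExpression["HDF5Tools`" <> name <> "::usage"];

(* keep only the symbols that actually have a usage string *)
documented = Select[names, StringQ[usageOf[#]] &];

(* how many symbols exist, and how many of them carry a usage message *)
Length /@ {names, documented}
```

Evaluating usageOf["h5fopen"] should then print the same text you would get by clicking the symbol.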

If you need data that is in the HDF5 file that isn't being returned by Import, it would be helpful to send this request to customer support, so that the importing can be improved.


As for the documentation, Jason's answer makes clear that there basically is none (apart from the usage messages), and that the HDF5Tools` package, at this time (2019), unfortunately seems to be meant for internal use only.

Documented Import/Export Functionality

You have stated that you want to use that functionality to read only parts of an HDF5 file and to append data to an existing one. As I mentioned in my comments, I think you may already be able to do this with documented functionality. The following should work since at least version 11.2:

Create a file with a small and a large dataset:

large = RandomReal[{0, 255}, 10^8];
Export["test.h5", {"large" -> large, "small" -> Range[20]}];

Now you can read only part of the file content (quit the kernel first if you want to check memory usage):

small = Import["test.h5", {"Datasets", "small"}];
part = Import["test.h5", {"Datasets", "large"}, "TakeElements" -> {1 ;; 10}];

You can check that neither call imports the whole file into the Mathematica kernel. You can also add ("append") new datasets to an existing file (but not change, replace or append to existing datasets, AFAIK):

Export["test.h5", part, {"Datasets", "part"}, "Append" -> True]
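One way to check both claims is sketched below. It assumes test.h5 was created and extended as above, and it uses only the documented "Datasets" element and MaxMemoryUsed; the exact numbers will depend on version and platform.

```mathematica
(* names of all datasets now stored in the file; "part" should be among them *)
Import["test.h5", "Datasets"]

(* peak memory, in bytes, used while reading ten elements of the large dataset *)
MaxMemoryUsed[
  Import["test.h5", {"Datasets", "large"}, "TakeElements" -> {1 ;; 10}]
]

(* for comparison: the full dataset holds 10^8 doubles, i.e. about 8*10^8 bytes *)
```

If the partial import really avoids loading everything, the reported peak should be far below the size of the full dataset.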

Of course I don't know whether this documented functionality is good enough for what you need, but it might be. I understand that a lot of functionality is still missing from the documented Import/Export interface. I haven't checked with version 12 yet, but performance might also not be as good as it could be; memory usage in particular has been unreasonably high in older versions. I have complained about that on this site and elsewhere before. So I definitely see that a documented way to use the HDF5Tools` package would be useful, and I would appreciate any better answers to the actual question about the documentation.

Spelunking

First of all, it might be interesting to know that the HDF5Tools` package already exists and works in version 11.2 and is not new in 12.0.

Looking at the package context, there appear to be some higher-level functions whose names start with HDF5. They allow relatively simple programmatic access to HDF5 files, and many are easy enough to use with just the usage messages. But these functions probably do not provide much more than what Import/Export already provide, so it seems relatively uninteresting to delve deeper into them at the moment.

Additionally, there seems to be a fairly complete set of lower-level functions that correspond more or less directly to the documented HDF5 C library functions (see the HDF5 library documentation for details). These lower-level functions all start with h5, and their names match those of the corresponding C functions one-to-one, apart from being all lowercase and lacking the _ name separators. So with the usage messages and the documentation of the HDF5 library, I think it should be possible to achieve quite a large subset of what is possible with HDF5 using these low-level functions.
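The naming pattern can be sketched as a small lookup helper. This is an assumption inferred from the handful of names used in this answer, not a documented rule, and cToWL is a made-up name. Constants such as H5F_ACC_RDONLY appear to keep their case and only lose the underscores (H5FACCRDONLY).

```mathematica
(* cToWL is a hypothetical helper: guess the HDF5Tools` name for a C function name *)
cToWL[cName_String] := ToLowerCase[StringDelete[cName, "_"]];

cToWL["H5Sselect_hyperslab"]   (* "h5sselecthyperslab" *)
cToWL["H5Dget_space"]          (* "h5dgetspace" *)

(* check whether the guessed name exists once the package is loaded *)
Needs["HDF5Tools`"];
Names["HDF5Tools`" <> cToWL["H5Dget_space"]] =!= {}
```

When the last line gives False, the C function probably has no counterpart in the package, or the name follows a different pattern.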

High(er) Level Interface

Here is an example of how you could use them to read parts of a dataset:

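Needs["HDF5Tools`"]
filename = "test.h5";  (* the file created in the Import/Export example *)
fh = HDF5OpenFile[filename, H5FACCRDONLY]  (* open read-only *)
dh = HDF5OpenDataset[fh, "/small"]
data = HDF5ReadDataset[dh, "TakeElements" -> Range[5]]
(* close the handles in reverse order of opening *)
HDF5CloseDataset[dh]
HDF5CloseFile[fh]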

Lower Level Interface

The differences between these Mathematica functions and the documented HDF5-C-Library functions seem to be mainly that these functions return arrays instead of filling allocated buffers given as additional arguments as usual in C.

Here is an example of how to use that lower-level interface, which was more or less a direct translation of a C-LibraryFunction I wrote a few years ago to examine the possibilities of faster access to HDF5 files (here I read the second half of the /small dataset from the above example file):

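HDF5ToolsInit[];
(* filename: path to the HDF5 file, e.g. "test.h5" from the Import/Export example *)
fid = h5fopen[filename, H5FACCRDONLY];      (* open read-only *)
did = h5dopen[fid, "small", H5PDEFAULT];
fsid = h5dgetspace[did];                    (* dataspace of the dataset in the file *)
rank = h5sgetsimpleextentndims[fsid];
msid = h5screatesimple[rank, {10}, {10}];   (* in-memory dataspace for 10 elements *)
(* hyperslab arguments as in the C API: start, stride, count, block *)
h5sselecthyperslab[msid, H5SSELECTSET, {0}, {1}, {10}, {1}];
h5sselecthyperslab[fsid, H5SSELECTSET, {10}, {1}, {10}, {1}];
data = h5dread[did, H5TNATIVEDOUBLE, msid, fsid, H5PDEFAULT];
(* release the handles in reverse order of creation *)
h5sclose[msid];
h5sclose[fsid];
h5dclose[did];
h5fclose[fid];
Normal[data]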
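To see which elements the two hyperslab selections pick, here is a small sketch in plain Wolfram Language that needs no HDF5 file. It assumes the four list arguments have the same meaning as in the C function H5Sselect_hyperslab (start, stride, count, block); hyperslabIndices is a made-up helper for the one-dimensional case.

```mathematica
(* zero-based offsets selected by a 1-D hyperslab:
   count blocks of length block, whose starts are stride apart, beginning at start *)
hyperslabIndices[start_, stride_, count_, block_] :=
  Flatten @ Table[start + i stride + j, {i, 0, count - 1}, {j, 0, block - 1}];

(* the file-space selection used above: offsets 10, 11, ..., 19 *)
offsets = hyperslabIndices[10, 1, 10, 1]

(* applied to the contents of "small" (Range[20]); Part is one-based, hence the + 1 *)
Range[20][[offsets + 1]]   (* {11, 12, ..., 20}, the second half *)
```

The memory-space selection {0}, {1}, {10}, {1} works the same way and simply fills all ten slots of the buffer.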

As you can see, the lower-level interface is quite verbose and inconvenient, and it takes some familiarity with the HDF5 library to achieve anything nontrivial. As is to be expected for undocumented functionality, there have been changes between versions 11.2 and 12; for example, the data is returned as a NumericArray in version 12. On the other hand, these differences seem to be small, and it looks like the HDF5Tools` functions are implemented as LibraryFunctions, which should avoid extra copies of the data, so I would expect good performance even with larger datasets. They are of course not nearly as convenient to use as the h5py library...

Finally here is a piece of code which does something that would not be possible with the existing Import/Export functionality: we overwrite a part of a dataset with new values:

HDF5ToolsInit[]
fid = h5fopen[filename, H5FACCRDWR]
did = h5dopen[fid, "small", H5PDEFAULT]
fsid = h5dgetspace[did]
rank = h5sgetsimpleextentndims[fsid]
msid = h5screatesimple[rank, {10}, {10}]
h5sselecthyperslab[msid, H5SSELECTSET, {0}, {1}, {10}, {1}];
h5sselecthyperslab[fsid, H5SSELECTSET, {10}, {1}, {10}, {1}];
h5dwrite[did, H5TNATIVEDOUBLE, msid, fsid, H5PDEFAULT, Table[k, {k, 100, 109}]]
h5sclose[msid];
h5sclose[fsid];
h5dclose[did];
h5fclose[fid];

One can check with e.g.

Import[filename, {"Datasets", "/small"}]

that this has indeed worked as expected.

I hope this demonstrates that, with some familiarity with the HDF5 library, it would probably be relatively straightforward to handle HDF5 files in a very general way and very efficiently with the HDF5Tools` package...
