How to develop an Import/Export converter for Compress[]ed data?

In this case, developing the converters is dead easy (which is not a good thing IMO, since it means we are not really using the power of the Import/Export framework, just adding syntactic sugar):

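(* Importer: read the file as a string and uncompress it into the "Data" element *)
CompressedFormat`CompressedFormatImport[filename_String, opts___] :=
    {"Data" -> Uncompress@Import[filename, "String"]};

(* Exporter: compress the expression and write the result as a plain string *)
CompressedFormat`CompressedFormatExport[filename_String, data_, opts___] :=
    Export[filename, Compress@data, "String"];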

ImportExport`RegisterImport[
   "CompressedFormat",
   CompressedFormat`CompressedFormatImport
]

ImportExport`RegisterExport[
   "CompressedFormat", 
   CompressedFormat`CompressedFormatExport 
]

Example:

file = $TemporaryPrefix <> "test";
Export[file, Range[1000000], "CompressedFormat"];
Import[file, {"CompressedFormat", "Data"}] // Length

(* 
  ==>  1000000
*)

That said, I think the Import/Export framework makes much more sense for specific formats where you can define distinct elements, since the framework makes it convenient to write importers for individual elements (possibly avoiding a full import when it is unnecessary). So, for a meaningful exposition of the importer-writing procedure, some particular graphics or numerical format would be a better choice IMO, because your stated goal is too general for that.
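To illustrate what per-element importers look like, here is a minimal sketch that re-registers the same format with one conditional raw importer next to the default one. The element name "CompressedString" and the two helper function names are made up for this example; the element simply returns the stored string without calling Uncompress, so it only hints at how a real format could avoid a full import.

```mathematica
(* Hypothetical element importer: returns the raw compressed string, no Uncompress *)
CompressedFormat`ImportCompressedString[filename_String, opts___] :=
    {"CompressedString" -> Import[filename, "String"]};

(* Default importer: same as before, fully uncompresses into "Data" *)
CompressedFormat`ImportData[filename_String, opts___] :=
    {"Data" -> Uncompress@Import[filename, "String"]};

(* Conditional raw importers are given as "element" :> function,
   followed by the default importer as the last item of the list *)
ImportExport`RegisterImport[
   "CompressedFormat",
   {
     "CompressedString" :> CompressedFormat`ImportCompressedString,
     CompressedFormat`ImportData
   }
]
```

With this registration, Import[file, {"CompressedFormat", "CompressedString"}] should only call the first function, while Import[file, {"CompressedFormat", "Data"}] falls through to the default importer.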

For that matter, I think that my large data framework (perhaps when extended and generalized) will make for a much better case for Import/Export framework use, as well as cover your use case and many more, because it:

  • Does use Compress under the hood
  • Uses lazy loading, which opens many possibilities to define elements for Import/Export that are loaded individually and efficiently
  • Does not have the limitation that the file must fit in memory
  • Can be very fast for large files
  • Matches how files are actually used: in practice, we use large files much more often than we carry them from platform to platform. My framework can switch between extremely fast .mx files and Compress-ed non-.mx files very easily, and the details can be completely hidden from the user, who just uses Import in all cases and gets great performance.

In other words, I feel that the direction I outlined there contains your suggestion as a special case, and is much more fruitful both for further development of the large-data framework / file format and for making real use of the Import/Export framework (and, sure enough, this is the direction in which I will be extending the large-data framework in the future).


One simple way to store data in compressed form could use the following:

ExportCompressed[filename_, data_] :=
    Export[filename, "Uncompress@\"" <> Compress[data] <> "\"", "String"]

This simply compresses and prepends the Uncompress statement to the resulting string. You can now simply use Get[] to import your data.
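A quick round trip might look like the sketch below. The file name is arbitrary (chosen just for this example), and the definition of ExportCompressed is repeated so the snippet runs on its own.

```mathematica
(* Same definition as above, repeated so this snippet is self-contained *)
ExportCompressed[filename_, data_] :=
    Export[filename, "Uncompress@\"" <> Compress[data] <> "\"", "String"];

(* Arbitrary file in the temporary directory, used only for illustration *)
file = FileNameJoin[{$TemporaryDirectory, "compressed-test.m"}];

data = Range[10];
ExportCompressed[file, data];

(* Get evaluates the stored Uncompress@"..." expression and returns the data *)
restored = Get[file];
restored === data
```

The file itself contains a single line of the form Uncompress@"...", which is why Get alone is enough to restore the expression.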

I use this to store compressed graphics expressions. Compressing can take a long time (I'd like to see that sped up considerably, because several minutes for a few MB of graphics expressions is far too long), but you mostly get very good compression.

On the other hand, import of these expressions is really fast. This seems kind of related to the WDX performance.
