How to develop an Import/Export converter for Compress[]ed data?
In this case, developing the converters is dead easy (which is not a good thing IMO, since it means that we aren't really using the power of the Import/Export framework, but are merely adding syntactic sugar):
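(* Importer: read the file as a string and return the uncompressed expression as the "Data" element *)
CompressedFormat`CompressedFormatImport[filename_String, options___] :=
  {"Data" -> Uncompress@Import[filename, "String"]};
(* Exporter: write the Compress-ed form of the data as a plain string *)
CompressedFormat`CompressedFormatExport[filename_String, data_, opts___] :=
  Export[filename, Compress@data, "String"];
(* Register both converters under the format name "CompressedFormat" *)
ImportExport`RegisterImport[
  "CompressedFormat",
  CompressedFormat`CompressedFormatImport
]
ImportExport`RegisterExport[
  "CompressedFormat",
  CompressedFormat`CompressedFormatExport
]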
Example:
file = $TemporaryPrefix <> "test";
Export[file, Range[1000000], "CompressedFormat"];
Import[file, {"CompressedFormat", "Data"}] // Length
(*
==> 1000000
*)
That said, I think using the Import/Export framework makes much more sense for specific formats where you can specify distinct elements and the framework makes it convenient to create importers for those elements (possibly avoiding full imports when unnecessary). So, for a meaningful exposition of the importer-writing procedure using the Import/Export framework, some particular graphics or numerical format would be a better choice IMO, because your stated goal is too general for that.
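To illustrate what per-element importers can look like, here is a minimal sketch, assuming a made-up two-line format: the first line is a plain-text description, the second line is the Compress-ed payload. The format name "TwoPartFormat", the TwoPart` context and all function names are hypothetical and not taken from the question. It relies on the list form of ImportExport`RegisterImport, where "elem" :> function registers a conditional importer for one element and the last entry is the default importer.

```mathematica
(* Hypothetical format: line 1 = description (must not contain newlines),
   line 2 = Compress-ed data *)

(* Conditional importer: reads only the first line, never touches the payload *)
TwoPart`importDescription[filename_String, opts___] :=
  Module[{stream = OpenRead[filename], line},
    line = Read[stream, String];
    Close[stream];
    {"Description" -> line}];

(* Default importer: reads both lines and uncompresses the payload *)
TwoPart`importAll[filename_String, opts___] :=
  Module[{lines = ReadList[filename, String]},
    {"Description" -> lines[[1]], "Data" -> Uncompress[lines[[2]]]}];

(* Exporter: expects a pair {description, data} *)
TwoPart`export[filename_String, {description_String, data_}, opts___] :=
  Export[filename, description <> "\n" <> Compress[data], "String"];

ImportExport`RegisterImport["TwoPartFormat",
  {"Description" :> TwoPart`importDescription, TwoPart`importAll}];
ImportExport`RegisterExport["TwoPartFormat", TwoPart`export];

(* Usage *)
file = FileNameJoin[{$TemporaryDirectory, "twopart-test"}];
Export[file, {"squares", Range[5]^2}, "TwoPartFormat"];
Import[file, {"TwoPartFormat", "Description"}] (* uses the conditional importer *)
Import[file, {"TwoPartFormat", "Data"}]        (* falls back to the default importer *)
```

The point of the split is that asking for "Description" should not pay the cost of Uncompress on a large payload, which is the kind of benefit the plain Compress wrapper cannot offer.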
For that matter, I think that my large data framework (perhaps when extended and generalized) will make for a much better case for Import/Export framework use, as well as cover your use case and many more, because it:
- Does use Compress under the cover
- Uses lazy loading, which opens many possibilities to define certain elements for Import/Export, which are loaded individually / efficiently
- Does not have a limitation that the file must fit in memory
- Can be very fast for large files
- In practice, we use large files much more frequently than carry them around from platform to platform. My framework can switch from extremely fast .mx files to Compress-ed non-.mx files very easily, and the details can be completely hidden from the user, who will just use Import in all cases, and have great performance.
In other words, I feel that the direction I outlined there contains your suggestion as a special case, and is much more fruitful both for further development of the large-data framework / file format and for making use of the power of the Import/Export framework (and, sure enough, this is the direction in which I will be extending the large-data framework in the future).
One simple way to store data in compressed form could use the following:
ExportCompressed[filename_,data_]:=
Export[filename,"Uncompress@"<>"\""<>Compress[data]<>"\"","String"]
This simply compresses the data and prepends the Uncompress statement to the resulting string. You can now simply use Get[] to import your data.
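As a quick round-trip check, here is a minimal sketch that repeats the ExportCompressed definition so it can be run on its own. The file name and test data are arbitrary choices for illustration.

```mathematica
(* Same definition as above, repeated so the example is self-contained *)
ExportCompressed[filename_, data_] :=
  Export[filename, "Uncompress@" <> "\"" <> Compress[data] <> "\"", "String"]

(* Arbitrary temporary file and test data *)
file = FileNameJoin[{$TemporaryDirectory, "compressed-test.m"}];
data = {Range[10], "some text", Graphics[Disk[]]};

ExportCompressed[file, data];

(* Get evaluates the file contents, i.e. Uncompress@"..." *)
restored = Get[file];
restored === data
```

Because the file is evaluated by Get, this should only be used on files you trust.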
I use this to store compressed graphics expressions. Compressing can take a long time (I'd like to see that sped up big time, because several minutes for a few MB of graphics expressions is way too long), but mostly you get very good compression.
On the other hand, import of these expressions is really fast. This seems kind of related to the WDX performance.