Best practices for storing and using data frames too large for memory?
You probably want to look at these packages:
- ff for 'flat-file' storage and very efficient retrieval (handles data.frames with mixed data types)
- bigmemory for out-of-R-memory but still in-RAM (or file-backed) use (matrices only; all elements of one data type)
- biglm for out-of-memory model fitting with lm()- and glm()-style models (a chunked-fitting sketch follows below)
Also see the High-Performance Computing task view on CRAN.
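
A minimal sketch of how chunked fitting with biglm can look. Splitting mtcars into pieces is artificial here and just stands in for reading a large file chunk by chunk; the formula and chunk count are made up for illustration.

```r
library(biglm)

# Artificial chunks of a built-in data set, standing in for pieces of a
# file too large to read at once.
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))

# Fit on the first chunk, then feed the remaining chunks in one at a time;
# biglm keeps only compact model summaries, not the raw rows.
fit <- biglm(mpg ~ wt + hp, data = chunks[[1]])
for (chunk in chunks[-1]) {
  fit <- update(fit, chunk)
}
summary(fit)
```

In real use each chunk would come from something like read.csv() with skip/nrows, so memory stays bounded by the chunk size rather than the full data.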
I would say disk.frame is a good candidate for these types of tasks (disclosure: I am the primary author of the package). Unlike ff and bigmemory, which restrict what data types can be easily handled, it tries to "mimic" data.frames and provides dplyr verbs for manipulating the data.
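
A minimal sketch of the workflow, assuming the disk.frame and dplyr packages are installed; mtcars stands in for data that would normally be too big to hold in memory, and the filter/grouping columns are just illustrative.

```r
library(disk.frame)
library(dplyr)

setup_disk.frame()                 # start background workers for parallel chunk processing

cars_df <- as.disk.frame(mtcars)   # write the data out as on-disk chunks

result <- cars_df %>%
  filter(mpg > 20) %>%             # dplyr verbs are applied chunk by chunk
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp)) %>%
  collect()                        # only the small aggregated result returns to RAM
```

For genuinely large data you would typically start from csv_to_disk.frame() rather than an in-memory data frame, so the source file is chunked onto disk without ever being loaded whole.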