Best practices for storing and using data frames too large for memory?

You probably want to look at these packages:

  • ff for 'flat-file' storage and very efficient retrieval (supports data.frames; mixed data types)
  • bigmemory for out-of-R-memory but still-in-RAM (or file-backed) use (matrices only; a single data type)
  • biglm for out-of-memory model fitting with lm()- and glm()-style models (a rough sketch of the chunked-fitting idea follows below).

and also see the High-Performance Computing task view.
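To give a flavour of how biglm's chunked fitting works, here is a minimal sketch. The file name big.csv, its columns y, x1 and x2, and the chunk size are hypothetical; the key idea is to fit on the first chunk and fold the remaining chunks in with update():

    # Minimal sketch: fit a linear model chunk by chunk with biglm
    # (big.csv, the column names and chunk_size are assumptions for illustration)
    library(biglm)

    chunk_size <- 1e5
    con <- file("big.csv", open = "r")

    # First chunk: reads the header and initialises the model
    chunk <- read.csv(con, nrows = chunk_size)
    cols  <- names(chunk)
    fit   <- biglm(y ~ x1 + x2, data = chunk)

    # Remaining chunks: keep reading from the open connection and update the fit
    repeat {
      chunk <- tryCatch(
        read.csv(con, nrows = chunk_size, header = FALSE, col.names = cols),
        error = function(e) NULL   # read.csv errors when no lines are left
      )
      if (is.null(chunk) || nrow(chunk) == 0) break
      fit <- update(fit, chunk)
    }
    close(con)

    summary(fit)

Only the summary statistics of the model are kept in memory between chunks, which is what makes this work for data larger than RAM.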


I would say disk.frame is a good candidate for this type of task. I am the primary author of the package.

Unlike ff and bigmemory, which restrict the data types that can be easily handled, disk.frame tries to "mimic" data.frames and provides dplyr verbs for manipulating the data.
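A minimal sketch of the typical workflow (the file name flights.csv, the output directory and the column names are assumptions for illustration):

    # Convert a large CSV into a chunked disk.frame, then query it with dplyr verbs
    library(disk.frame)
    library(dplyr)

    setup_disk.frame(workers = 4)          # parallel workers for chunk-wise processing
    options(future.globals.maxSize = Inf)  # allow larger objects to be sent to workers

    # The data is split into chunks stored on disk under "flights.df/"
    flights.df <- csv_to_disk.frame("flights.csv", outdir = "flights.df")

    # Familiar dplyr verbs run chunk by chunk; collect() brings the result into RAM
    result <- flights.df %>%
      filter(!is.na(dep_delay)) %>%
      group_by(carrier) %>%
      summarise(mean_delay = mean(dep_delay)) %>%
      collect()

Only the final, aggregated result is loaded into memory, so the raw data can be much larger than RAM.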