How to manage memory in agent-based modeling with R
From my understanding, bigmemory works only on matrices, not on multi-dimensional arrays, but you could store a multi-dimensional array as a list of matrices.
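A minimal sketch of that list-of-matrices idea (my own illustration, not taken from the bigmemory documentation): one file-backed big.matrix per slice of the third dimension.

library(bigmemory)

# One file-backed matrix per slice of the third dimension; 10 slices here
# just for illustration (your real case would use 1000).
# bigmemory has no logical type, so "double" is used instead.
slices <- lapply(1:10, function(i) {
  filebacked.big.matrix(nrow = 1825, ncol = 38, type = "double",
                        backingfile    = paste0("slice_", i, ".bin"),
                        descriptorfile = paste0("slice_", i, ".desc"))
})
slices[[5]][1, 1] <- 42   # each slice is indexed like an ordinary matrix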
gc is just the garbage collector, and you don't really have to call it yourself, since it runs automatically when needed. However, the manual also states:
It can be useful to call gc after a large object has been removed, as this may prompt R to return memory to the operating system.
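For example, after removing a large object:

p <- array(NA, dim = c(1825, 38, 1000))  # roughly 265 MB
rm(p)    # drop the reference
gc()     # may prompt R to hand the freed memory back to the OS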
I think the most useful package for your task would be ff. Here's a short example to illustrate its strength: ff stores the data on disk and hardly uses any memory.
Initializing the arrays with base R:
p <- array(NA, dim=c(1825, 38, 1000),
           dimnames=list(NULL, NULL, as.character(seq(1, 1000, 1))))
format(object.size(p), units="Mb")
[1] "264.6 Mb"
So in total, your initial arrays alone would take up almost 5 GB of memory, which will already get you into trouble once the heavy computation starts.
Initializing the arrays with ff:
library(ff)
myArr <- ff(NA, dim=c(1825, 38, 1000),
            dimnames=list(NULL, NULL, as.character(seq(1, 1000, 1))),
            filename="arr.ffd", vmode="logical", overwrite=TRUE)
format(object.size(myArr), units="Mb")
[1] "0.1 Mb"
Test for equality:
equals <- list()
for (i in 1:dim(p)[1]) {
  equals[[i]] <- all.equal(p[i,,],
                           myArr[i,,])
}
all(unlist(equals))
[1] TRUE
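ff reads and writes chunks on demand, so only the parts you actually touch need to fit in RAM. For example (my own addition):

myArr[, , 1] <- TRUE        # write one slice to disk
head(myArr[, 1, 1])         # read a small chunk back into memory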
Is there any reason why you have to stick to the array data type?
If there are many NAs present in your arrays, then you are using more memory than you really need. This is the downside of arrays in R.
If the operations you are performing do not strictly require your data to be arrays, then you can save some memory by remodelling it as a data.frame.
The example below shows what your data.frame could look like after transforming from the array. Note that I had to use na.rm=FALSE explicitly, otherwise the result would be a table with 0 rows.
devtools::install_github("Rdatatable/[email protected]")
library(data.table)
p <- array(NA, dim=c(1825, 38, 1000),
           dimnames=list(NULL, NULL, as.character(seq(1, 1000, 1))))
as.data.table(p, na.rm=FALSE)
#        V1    V2     V3  value
#     <int> <int> <char> <lgcl>
#  1:     1     1      1     NA
#  2:     1     1     10     NA
#  3:     1     1    100     NA
#  4:     1     1   1000     NA
#  5:     1     1    101     NA
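With real (sparse) data, the saving comes from dropping the NA cells, for example:

p[1, 1, 1] <- TRUE                    # pretend one cell holds actual data
dt <- as.data.table(p, na.rm=TRUE)    # the default: NA cells are dropped
nrow(dt)                              # 1 row instead of 69,350,000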
An alternative is the data.cube package. It basically does what I described above for you behind the scenes. You still have the array's [ operator, but data.cube objects won't work with R functions that expect an array as input, since those will coerce the data.cube back to an array, losing all the memory benefits.
The memory benefits can be significant; an example from the data.cube vignette:
array: 34.13 GB
data.cube: 0.01 GB
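A rough sketch of what that could look like, assuming the as.data.cube() coercion shown in the package vignette (see the data.cube README for installation instructions):

library(data.cube)

# Small array with named dimnames, mostly NA
ar <- array(NA_real_, dim = c(5, 3, 4),
            dimnames = list(day   = as.character(1:5),
                            agent = as.character(1:3),
                            run   = as.character(1:4)))
ar["1", "1", "1"] <- 42

dc <- as.data.cube(ar)   # only the non-NA cells end up being stored
dc["1", , "1"]           # array-style [ subsetting still works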