Measure peak memory usage in R
I found what I was looking for in the package peakRAM. From the documentation:
This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM hungry code.
mem <- peakRAM({
for(i in 1:5) {
mean(rnorm(1e7))
}
})
mem$Peak_RAM_Used_MiB # 10000486MiB
mem <- peakRAM({
for(i in 1:5) {
mean(rnorm(1e7))
}
})
mem$Peak_RAM_Used_MiB # 10005266MiB <-- almost the same!
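peakRAM can also take several expressions in one call and, per its documentation, returns one row per expression (Function_Call, Elapsed_Time_sec, Total_RAM_Used_MiB, Peak_RAM_Used_MiB), which makes before/after comparisons easy. A minimal sketch; the helper names keep_mean and keep_all are made up for illustration:

```r
library(peakRAM)  # assumes the peakRAM package is installed

keep_mean <- function() {   # hypothetical helper: only the means survive
  lapply(1:3, function(x) mean(rnorm(1e6)))
}
keep_all <- function() {    # hypothetical helper: every full vector is kept
  lapply(1:3, function(x) rnorm(1e6))
}

# One row per expression; keeping the full vectors should show a higher peak
peakRAM(keep_mean(), keep_all())
```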
The object returned by lapply weighs only 488 bytes because it is summarized: garbage collection has deleted the intermediate objects after the mean calculation. help('Memory') gives useful information on how R manages memory. In particular, you can use object.size() to follow the size of individual objects, and memory.size() (Windows only) to know how much total memory is used at each step:
# With mean calculation
gc(reset = T)
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 405777 21.7 831300 44.4 405777 21.7
#> Vcells 730597 5.6 8388608 64.0 730597 5.6
sum(gc()[, "(Mb)"])
#> [1] 27.3
l<-lapply(1:3, function(x) {
mx <- replicate(10, rnorm(1e6)) # 80Mb object
mean(mx)
print(paste('Memory used:',memory.size()))
})
#> [1] "Memory used: 271.04"
#> [1] "Memory used: 272.26"
#> [1] "Memory used: 272.26"
object.size(l)
#> 488 bytes
## Without mean calculation:
gc(reset = T)
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 464759 24.9 831300 44.4 464759 24.9
#> Vcells 864034 6.6 29994700 228.9 864034 6.6
gcinfo(T)
#> [1] FALSE
sum(gc()[, "(Mb)"])
#> [1] 31.5
l<-lapply(1:4, function(x) {
mx <- replicate(10, rnorm(1e6))
print(paste('New object size:',object.size(mx)))
print(paste('Memory used:',memory.size()))
mx
})
#> [1] "New object size: 80000216"
#> [1] "Memory used: 272.27"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 348.58"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 424.89"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 501.21"
object.size(l)
#> 320000944 bytes
sum(gc()[, "(Mb)"])
#> [1] 336.7
Created on 2020-08-20 by the reprex package (v0.3.0)
If instead of returning the mean you return the whole object, the increase in memory use is significant.
You can use the gc function for that.
Indeed, gc reports the current and maximum memory used in fields 11 and 12 of its return value (labelled Mb in the documentation, but evidently MiB in practice on my machine). You can reset the recorded maximum with the argument reset=TRUE. Here is an example:
> gc(reset=TRUE)
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 318687 17.1 654385 35.0 318687 17.1
Vcells 629952 4.9 397615688 3033.6 629952 4.9
> a = runif(1024*1024*64) # Should request 512 Mio to the GC (on my machine)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 318677 17.1 654385 35.0 318834 17.1
Vcells 67738785 516.9 318092551 2426.9 67739236 516.9
> memInfo <- gc()
> memInfo[11] # Maximum Ncells
[1] 17.1
> memInfo[12] # Maximum Vcells
[1] 516.9
> rm(a) # `a` can be removed by the GC from this point
> gc(reset=TRUE) # Order to reset the GC infos including the maximum
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 318858 17.1 654385 35.0 318858 17.1
Vcells 630322 4.9 162863387 1242.6 630322 4.9
> memInfo <- gc()
> memInfo[11]
[1] 17.1
> memInfo[12] # The maximum has been correctly reset
[1] 4.9
In this example we can see that 516.9 - 4.9 = 512 MiB were allocated by the GC between the two gc calls surrounding the runif call, which is consistent with the expected result: 1024*1024*64 doubles at 8 bytes each is exactly 512 MiB.
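The gc-based measurement above can be wrapped in a small helper. A sketch under the same assumptions; the name peak_mb is my own, and it sums the Ncells and Vcells maxima from the "Mb" columns:

```r
# Sketch: measure the peak memory used while evaluating an expression,
# using the gc(reset = TRUE) pattern shown above. Relies on R's lazy
# evaluation: `expr` is not evaluated until force() is called.
peak_mb <- function(expr) {
  gc(reset = TRUE)   # reset the "max used" counters
  force(expr)        # evaluate the expression under measurement
  info <- gc()       # read the counters back
  # Fields 11 and 12 are the Ncells and Vcells maxima (in Mb);
  # sum them for an overall figure.
  info[11] + info[12]
}

peak_mb(runif(1024 * 1024 * 64))  # roughly 512 MiB plus the session baseline
```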