How to parallelize the extract function for raster files in R?

The raster package can run extract on a cluster: start one with beginCluster (this needs the snow package) and shut it down with endCluster when you are done. See the example below.

library(raster)
library(snow)

# Make test data
# RasterStack
r <- raster(ncol=36, nrow=18)
r[] <- 1:ncell(r)
s <- stack(r, sqrt(r), r/r)

# SpatialPolygons
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- spPolygons(cds1, cds2)

# Visualize
plot(s, 1); plot(polys, add = TRUE)

# Extract
beginCluster(n=2)
extract(s, polys)
endCluster()

However, most of the processing time is probably spent rasterizing the polygons, and that step is not parallelized and is known to be quite inefficient. Alternative packages such as velox and fasterize can speed that step up.
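As a rough sketch of the fasterize route (assuming the sf and fasterize packages are installed), you can burn polygon ids into a "zone" raster once and then group the stack's cell values by zone. This is a swapped-in technique, not what raster's extract does internally. It should broadly match extract's default cell-centre rule, but boundary cells may differ, and names such as polys_sf, zones and vals are just illustrative.

```r
library(raster)
library(sf)
library(fasterize)

# Same test data as the example above
r <- raster(ncol = 36, nrow = 18)
r[] <- 1:ncell(r)
s <- stack(r, sqrt(r), r / r)
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- spPolygons(cds1, cds2)

# fasterize() works on sf polygons, so convert and add an id field to burn in
polys_sf <- st_as_sf(polys)
polys_sf$id <- seq_len(nrow(polys_sf))

# Rasterize the polygon ids once onto the grid of s (the "zone" raster)
zones <- fasterize(polys_sf, s[[1]], field = "id")

# Read all layers once, then group the cell values by polygon id
v <- getValues(s)
z <- getValues(zones)
inside <- !is.na(z)
vals <- split(as.data.frame(v[inside, , drop = FALSE]), z[inside])
```

vals is a list with one data frame per polygon and one column per layer, similar in shape to the list extract(s, polys) returns.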


Given that the files are .tif, we now need to know whether they are tiled. They probably are, and raster's extract is very slow in that situation (it is also effectively unmaintained, with no known prospect of improvement).
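One way to check, assuming the sf package is available: gdalinfo (exposed in R as sf::gdal_utils("info", ...)) reports a block size for each band. Square blocks such as 256x256 indicate a tiled file, while blocks spanning the full raster width indicate strips. The tiled file written here is only a hypothetical stand-in; point tif at one of your own files instead.

```r
library(raster)
library(sf)

# Hypothetical stand-in file: a small tiled GeoTIFF written to tempdir()
tif <- tempfile(fileext = ".tif")
r <- raster(ncol = 36, nrow = 18)
r[] <- 1:ncell(r)
writeRaster(r, tif, options = "TILED=YES")

# gdalinfo output has one "Band N Block=WxH ..." line per band
info <- gdal_utils("info", tif, quiet = TRUE)
block_line <- grep("Block=", unlist(strsplit(info, "\n")), value = TRUE)
block_line
```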

I would start with lapply(filenames, function(x) extract(readAll(raster(x)), ts.poly)), but that still repeats the polygon geometry lookup for every layer. It's better to compute the cell index once, flattened to one column with a grouping column that identifies each polygon, and reuse it for every layer. That's what this package does:

https://github.com/hypertidy/tabularaster
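To make the idea concrete, here is a toy illustration in base R only (all data hypothetical): the layer values sit in a cells x layers matrix, which is what getValues() returns for a RasterStack, and the flattened lookup has one row per (polygon, cell) pair. Once the lookup exists, each layer is just a vector indexed by the same cell numbers.

```r
# Hypothetical toy data: a 6-cell raster with 3 layers, as a cells x layers matrix
layer_values <- matrix(1:18, nrow = 6, ncol = 3)

# The flattened lookup, computed once: one row per (polygon, cell) pair.
# Polygon 1 covers cells 1, 2 and 5; polygon 2 covers cells 4 and 6.
cells <- data.frame(object_ = c(1, 1, 1, 2, 2), cell_ = c(1, 2, 5, 4, 6))

# Each layer reuses the same cell index; no polygon geometry is touched here
exvalues <- lapply(seq_len(ncol(layer_values)), function(i) layer_values[cells$cell_, i])
exvalues[[2]]  # 7 8 11 10 12
```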

I'm still guessing, because we can't reproduce your situation and the question is very open-ended, but the untested code I'd try is:

library(raster)
library(tabularaster)  ## devtools::install_github("hypertidy/tabularaster")

## mapping between polygon `object_` and `cell_` number as per
## extract(..., cellnumbers = TRUE) / cellFrom* / extract(x, cells)
cells <- cellnumbers(stack.ts[[1]], ts.poly)
## read each layer into memory once, then pull out the values at those cells
exvalues <- lapply(seq_len(nlayers(stack.ts)), function(i)
  extract(readAll(stack.ts[[i]]), cells$cell_))

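Then do.call(cbind, exvalues) combines exvalues into a single matrix (one row per cell, one column per layer), which holds the values you would otherwise have gotten the high-level way. The rows line up with cells, so cells$object_ tells you which polygon each row belongs to.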
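A small base R sketch of that last step, using hypothetical per-layer vectors in place of real results: bind the layers into one matrix, then regroup its rows by object_ to get one matrix per polygon, similar to the list extract(s, polys) returns.

```r
# Hypothetical lookup and per-layer results, each vector aligned with the rows of `cells`
cells <- data.frame(object_ = c(1, 1, 1, 2, 2), cell_ = c(1, 2, 5, 4, 6))
exvalues <- list(c(1, 2, 5, 4, 6), c(7, 8, 11, 10, 12), c(13, 14, 17, 16, 18))

# One row per (polygon, cell) pair, one column per layer
m <- do.call(cbind, exvalues)

# Regroup the rows by polygon id
by_poly <- lapply(split(seq_len(nrow(m)), cells$object_),
                  function(rows) m[rows, , drop = FALSE])
by_poly[["2"]]
```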

I wouldn't normally write the code that way; I'd probably loop over the file names, but the question is too open-ended to cover every possibility. I'm sorry not to explain everything in detail. This is sadly a topic that isn't often discussed, so the tools are capable but not well understood, and they have a number of problems.
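A minimal sketch of the loop-over-file-names version, assuming every file shares the same grid. It swaps in raster's own cellFromPolygon() for tabularaster::cellnumbers() to build an equivalent object_/cell_ table, and the three small GeoTIFFs written to tempdir() are hypothetical stand-ins for your files.

```r
library(raster)

# Hypothetical stand-ins for the real files: three single-band GeoTIFFs on one grid
r <- raster(ncol = 36, nrow = 18)
filenames <- vapply(1:3, function(i) {
  f <- tempfile(fileext = ".tif")
  writeRaster(setValues(r, seq_len(ncell(r)) * i), f)
  f
}, character(1))

cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- spPolygons(cds1, cds2)

# Geometry lookup done once, flattened to a cell column plus a polygon id
cl <- cellFromPolygon(raster(filenames[1]), polys)
cells <- data.frame(object_ = rep(seq_along(cl), lengths(cl)), cell_ = unlist(cl))

# Per file: read all values once and index them by cell number
exvalues <- lapply(filenames, function(f) getValues(raster(f))[cells$cell_])
m <- do.call(cbind, exvalues)
```

Reading each file whole with getValues() mirrors the readAll() call above; for files too large for memory you would need to read in blocks instead.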