Increasing speed of crop, mask, & extract raster by many polygons in R?
I have finally gotten around to improving this function. I found that for my purposes, it was fastest to rasterize()
the polygon first and use getValues()
instead of extract()
. The rasterizing isn't much faster than the original code for tabulating raster values in small polygons, but it shines when it came to large polygon areas that had large rasters to be cropped and the values extracted. I also found getValues()
was much faster than the extract()
function.
I also figured out the multi-core processing using foreach()
.
I hope this is useful for other people who want an R solution for extracting raster values from a polygon overlay. This is similar to the "Tabulate Intersection" of ArcGIS, which did not work well for me, returning empty outputs after hours of processing, like this user.
#initiate multicore cluster and load packages
library(foreach)
library(doParallel)
library(tcltk)
library(sp)
library(raster)
cores<- 7
cl <- makeCluster(cores, output="") #output should make it spit errors
registerDoParallel(cl)
Here's the function:
multicore.tabulate.intersect<- function(cores, polygonlist, rasterlayer){
foreach(i=1:cores, .packages= c("raster","tcltk","foreach"), .combine = rbind) %dopar% {
mypb <- tkProgressBar(title = "R progress bar", label = "", min = 0, max = length(polygonlist[[i]]), initial = 0, width = 300)
foreach(j = 1:length(polygonlist[[i]]), .combine = rbind) %do% {
final<-data.frame()
tryCatch({ #not sure if this is necessary now that I'm using foreach, but it is useful for loops.
single <- polygonlist[[i]][j,] #pull out individual polygon to be tabulated
dir.create (file.path("c:/rtemp",i,j,single@data$OWNER), showWarnings = FALSE) #creates unique filepath for temp directory
rasterOptions(tmpdir=file.path("c:/rtemp",i,j, single@data$OWNER)) #sets temp directory - this is important b/c it can fill up a hard drive if you're doing a lot of polygons
clip1 <- crop(rasterlayer, extent(single)) #crop to extent of polygon
clip2 <- rasterize(single, clip1, mask=TRUE) #crops to polygon edge & converts to raster
ext <- getValues(clip2) #much faster than extract
tab<-table(ext) #tabulates the values of the raster in the polygon
mat<- as.data.frame(tab)
final<-cbind(single@data$OWNER,mat) #combines it with the name of the polygon
unlink(file.path("c:/rtemp",i,j,single@data$OWNER), recursive = TRUE,force = TRUE) #delete temporary files
setTkProgressBar(mypb, j, title = "number complete", label = j)
}, error=function(e){cat("ERROR :",conditionMessage(e), "\n")}) #trycatch error so it doesn't kill the loop
return(final)
}
#close(mypb) #not sure why but closing the pb while operating causes it to return an empty final dataset... dunno why.
}
}
So to use it, adjust the single@data$OWNER
to fit with the column name of your identifying polygon (guess that could have been built into the function...) and put in:
myoutput <- multicore.tabulate.intersect(cores, polygonlist, rasterlayer)
Speed up extracting raster (raster stack) from point, XY or Polygon
Great answer Luke. You must be a R wizard! Here is a very minor tweak to simplify your code (may improve performance slightly in some cases). You can avoid some operations by using cellFromPolygon (or cellFromXY for points) and then clip and getValues.
Extract polygon or points data from raster stacks ------------------------
library(raster)
library(sp)
# create polygon for extraction
xys= c(76.27797,28.39791,
76.30543,28.39761,
76.30548,28.40236,
76.27668,28.40489)
pt <- matrix(xys, ncol=2, byrow=TRUE)
pt <- SpatialPolygons(list(Polygons(list(Polygon(pt)), ID="a")));
proj4string(pt) <-"+proj=longlat +datum=WGS84 +ellps=WGS84"
pt <- spTransform(pt, CRS("+proj=sinu +a=6371007.181 +b=6371007.181 +units=m"))
## Create a matrix with random data & use image()
xy <- matrix(rnorm(4448*4448),4448,4448)
plot(xy)
# Turn the matrix into a raster
NDVI_stack_h24v06 <- raster(xy)
# Give it lat/lon coords for 36-37°E, 3-2°S
extent(NDVI_stack_h24v06) <- c(6671703,7783703,2223852,3335852)
# ... and assign a projection
projection(NDVI_stack_h24v06) <- CRS("+proj=sinu +a=6371007.181 +b=6371007.181 +units=m")
plot(NDVI_stack_h24v06)
# create a stack of the same raster
NDVI_stack_h24v06 = stack( mget( rep( "NDVI_stack_h24v06" , 500 ) ) )
# Run functions on list of points
registerDoParallel(16)
ptm <- proc.time()
# grab cell number
cell = cellFromPolygon(NDVI_stack_h24v06, pt, weights=FALSE)
# create a raster with only those cells
r = rasterFromCells(NDVI_stack_h24v06, cell[[1]],values=F)
result = foreach(i = 1:dim(NDVI_stack_h24v06)[3],.packages='raster',.combine=rbind,.inorder=T) %dopar% {
#get value and store
getValues(crop(NDVI_stack_h24v06[[i]],r))
}
proc.time() - ptm
endCluster()
user system elapsed 16.682 2.610 2.530
registerDoParallel(16)
ptm <- proc.time()
result = foreach(i = 1:dim(NDVI_stack_h24v06)[3],.packages='raster',.inorder=T,.combine=rbind) %dopar% {
clip1 <- crop(NDVI_stack_h24v06[[i]], extent(pt)) #crop to extent of polygon
clip2 <- rasterize(pt, clip1, mask=TRUE) #crops to polygon edge & converts to raster
getValues(clip2) #much faster than extract
}
proc.time() - ptm
endCluster()
user system elapsed 33.038 3.511 3.288
If a loss in the precision of the overlay is not terribly important - assuming it is precise to begin with - one can typically achieve much greater zonal calculation speeds by first converting the polygons to a raster. The raster
package contains the zonal()
function, which should work well for the intended task. However, you can always subset two matrices of the same dimension using standard indexing. If you must maintain polygons and you don't mind GIS software, QGIS is actually must faster at zonal statistics than either ArcGIS or ENVI-IDL.