Calculating block statistics from land cover raster using R?
You can use the aggregate() function in the raster package to do this. For example, to produce a raster at half the resolution of the original, showing the proportion of cells covered with land class 1:
aggregate(r, fact = 2, fun = function(vals, na.rm) {
  # proportion of cells within each 2x2 block that equal class 1
  sum(vals == 1, na.rm = na.rm) / length(vals)
})
To do this for all land classes:
cov_pct <- lapply(unique(r), function(land_class) {
  aggregate(r, fact = 2, fun = function(vals, na.rm) {
    # proportion of cells within each block that equal the current class
    sum(vals == land_class, na.rm = na.rm) / length(vals)
  })
})
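This gives you a list of RasterLayer objects, one per land class. As a small follow-up sketch (assuming the cov_pct list from above), you can combine them into a single stack for easier inspection and plotting:

cov_stack <- stack(cov_pct)                     # one layer per land class
names(cov_stack) <- paste0("class_", unique(r)) # label layers by class value
plot(cov_stack)                                 # visual check of per-class coverage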
I agree that raster::aggregate() works well in most cases, especially when a user-defined function is involved, as in the answer given by @dbaston. However, you might want to have a look at the velox package as an alternative approach that comes in particularly handy when dealing with larger rasters.

The only caveat here is that VeloxRaster_aggregate() accepts only "standard" aggregate functions (i.e. c("sum", "mean", "min", "max", "median")), so any work steps beyond that level need to be carried out beforehand; in the veloxAgg chunk below, each class is first reclassified to a 0/1 mask and then aggregated with "mean". Nevertheless, the resulting speed gain surely justifies the need for a little more code. In the example below, I increased the ncell() of your input raster by a factor of 100 and compared runtimes. As seen from the benchmark results, velox outperforms the raster::aggregate() based approach by a factor of roughly 13, on average, when fed with the larger raster image.
library(raster)
r = raster(ncols = 1e3, nrows = 1e3
           , xmn = 100, xmx = 1100, ymn = 0, ymx = 1000
           , crs = "+proj=lcc +lat_1=35 +lat_2=65 +lat_0=52 +lon_0=10 +x_0=4000000 +y_0=2800000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")
x = 0:5
set.seed(123) # make sampling reproducible
r[] <- sample(x, 1e6, replace = TRUE)
### benchmark ----
library(microbenchmark)
library(velox)

microbenchmark(
  ## raster approach
  rasterAgg = {
    lst1 = lapply(unique(r), function(land_class) {
      aggregate(r, fact = 2, fun = function(vals, na.rm) {
        sum(vals == land_class, na.rm = na.rm) / length(vals)
      })
    })
  }
  ## velox approach
  , veloxAgg = {
    vls = r[]
    lst2 = lapply(unique(r), function(i) {
      # set the current land cover class to 1, everything else to 0
      ids = (vls == i)
      vls[ids] = 1; vls[!ids] = 0
      rst = setValues(r, vls)
      # create velox raster and aggregate
      vlx = velox(rst)
      vlx$aggregate(factor = 2, aggtype = "mean")
      vlx$as.RasterLayer()
    })
  }
  ## repeat every chunk n times
  , times = 10)
And here are the benchmark results:
Unit: milliseconds
      expr       min        lq      mean    median        uq       max neval cld
 rasterAgg 5844.2713 5950.7560 6140.8879 6186.1065 6354.8678 6370.6968    10   b
  veloxAgg  319.2344  336.5698  472.3171  375.2693  622.2361  744.8573    10  a
You can also plot() the single output layers in lst1 and lst2, or subtract them from each other, in order to verify that the two approaches create identical results.
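For instance, a quick sanity check could look like this (a sketch, assuming lst1 and lst2 are still available from the benchmark run above):

# maximum absolute difference per class; should be (numerically close to) zero
sapply(seq_along(lst1), function(i) cellStats(abs(lst1[[i]] - lst2[[i]]), "max"))

# side-by-side visual comparison for the first land class
par(mfrow = c(1, 2))
plot(lst1[[1]], main = "raster::aggregate()")
plot(lst2[[1]], main = "velox")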
As a reference: using your original raster with 1e4 pixels as input, VeloxRaster_aggregate() performs only about 4 times faster (at least on my machine), which leads me to the conclusion that the raster based approach is absolutely suitable for smaller images. However, velox is definitely worth a deeper look when dealing with larger datasets that still fit well into memory.