How to speed up raster to polygon conversion in R?
Gdal Installation
Install Gdal command line tools and check to see if its binaries are added to path
environment variable. e.g. in windows: open Run
and type:
rundll32.exe sysdm.cpl,EditEnvironmentVariables
Then follow the screenshot
Download and install gdal python bindings from here according to your python and OS.
install it using:
pip.exe install GDAL-2.0.2-cp27-none-win32.whl
You may encounter issues while installing gdal. Please see Installing GDAL with Python on windows?
In that thread users have suggested that Gdal binaries from gisinternals installs both the command line tools and the python bindings. Try to install it from there. Thus none of the above steps would be relevant.
To Make sure Gdal is installed open a command prompt and type:
ogrinfo
and check it works and you don't get:
'ogrinfo' is not recognized as an internal or external command, operable program or batch file.
To check gdal python bindings is installed open command prompt and type:
python
import gdal
if you get the following error then it is not installed:
Traceback (most recent call last): File "", line 1, in ImportError: No module named gdal
R side
Source the following r function from John Baumgartner blog:
gdal_polygonizeR <- function(x, outshape=NULL, gdalformat = 'ESRI Shapefile',
pypath=NULL, readpoly=TRUE, quiet=TRUE) {
if (isTRUE(readpoly)) require(rgdal)
if (is.null(pypath)) {
pypath <- Sys.which('gdal_polygonize.py')
}
if (!file.exists(pypath)) stop("Can't find gdal_polygonize.py on your system.")
owd <- getwd()
on.exit(setwd(owd))
setwd(dirname(pypath))
if (!is.null(outshape)) {
outshape <- sub('\\.shp$', '', outshape)
f.exists <- file.exists(paste(outshape, c('shp', 'shx', 'dbf'), sep='.'))
if (any(f.exists))
stop(sprintf('File already exists: %s',
toString(paste(outshape, c('shp', 'shx', 'dbf'),
sep='.')[f.exists])), call.=FALSE)
} else outshape <- tempfile()
if (is(x, 'Raster')) {
require(raster)
writeRaster(x, {f <- tempfile(fileext='.tif')})
rastpath <- normalizePath(f)
} else if (is.character(x)) {
rastpath <- normalizePath(x)
} else stop('x must be a file path (character string), or a Raster object.')
system2('python', args=(sprintf('"%1$s" "%2$s" -f "%3$s" "%4$s.shp"',
pypath, rastpath, gdalformat, outshape)))
if (isTRUE(readpoly)) {
shp <- readOGR(dirname(outshape), layer = basename(outshape), verbose=!quiet)
return(shp)
}
return(NULL)
}
use this command to do the job:
r.to.poly<-gdal_polygonizeR(r1,pypath = "C:\\Program Files\\GDAL\\gdal_polygonize.py")#, dissolve = T)
Change pypath
parameter according to your system. Be cautious that gdal_plygonize
creates huge shapefiles. My 1 MB tif converted to a 128 MB
shapefile. R needs a lot of memory to open this shapefile. Although the conversion was very fast. Thanks to python and gdal!
Another option would be Esri r-bridge
to do the computation in Arcgis and return the output to R. However r-bridge doesn't support raster layers yet. (Thanks to @JeffreyEvans)
However the Gdal method is commercial free!
There is a "new" method from the stars
package, which revolutionized the workflow for me (I was using the gdal_polygonizeR
function previously). It has been faster than the John Baumgartner solution for all complicated rasters I have tried it on. Further, it does not require the gdal_polygonize.py
script, which can be difficult to install on some machines. Try:
r.to.poly <- sf::as_Spatial(sf::st_as_sf(stars::st_as_stars(r1),
as_points = FALSE, merge = TRUE)
) # requires the sf, sp, raster and stars packages
See also this thread.
PS. If your polygons are complex, you probably need to validate them, before you can use them to calculate area:
rgeos::gIsValid(r.to.poly) # FALSE here means that you'll need to run the buffer routine:
r.to.poly <- rgeos::gBuffer(r.to.poly, byid = TRUE, width = 0)
If the polygons are very complex, you may need to add buffer width to get rid of all crossing edges:
r.to.poly <- rgeos::gBuffer(r.to.poly, byid = TRUE, width = 1000)
r.to.poly <- rgeos::gBuffer(r.to.poly, byid = TRUE, width = -1000)
Further, raster::area
is quicker also for polygons than rgeos::gArea
, I think (have not tested this properly, though).