How to speed up the plotting of polygons in R?
I found 3 ways to increase the speed of plotting the country borders from shape files for R. I found some inspiration and code from here and here.
(1) We can extract the coordinates from the shape files to get the longitude and latitudes of the polygons. Then we can put them into a data frame with the first column containing the longitudes and the second column containing latitudes. The different shapes are separated by NAs.
(2) We can remove some polygons from our shape file. The shape file is very, very detailed, but some of the shapes are tiny islands that are unimportant (for my plots, anyways). We can set a minimum polygon area threshold to keep the bigger polygons.
(3) We can simplify the geometry of our shapes using the Douglas-Peuker algorithm. The edges of our polygon shapes can be simplified, as they are very intricate in the original file. Fortunately, there is a package, rgeos
, that implements this.
Set up:
# Load packages
library(rgdal)
library(raster)
library(sp)
library(rgeos)
# Load the shape files
can<-getData('GADM', country="CAN", level=0)
usa<-getData('GADM', country="USA", level=0)
mex<-getData('GADM', country="MEX", level=0)
Method 1: Extract the coordinates from the the shape files into a data frame and plot lines
The major disadvantage is that we lose some information here when compared to keeping the object as a SpatialPolygonsDataFrame object, such as the projection. However, we can turn it back into an sp object and add back the projection information, and it is still faster than plotting the original data.
Note that this code runs very slowly on the original file because there are a lot of shapes, and the resulting data frame is ~2 million rows long.
Code:
# Convert the polygons into data frames so we can make lines
poly2df <- function(poly) {
# Convert the polygons into data frames so we can make lines
# Number of regions
n_regions <- length(poly@polygons)
# Get the coords into a data frame
poly_df <- c()
for(i in 1:n_regions) {
# Number of polygons for first region
n_poly <- length(poly@polygons[[i]]@Polygons)
print(paste("There are",n_poly,"polygons"))
# Create progress bar
pb <- txtProgressBar(min = 0, max = n_poly, style = 3)
for(j in 1:n_poly) {
poly_df <- rbind(poly_df, NA,
poly@polygons[[i]]@Polygons[[j]]@coords)
# Update progress bar
setTxtProgressBar(pb, j)
}
close(pb)
print(paste("Finished region",i,"of",n_regions))
}
poly_df <- data.frame(poly_df)
names(poly_df) <- c('lon','lat')
return(poly_df)
}
Method 2: Remove small polygons
There are many small islands that are not very important. If you check some of the quantiles of the areas for the polygons, we see that many of them are miniscule. For the Canada plot, I went down from plotting over a thousand polygons to just hundreds of polygons.
Quantiles for the size of polygons for Canada:
0% 25% 50% 75% 100%
4.335000e-10 8.780845e-06 2.666822e-05 1.800103e-04 2.104909e+02
Code:
# Get the main polygons, will determine by area.
getSmallPolys <- function(poly, minarea=0.01) {
# Get the areas
areas <- lapply(poly@polygons,
function(x) sapply(x@Polygons, function(y) y@area))
# Quick summary of the areas
print(quantile(unlist(areas)))
# Which are the big polygons?
bigpolys <- lapply(areas, function(x) which(x > minarea))
length(unlist(bigpolys))
# Get only the big polygons and extract them
for(i in 1:length(bigpolys)){
if(length(bigpolys[[i]]) >= 1 && bigpolys[[i]] >= 1){
poly@polygons[[i]]@Polygons <- poly@polygons[[i]]@Polygons[bigpolys[[i]]]
poly@polygons[[i]]@plotOrder <- 1:length(poly@polygons[[i]]@Polygons)
}
}
return(poly)
}
Method 3: Simplify the geometry of the polygon shapes
We can reduce the number of vertices in our polygon shapes using the gSimplify
function from the rgeos
package
Code:
can <- getData('GADM', country="CAN", level=0)
can <- gSimplify(can, tol=0.01, topologyPreserve=TRUE)
Some benchmarks:
I used elapsed system.time
to benchmark my plotting times. Note that these are just the times for plotting the countries, without the contour lines and other extra things. For the sp objects, I just used the plot
function. For the data frame objects, I used the plot
function with type='l'
and the lines
function.
Plotting the original Canada, USA, Mexico polygons:
73.009 seconds
Using Method 1:
2.449 seconds
Using Method 2:
17.660 seconds
Using Method 3:
16.695 seconds
Using Method 2 + 1:
1.729 seconds
Using Method 2 + 3:
0.445 seconds
Using Method 2 + 3 + 1:
0.172 seconds
Other remarks:
It seems that the combination of methods 2 + 3 gives sufficient speed ups to the plotting of polygons. Using methods 2 + 3 + 1 adds the problem of losing the nice properties of sp
objects, and my main difficulty is applying projections. I hacked something together to project a data frame object, but it runs rather slow. I think using method 2 + 3 provides sufficient speed ups for me until I can get the kinks out of using method 2 + 3 + 1.
Everyone should look into transfering over to sf (spatial features) package instead of sp. It is significantly faster (1/60th in this case) and easier to use. Here is an example of reading in a shp and plotting via ggplot2.
Note: You need to reinstall ggplot2 from the most recent build on github (see below)
library(rgdal)
library(sp)
library(sf)
library(plyr)
devtools::install_github("tidyverse/ggplot2")
library(ggplot2)
# Load the shape files
can<-getData('GADM', country="CAN", level=0)
td <- file.path(tempdir(), "rgdal_examples"); dir.create(td)
st_write(st_as_sf(can),file.path(td,'can.shp'))
ptm <- proc.time()
can = readOGR(dsn=td, layer="can")
can@data$id = rownames(can@data)
can.points = fortify(can, region="id")
can.df = join(can.points, can@data, by="id")
ggplot(can.df) + geom_polygon(aes(long,lat,group=group,fill='NAME_ENGLISH'))
proc.time() - ptm
user system elapsed
683.344 0.980 684.51
ptm <- proc.time()
can2 = st_read(file.path(td,'can.shp'))
ggplot(can2)+geom_sf( aes(fill = 'NAME_ENGLISH' ))
proc.time() - ptm
user system elapsed
11.340 0.096 11.433