Fast way to read xlsx files into R
Here is a small benchmark test. Results: readxl::read_xlsx
on average about twice as fast as openxlsx::read.xlsx
across different number of rows (n
) and columns (p
) using standard settings.
options(scipen=999) # no scientific number format
nn <- c(1, 10, 100, 1000, 5000, 10000, 20000, 30000)
pp <- c(1, 5, 10, 20, 30, 40, 50)
# create some excel files
l <- list() # save results
tmp_dir <- tempdir()
for (n in nn) {
for (p in pp) {
name <-
cat("\n\tn:", n, "p:", p)
flush.console()
m <- matrix(rnorm(n*p), n, p)
file <- paste0(tmp_dir, "/n", n, "_p", p, ".xlsx")
# write
write.xlsx(m, file)
# read
elapsed <- system.time( x <- openxlsx::read.xlsx(file) )["elapsed"]
df <- data.frame(fun = "openxlsx::read.xlsx", n = n, p = p,
elapsed = elapsed, stringsAsFactors = F, row.names = NULL)
l <- append(l, list(df))
elapsed <- system.time( x <- readxl::read_xlsx(file) )["elapsed"]
df <- data.frame(fun = "readxl::read_xlsx", n = n, p = p,
elapsed = elapsed, stringsAsFactors = F, row.names = NULL)
l <- append(l, list(df))
}
}
# results
d <- do.call(rbind, l)
library(ggplot2)
ggplot(d, aes(n, elapsed, color= fun)) +
geom_line() + geom_point() +
facet_wrap( ~ paste("columns:", p)) +
xlab("Number of rows") +
ylab("Seconds")
To write an excel file, readxl
has a counterpart called writexl
. As far as what is the best package to read an excel file, I think the benchmark provided above is pretty good.
The only reason I would use xlsx
to write a package would be if I were to write many excel sheets in one .xlsx
file.