Why are lubridate functions so slow when compared with as.POSIXct?
@Tyler's answer is correct. Here's some more info including a tip on making lubridate faster - from the help file:
" Lubridate has an inbuilt very fast POSIX parser, ported from the fasttime package by Simon Urbanek. This functionality is as yet optional and could be activated with options(lubridate.fasttime = TRUE). Lubridate will automatically detect POSIX strings and use fast parser instead of the default strptime utility. "
For the same reason cars are slow in comparison to riding on top of rockets. The added ease of use and safety make cars much slower than a rocket but you're less likely to get blown up and it's easier to start, steer, and brake a car. However, in the right situation (e.g., I need to get to the moon) the rocket is the right tool for the job. Now if someone invented a car with a rocket strapped to the roof we'd have something.
Start with looking at what dmy
is doing and you'll see the difference for the speed (by the way from your bechmarks I wouldn't say that lubridate
is that much slower as these are in milliseconds):
dmy
#type this into the command line and you get:
>dmy
function (..., quiet = FALSE, tz = "UTC")
{
dates <- unlist(list(...))
parse_date(num_to_date(dates), make_format("dmy"), quiet = quiet,
tz = tz)
}
<environment: namespace:lubridate>
Right away I see parse_date
and num_to_date
and make_format
. Makes one wonder what all these guys are. Let's see:
parse_date
> parse_date
function (x, formats, quiet = FALSE, seps = find_separator(x),
tz = "UTC")
{
fmt <- guess_format(head(x, 100), formats, seps, quiet)
parsed <- as.POSIXct(strptime(x, fmt, tz = tz))
if (length(x) > 2 & !quiet)
message("Using date format ", fmt, ".")
failed <- sum(is.na(parsed)) - sum(is.na(x))
if (failed > 0) {
message(failed, " failed to parse.")
}
parsed
}
<environment: namespace:lubridate>
num_to_date
> getAnywhere(num_to_date)
A single object matching ‘num_to_date’ was found
It was found in the following places
namespace:lubridate
with value
function (x)
{
if (is.numeric(x)) {
x <- as.character(x)
x <- paste(ifelse(nchar(x)%%2 == 1, "0", ""), x, sep = "")
}
x
}
<environment: namespace:lubridate>
make_format
> getAnywhere(make_format)
A single object matching ‘make_format’ was found
It was found in the following places
namespace:lubridate
with value
function (order)
{
order <- strsplit(order, "")[[1]]
formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y",
"%Y"))[order]
grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
}
<environment: namespace:lubridate>
Wow we got strsplit-ting
, expand-ing.grid-s
, paste-ing
, ifelse-ing
, unname-ing
etc. plus a Whole Lotta Error Checking Going On (play on the Zep song). So what we have here is some nice syntactic sugar. Mmmmm tasty but it comes with a price, speed.
Compare that to as.POSIXct
:
getAnywhere(as.POSIXct) #tells us to use methods to see the business
methods('as.POSIXct') #tells us all the business
as.POSIXct.date #what I believe your code is using (I don't use dates though)
There's a lot more Internal coding and less error checking going on with as.POSIXct
So you have to ask do I want ease and safety or speed and power? Depends on the job.