What are the "standard unambiguous date" formats for string-to-date conversion in R?
This is documented behavior. From ?as.Date
:
format: A character string. If not specified, it will try '"%Y-%m-%d"' then '"%Y/%m/%d"' on the first non-'NA' element, and give an error if neither works.
as.Date("01 Jan 2000")
yields an error because the format isn't one of the two listed above. as.Date("01/01/2000")
yields an incorrect answer because the date isn't in one of the two formats listed above.
I take "standard unambiguous" to mean "ISO-8601" (even though as.Date
isn't that strict, as "%m/%d/%Y" isn't ISO-8601).
If you receive this error, the solution is to specify the format your date (or datetimes) are in, using the formats described in ?strptime
. Be sure to use particular care if your data contain day/month names and/or abbreviations, as the conversion will depend on your locale (see the examples in ?strptime
and read ?LC_TIME
).
In other words, is there a better solution than needing to specify the format?
Yes, there is now (ie in late 2016), thanks to anytime::anydate
from the anytime package.
See the following for some examples from above:
R> anydate(c("01 Jan 2000", "01/01/2000", "2015/10/10"))
[1] "2000-01-01" "2000-01-01" "2015-10-10"
R>
As you said, these are in fact unambiguous and should just work. And via anydate()
they do. Without a format.
As a complement to @JoshuaUlrich answer, here is the definition of function as.Date.character
:
as.Date.character
function (x, format = "", ...)
{
charToDate <- function(x) {
xx <- x[1L]
if (is.na(xx)) {
j <- 1L
while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
if (is.na(xx))
f <- "%Y-%m-%d"
}
if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d",
tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d",
tz = "GMT")))
return(strptime(x, f))
stop("character string is not in a standard unambiguous format")
}
res <- if (missing(format))
charToDate(x)
else strptime(x, format, tz = "GMT")
as.Date(res)
}
<bytecode: 0x265b0ec>
<environment: namespace:base>
So basically if both strptime(x, format="%Y-%m-%d")
and strptime(x, format="%Y/%m/%d")
throws an NA
it is considered ambiguous and if not unambiguous.