How to convert variable with mixed date formats to one format?
Here is a base R solution:
fmts <- c("%d-%b-%y", "%d %b %Y", "%d-%m-%Y", "%m/%d/%y")
d <- as.Date(as.numeric(apply(outer(DF$date, fmts, as.Date), 1, na.omit)), "1970-01-01")
We have made the simplifying assumption that exactly one format works for each input date. That seems to be the case in the example, but if not, replace na.omit with function(x) c(na.omit(x), NA)[1], which keeps the first successful parse and yields NA when no format matches.
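The tolerant variant can be sketched end to end as follows. The sample strings below are hypothetical (the real DF$date is not reproduced here), first_ok is just an illustrative name for the replacement function, and the %b formats assume an English locale for month abbreviations.

```r
# Hypothetical sample input; the last entry matches none of the formats.
DF <- data.frame(date = c("25-Feb-87", "20 Aug 1974", "09-10-1984",
                          "8/18/92", "not a date"),
                 stringsAsFactors = FALSE)
fmts <- c("%d-%b-%y", "%d %b %Y", "%d-%m-%Y", "%m/%d/%y")

# One row per input string, one column per format; NA where a format fails.
m <- outer(DF$date, fmts, as.Date)

# Keep the first successful parse in each row, or NA if none succeeded.
first_ok <- function(x) c(na.omit(x), NA)[1]
d <- as.Date(as.numeric(apply(m, 1, first_ok)), origin = "1970-01-01")
d
```

With plain na.omit, a row that matches no format (or more than one) would make the rows unequal in length, so apply would no longer return a simple vector; taking the first element of c(na.omit(x), NA) guarantees exactly one value per input.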
Note that a two-digit year is ambiguous. Here every date appears to lie in the past, so we subtract 100 years from any parsed date that falls in the future:
past <- function(x) ifelse(x > Sys.Date(), seq(from=x, length=2, by="-100 year")[2], x)
as.Date(sapply(d, past), "1970-01-01")
For the sample data the last line gives:
[1] "1987-02-25" "1974-08-20" "1984-10-09" "1992-08-18" "1995-09-19"
[6] "1963-10-16" "1965-09-30" "2008-01-22" "1961-11-13" "1987-08-18"
[11] "1970-09-15" "1994-10-05" "1984-12-05" "1987-03-23" "1988-08-30"
[16] "1993-10-26" "1989-08-22" "1997-09-13"
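To see why the 100-year shift is needed: according to ?strptime, %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, so a year written as 63 is read as 2063. A minimal sketch with one hypothetical input, using a numeric month to avoid locale issues; the fixed reference date today is an assumption made only so the result is reproducible (the answer itself uses Sys.Date()).

```r
# Hypothetical input; "%y" reads 63 as 2063 because 00-68 map to 20xx.
x <- as.Date("16/10/63", format = "%d/%m/%y")
format(x, "%Y")   # "2063"

# Shift a future date back by 100 years, mirroring what past() does.
# today is fixed here only to keep the sketch reproducible.
today <- as.Date("2024-01-01")
shifted <- if (x > today) {
  seq(from = x, length.out = 2, by = "-100 years")[2]
} else {
  x
}
shifted           # "1963-10-16"
```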
You may try parse_date_time in package lubridate, which "allows the user to specify several format-orders to handle heterogeneous date-time character representations" using the orders argument. Something like...
library(lubridate)
parse_date_time(x = df$date,
                orders = c("d m y", "d B Y", "m/d/y"),
                locale = "eng")
...should be able to handle most of your formats. Please note that the b/B formats are locale sensitive. Other date-time formats which can be used in orders are listed in the Details section in ?strptime.
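Since the goal is a single output format, the parsed result can then be converted and printed uniformly. A minimal sketch, assuming lubridate is installed; the input strings and the particular orders are hypothetical and use numeric months only, so the locale argument is left at its default.

```r
library(lubridate)

# Hypothetical inputs with numeric months only, to sidestep locale issues.
dates <- c("25-02-1987", "08/20/74", "1984-10-09")

# Each order is tried against the input; separators in the data are ignored.
parsed <- parse_date_time(dates, orders = c("dmY", "mdy", "Ymd"))

# parse_date_time returns POSIXct; convert to Date and print in one format.
format(as.Date(parsed), "%Y-%m-%d")
```

When two orders such as dmy and mdy can both match the same string, the result depends on how lubridate ranks the candidate formats, so it is worth spot-checking ambiguous values such as "09-10-1984".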