Find dates that fail to parse in R Lubridate
Lubridate will throw that error when attempting to parse dates that do not exist because of daylight savings time.
For example:
library(lubridate)
mydate <- strptime('2020-03-08 02:30:00', format = "%Y-%m-%d %H:%M:%S")
ymd_hms(mydate, tz = "America/Denver")
[1] NA
Warning message:
1 failed to parse.
My data comes from an unintelligent sensor which does not know about DST, so impossible (but correctly formatted) dates appear in my timeseries.
Credit to LawyeR and Stibu from above comments:
- I first sorted the raw csv column and did a head() & tail() to find which 3 dates were causing trouble
- Alternatively
which(is.na(dates$datetime))
was a simple one liner to also find the answer.
If the indices of where lubridate fails are useful to know, you can use a for loop with stopifnot() and print each successful parse.
Make some dates, throw an error in there at a random location.
library(lubridate)
set.seed(1)
my_dates<-as.character(sample(seq(as.Date('1900/01/01'),
as.Date('2000/01/01'), by="day"), 1000))
my_dates[sample(1:length(my_dates), 1)]<-"purpleElephant"
Now use a for loop and print each successful parse with stopifnot().
for(i in 1:length(my_dates)){
print(i)
stopifnot(!is.na(ymd(my_dates[i])))
}