Split date rows by new year
Here's a tidyverse-based solution. It's similar to Lennyy's, but with fewer condition checks, and there's no issue with times being added (they might show up in a tibble, but only as 00:00:00). I've added ungroup() because it sounds like you have a grouping variable somewhere (see the comment under Lennyy's solution). It can be removed if you don't:
library(dplyr)
library(lubridate)
library(purrr)
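test %>%
  ungroup() %>% # This isn't necessary if there are no groupings.
  # Split into one-row data frames. Integer indices keep the original row order
  # (character row names would sort as "1", "10", "2", ...).
  split(seq_len(nrow(test))) %>%
  map_dfr(function(df) {
    # Same year: keep the row unchanged.
    if (year(df$from_date) == year(df$to_date)) return(df)
    # Different years: one row ending on 31 December, one starting on 1 January.
    bind_rows(
      mutate(df, to_date = rollback(floor_date(to_date, "y"))),
      mutate(df, from_date = floor_date(to_date, "y"))
    )
  })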
#### OUTPUT ####
ID Disease Pass Payment from_date to_date
1 10 P US 110 2008-01-09 2008-12-31
2 10 P US 110 2009-01-01 2009-01-08
3 10 P US 110 2009-01-09 2009-12-31
4 10 P US 110 2010-01-01 2010-01-08
5 10 P US 115 2010-01-09 2010-12-31
6 10 P US 115 2011-01-01 2011-01-08
7 12 D EN 240 2008-01-01 2008-12-31
8 12 P EN 255 2013-12-31 2013-12-31
9 12 P EN 255 2014-01-01 2014-12-30
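To explain: the data frame is split into a list of one-row data frames. I then use map_dfr to run the function on each of them; rows where from_date and to_date fall in the same year are returned unchanged, and the rest are split in two. map_dfr also binds the resulting data frames together. Within the anonymous function I floor to_date by year, which gives 1 January of that year. I then either roll it back to the last day of the previous month (31 December of the previous year) for the new to_date in the first row, or leave it as it is for the new from_date in the second row. Note that this handles a range that crosses a single year boundary, as in the example data.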
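As a small illustration of the two lubridate calls used inside the function, here is a sketch on a single hypothetical date (the variable names year_start and prev_year_end are made up for this example and are not part of the solution above):

```r
library(lubridate)

to_date <- as.Date("2009-01-08")

# First day of the year that to_date falls in.
year_start <- floor_date(to_date, "year")

# rollback() moves a date to the last day of the previous month,
# so applied to 1 January it yields 31 December of the prior year.
prev_year_end <- rollback(year_start)

year_start     # 2009-01-01
prev_year_end  # 2008-12-31
```

The first value becomes the from_date of the second row, and the second value becomes the to_date of the first row.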
Using from_date and to_date we can create a date sequence with seq.Date, split this sequence by year, and finally take the min and max of each year. Then use apply, separate_rows and separate to get the final result.
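cr_date <- function(d1, d2) {
  # Every day in the range, from d1 to d2 inclusive.
  sequence_date <- seq.Date(as.Date(d1), as.Date(d2), by = 'day')
  # Group the days by calendar year and keep "first day|last day" for each year.
  lst_dates <- lapply(split(sequence_date, lubridate::year(sequence_date)),
                      function(x) paste0(min(x), '|', max(x)))
  # Join the per-year ranges with ';' so they can be separated into rows later.
  result <- paste0(lst_dates, collapse = ';')
  return(result)
}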
#Test
#cr_date(as.Date('2008-01-09'),as.Date('2009-01-08'))
test$flag <- apply(test, 1, function(x) cr_date(x['from_date'], x['to_date']))
library(dplyr) # for %>% and mutate_at
library(tidyr)
separate_rows(test, flag, sep = ';') %>%
  separate(flag, into = c('from_date_new', 'to_date_new'), sep = '\\|') %>%
  mutate_at(vars('from_date_new', 'to_date_new'), list(~as.Date(.)))
ID Disease Pass Payment from_date to_date from_date_new to_date_new
1 10 P US 110 2008-01-09 2009-01-08 2008-01-09 2008-12-31
2 10 P US 110 2008-01-09 2009-01-08 2009-01-01 2009-01-08
3 10 P US 110 2009-01-09 2010-01-08 2009-01-09 2009-12-31
4 10 P US 110 2009-01-09 2010-01-08 2010-01-01 2010-01-08
5 10 P US 115 2010-01-09 2011-01-08 2010-01-09 2010-12-31
6 10 P US 115 2010-01-09 2011-01-08 2011-01-01 2011-01-08
7 12 D EN 240 2008-01-01 2008-12-31 2008-01-01 2008-12-31
8 12 P EN 255 2013-12-31 2014-12-30 2013-12-31 2013-12-31
9 12 P EN 255 2013-12-31 2014-12-30 2014-01-01 2014-12-30
This uses only base R.
First note that only dates with no times are used, so we should be using Date class, not POSIXct. The latter can needlessly introduce time zone errors unless you are very careful, so in the Note at the end, which shows the input used, we assume that we are starting out with test2, which contains Date class data. The code in the Note also shows how to convert it to Date class if it is already POSIXct.
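To illustrate the kind of time zone slip that POSIXct can introduce, here is a minimal sketch. The timestamp and the Europe/Berlin zone are assumptions chosen for the example; the point is that the same instant maps to different calendar dates depending on the tz passed to as.Date:

```r
# Midnight on 1 January 2009 in Berlin is still 31 December 2008 in UTC.
x <- as.POSIXct("2009-01-01 00:00:00", tz = "Europe/Berlin")

date_utc   <- as.Date(x, tz = "UTC")
date_local <- as.Date(x, tz = "Europe/Berlin")

format(date_utc)    # "2008-12-31"
format(date_local)  # "2009-01-01"
```

A one-day shift like this is enough to put a row in the wrong year, which is why plain Date values are safer for this task.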
Given test2 we add from_year, to_year and eoy (date at the end of the year) columns, giving test3. Then we iterate over the rows: if the years are the same we return the row, and if not we return the split rows. This gives a list of one- and two-row data frames which we rbind together.
test3 <- transform(test2,
  from_year = format(from_date, "%Y"),
  to_year = format(to_date, "%Y"),
  eoy = as.Date(sub("-.*", "-12-31", from_date)))

nr <- nrow(test2)
do.call("rbind", lapply(1:nr, function(i) with(test3[i, ],
  if (from_year == to_year) test2[i, ]
  else data.frame(ID, Disease, Pass, Payment,
    from_date = c(from_date, eoy+1),
    to_date = c(eoy, to_date))
)))
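The eoy column relies on sub() working on the "YYYY-MM-DD" text form of the date and replacing everything from the first hyphen onward. A minimal sketch on one hypothetical date, with the coercion written out explicitly via as.character for clarity:

```r
d <- as.Date("2008-01-09")

# "2008-01-09" -> "2008-12-31": the pattern "-.*" matches from the
# first hyphen to the end of the string.
eoy <- as.Date(sub("-.*", "-12-31", as.character(d)))

eoy      # 2008-12-31
eoy + 1  # 2009-01-01, the start of the following year
```

eoy closes the first of the two split rows and eoy + 1 opens the second.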
Note
Assumed input in reproducible form. As noted above, it uses Date class.
test2 <- transform(test,
  from_date = as.Date(from_date),
  to_date = as.Date(to_date))
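The data frame test itself is not shown here. For anyone who wants to run the code, a hypothetical reconstruction, inferred from the original from_date and to_date columns in the output of the second solution, might look like the sketch below. Storing the dates as strings and the exact column types are assumptions; the real data may well be POSIXct.

```r
# Hypothetical reconstruction of the input (an assumption, not the asker's
# actual object), inferred from the output tables shown earlier.
test <- data.frame(
  ID        = c(10, 10, 10, 12, 12),
  Disease   = c("P", "P", "P", "D", "P"),
  Pass      = c("US", "US", "US", "EN", "EN"),
  Payment   = c(110, 110, 115, 240, 255),
  from_date = c("2008-01-09", "2009-01-09", "2010-01-09", "2008-01-01", "2013-12-31"),
  to_date   = c("2009-01-08", "2010-01-08", "2011-01-08", "2008-12-31", "2014-12-30"),
  stringsAsFactors = FALSE
)

# Convert both date columns to Date class, as in the Note.
test2 <- transform(test,
  from_date = as.Date(from_date),
  to_date   = as.Date(to_date))

str(test2)
```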