Aggregating hourly data into daily aggregates

There is also a nice package called hydroTSM. It works with zoo objects and can aggregate a series to other time steps.

The function for your case is subdaily2daily. Its FUN argument lets you choose whether the aggregation is based on the min, max, mean, and so on.
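As a rough sketch of how that could look, assuming the hourly values are first put into a zoo series indexed by POSIXct date-times (the object names `times`, `z` and `daily_mean` are made up for illustration, and the sample values are the ones used in the other answers here):

```r
library(zoo)
library(hydroTSM)

# Hourly observations as a zoo series with a date-time index (UTC assumed)
times <- as.POSIXct(
  c("2000-01-01 01:00", "2000-01-01 02:00",
    "2000-01-01 03:00", "2000-12-31 23:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)
z <- zoo(c(30, 31, 33, 25), order.by = times)

# One value per day; swap FUN for min or max to get the other aggregates
daily_mean <- subdaily2daily(z, FUN = mean)
daily_mean
```

Calling the function once per statistic (FUN = min, FUN = max, FUN = mean) and combining the results with merge() would be one way to get all three in a single object.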


A couple of options:

1. Timetk

If you have a data frame (or tibble) then the summarize_by_time() function from timetk can be used:

library(tidyverse)
library(timetk)

# Collect Data
text <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"

df <- read_csv(text, col_types = cols(Date = col_datetime("%m/%d/%Y %H:%M")))
df
#> # A tibble: 4 x 2
#>   Date                  DBT
#>   <dttm>              <dbl>
#> 1 2000-01-01 01:00:00    30
#> 2 2000-01-01 02:00:00    31
#> 3 2000-01-01 03:00:00    33
#> 4 2000-12-31 23:00:00    25

# Summarize
df %>%
  summarise_by_time(
    .date_var = Date, 
    .by       = "day",
    min       = min(DBT),
    max       = max(DBT),
    mean      = mean(DBT)
  )
#> # A tibble: 2 x 4
#>   Date                  min   max  mean
#>   <dttm>              <dbl> <dbl> <dbl>
#> 1 2000-01-01 00:00:00    30    33  31.3
#> 2 2000-12-31 00:00:00    25    25  25

Created on 2021-05-21 by the reprex package (v2.0.0)

2. Tidyquant

You can use the tidyquant package for this. The process involves using the tq_transmute function to return a data frame that is modified using the xts aggregation function, apply.daily. We'll apply a custom stat_fun, which returns the min, max and mean. However, you can apply any vector function you like, such as quantile.


library(tidyquant)

df
#> # A tibble: 4 x 2
#>                  Date   DBT
#>                <dttm> <dbl>
#> 1 2000-01-01 01:00:00    30
#> 2 2000-01-01 02:00:00    31
#> 3 2000-01-01 03:00:00    33
#> 4 2000-12-31 23:00:00    25

stat_fun <- function(x) c(min = min(x), max = max(x), mean = mean(x))

df %>%
    tq_transmute(select     = DBT,
                 mutate_fun = apply.daily,
                 FUN        = stat_fun)
#> # A tibble: 2 x 4
#>                  Date   min   max     mean
#>                <dttm> <dbl> <dbl>    <dbl>
#> 1 2000-01-01 03:00:00    30    33 31.33333
#> 2 2000-12-31 23:00:00    25    25 25.00000
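To illustrate the remark about other vector functions such as quantile, here is a minimal sketch that calls apply.daily from xts directly, without the tidyquant wrapper. The names `x`, `quart_fun` and `daily_q` are hypothetical, and the UTC time zone is an assumption:

```r
library(xts)

# Same sample values as above, stored as an xts series (UTC assumed)
times <- as.POSIXct(
  c("2000-01-01 01:00", "2000-01-01 02:00",
    "2000-01-01 03:00", "2000-12-31 23:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)
x <- xts(c(30, 31, 33, 25), order.by = times)

# Return the quartiles of each day's values as a named vector
quart_fun <- function(v) quantile(as.numeric(v), probs = c(0.25, 0.5, 0.75))

# One row per day, one column per quantile
daily_q <- apply.daily(x, quart_fun)
daily_q
```

As in the tq_transmute output, each row should be stamped with the last observation time of its day rather than with midnight.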

1) This can be done compactly using zoo:

L <- "Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25"

library(zoo)
stat <- function(x) c(min = min(x), max = max(x), mean = mean(x))
z <- read.zoo(text = L, header = TRUE, sep = ",", format = "%m/%d/%Y", aggregate = stat)

This gives:

> z
           min max     mean
2000-01-01  30  33 31.33333
2000-12-31  25  25 25.00000

2) Here is a solution that uses only core R:

DF <- read.csv(text = L)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
ag <- aggregate(DBT ~ Date, DF, stat) # same stat as in zoo solution 

The last line gives:

> ag
        Date  DBT.min  DBT.max DBT.mean
1 2000-01-01 30.00000 33.00000 31.33333
2 2000-12-31 25.00000 25.00000 25.00000

EDIT: (1) Since this answer first appeared, the text= argument to read.zoo has been added to the zoo package. (2) Minor improvements.


Using strptime(), trunc() and ddply() from the plyr package:

# Make the data
ZZ <- textConnection("Date,DBT
01/01/2000 01:00,30
01/01/2000 02:00,31
01/01/2000 03:00,33
12/31/2000 23:00,25")
dataframe <- read.csv(ZZ, header = TRUE)
close(ZZ)

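# Do the calculations
# strptime() and trunc() return POSIXlt, which plyr may not handle as a
# data frame column, so both results are stored as POSIXct
dataframe$Date <- as.POSIXct(strptime(dataframe$Date, format = "%m/%d/%Y %H:%M"))
dataframe$day <- as.POSIXct(trunc(dataframe$Date, "days"))

library(plyr)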

ddply(dataframe,.(day),
      summarize,
      aveDBT=mean(DBT),
      maxDBT=max(DBT),
      minDBT=min(DBT)
)

gives

         day   aveDBT maxDBT minDBT
1 2000-01-01 31.33333     33     30
2 2000-12-31 25.00000     25     25

To clarify:

strptime converts the character strings to date-times according to the format. To see how to specify the format, see ?strptime. trunc then truncates these date-times to the specified unit, which is the day in this case.

ddply evaluates the function summarize within the data frame after splitting it up according to day. Everything after summarize is an argument that is passed on to summarize.
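The two steps described above can be seen in isolation with a small base R sketch. It uses tapply as a stand-in for the split-and-summarize step, so it is not the plyr code itself; the variable names and the UTC time zone are assumptions made for illustration:

```r
stamps <- c("01/01/2000 01:00", "01/01/2000 02:00",
            "01/01/2000 03:00", "12/31/2000 23:00")
dbt <- c(30, 31, 33, 25)

# Step 1: parse the character strings into date-times
dt <- strptime(stamps, format = "%m/%d/%Y %H:%M", tz = "UTC")
class(dt)   # "POSIXlt" "POSIXt"

# Step 2: drop the time-of-day part so every reading carries only its day
day <- trunc(dt, "days")
day_key <- format(day, "%Y-%m-%d")

# Step 3: split the values by day and apply a summary to each group
daily_mean <- tapply(dbt, day_key, mean)
daily_mean
```

The result is a named vector with one mean per distinct day; replacing mean with min or max gives the other two summaries.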