Summing rows by month in R
Another way to do this is with data.table:
library(data.table)
out = setDT(data)[, lapply(.SD, sum), by=Date]
#>out
# Date Hour Melbourne Southern Flagstaff
#1: 2009-05-01 21 0 496 715
Or by using dplyr:
library(dplyr)
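out = data %>% group_by(Date) %>% summarise_each(funs(sum))
# summarise_each() and funs() are deprecated in current dplyr; the equivalent is:
# out = data %>% group_by(Date) %>% summarise(across(everything(), sum))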
#>out
#Source: local data frame [1 x 5]
# Date Hour Melbourne Southern Flagstaff
#1 2009-05-01 21 0 496 715
I created the data set with:
data <- read.table( text=" Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
header=TRUE,stringsAsFactors=FALSE)
You can do the summation with the function aggregate:
byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
data=data,FUN=sum)
library(lubridate)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
data=data,FUN=sum)
Look at ?aggregate to understand the function better. Starting with the last argument (because that makes explaining easier), the arguments do the following:
- FUN is the function that should be used for the aggregation. I use sum to sum up the values, but it could also be mean, max or some function you wrote yourself.
- data is used to indicate the data frame that I want to aggregate.
- The first argument tells the function what exactly I want to aggregate. On the left side of the ~, I indicate the variables I want to aggregate. If there is more than one, they are combined with cbind. On the right hand side is the variable by which the data should be split. Putting Date means that aggregate will sum up the variables for each distinct value of Date.
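To illustrate the FUN argument, here is a minimal sketch that reuses the same formula with max and with a user-written function. The helper name spread and the result names bymax and byspread are made up for this example; the data frame is rebuilt inline only so the block runs on its own.

```r
# Rebuild the example data so the block runs on its own
data <- data.frame(
  Date = rep("2009-05-01", 7),
  Hour = c(0, 2, 1, 3, 4, 5, 6),
  Melbourne = rep(0, 7),
  Southern = c(5, 2, 11, 3, 1, 49, 425),
  Flagstaff = c(17, 1, 0, 8, 0, 79, 610),
  stringsAsFactors = FALSE
)

# Same formula as in the sum example, but FUN = max instead of sum
bymax <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ Date,
                   data = data, FUN = max)

# A user-written function works too (hypothetical helper):
# the difference between the largest and smallest value per day
spread <- function(x) max(x) - min(x)
byspread <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ Date,
                      data = data, FUN = spread)
```

Both calls return one row per distinct Date, just like the sum version; only the summary applied to each column changes.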
For the aggregation by month, I used the function month from the package lubridate. It does what one expects: it returns a numeric value indicating the month for a given date. You may first need to install the package with install.packages("lubridate").
If you prefer not to use lubridate, you could do the following instead:
data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
data=data,FUN=sum)
Here I added a new column to data that contains the month and then aggregated by that column.
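A numeric month column lumps together the same month from different years. If that matters for your data, one possible variation is to build a year-month key instead. The sketch below uses a small made-up data frame (toy, not the original data) and a hypothetical column name ym.

```r
# Hypothetical rows spanning two months and two years (not the original data)
toy <- data.frame(
  Date = c("2009-05-01", "2009-05-02", "2009-06-01", "2010-05-01"),
  Southern = c(5, 2, 11, 3),
  stringsAsFactors = FALSE
)

# Year-month key such as "2009-05", so May 2009 and May 2010 stay separate
toy <- transform(toy, ym = format(as.Date(Date), "%Y-%m"))

# Aggregate by the new key, exactly as with the month column
by_ym <- aggregate(Southern ~ ym, data = toy, FUN = sum)
```

The "%Y-%m" format sorts chronologically as text, which keeps the rows of by_ym in date order.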
Another base R solution:
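# dat is the data set from above; the format() call further down
# needs dat$Date to be of class Date (e.g. converted with as.Date)
# to sum by date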
rowsum(dat[-1], dat$Date)
# Hour Melbourne Southern Flagstaff
#2009-05-01 21 0 496 715
# or by month and year
rowsum(dat[-1], format(dat$Date, "%b-%y") )
# Hour Melbourne Southern Flagstaff
#May-09 21 0 496 715