Calculating cumulative sum for each row

You can also try mySum = t(apply(df, 1, cumsum)).

The transpose is needed because apply() stores the result of each call as a column of its output, so the row-wise results come out transposed.
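A minimal sketch on a hypothetical 2 x 3 matrix m (not from the question) makes the transposition visible: the cumulative sums of row i land in column i of what apply() returns.

```r
# Hypothetical 2 x 3 example matrix (not from the original question)
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

raw <- apply(m, 1, cumsum)   # each row's cumsum is stored as a column
dim(raw)                     # 3 2: transposed relative to m

rowwise <- t(raw)            # transpose back to the shape of m
rowwise
#      [,1] [,2] [,3]
# [1,]    1    4    9
# [2,]    2    6   12
```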

I'm sure there are also fine solutions using plyr (such as ddply) or multicore methods.


You want cumsum()

df <- within(df, acc_sum <- cumsum(count))
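For illustration, here is that line applied to a small made-up data frame. Only the column name count comes from the answer. The id column and all values are hypothetical, since the original data are not shown here.

```r
# Hypothetical data; only the column name `count` comes from the answer above
df <- data.frame(id = 1:4, count = c(2, 5, 1, 3))

# Add a running total of `count` as a new column
df <- within(df, acc_sum <- cumsum(count))

df$acc_sum
# [1]  2  7  8 11
```

Writing df$acc_sum <- cumsum(df$count) should give the same column without within().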

To replicate the OP's result, the cumsum function is all that is needed, as Chase's answer shows. However, the OP's wording "for each row" possibly indicates interest in the cumulative sums of a matrix or data frame.

For column-wise cumsums of a data.frame, interestingly, cumsum is again all one needs! cumsum is a primitive that belongs to the Math group of generic functions. For data frames, the Math group method applies the function to each column; internally, it just does x[] <- lapply(x, .Generic, ...).

> foo <- matrix(1:6, ncol=3)
> df <- data.frame(foo)
> df
  X1 X2 X3
1  1  3  5
2  2  4  6
> cumsum(df)
  X1 X2 X3
1  1  3  5
2  3  7 11

Interestingly, sum is not part of Math, but part of the Summary group of generic functions; for data frames, this group first converts the data frame to a matrix and then calls the generic, so sum returns not column-wise sums but the overall sum:

> sum(df)
[1] 21

This discrepancy is (in my opinion) most likely because cumsum returns an object of the same dimensions as the original, whereas sum would not.
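If per-column totals are what is wanted instead, colSums() or sapply() can provide them. A short sketch, rebuilding the same df as in the example above so that it runs on its own:

```r
df <- data.frame(matrix(1:6, ncol = 3))  # same df as in the example above

total   <- sum(df)          # 21: Summary group, one overall total
by_col  <- colSums(df)      # X1 = 3, X2 = 7, X3 = 11: one total per column
by_col2 <- sapply(df, sum)  # same per-column totals, computed column by column
```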

For row-wise cumulative sums, there is no single function that I know of that replicates this behavior; Iterator's solution is probably one of the most straightforward.

If speed is an issue, it would almost certainly be fastest and most foolproof to write it in C; however, a simple for loop already gives a modest speedup (~2x?) when there are many rows or columns to loop over.

# Both functions expect a matrix; convert a data frame with as.matrix() first.
rowCumSums <- function(x) {
  for (i in seq_len(nrow(x))) x[i, ] <- cumsum(x[i, ])
  x
}
colCumSums <- function(x) {
  for (j in seq_len(ncol(x))) x[, j] <- cumsum(x[, j])
  x
}
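A quick usage sketch on the foo matrix from the earlier example, with the two definitions repeated so the snippet runs on its own. It also checks that the loop version agrees with the apply() approach.

```r
# Definitions repeated from above so the snippet runs on its own
rowCumSums <- function(x) {
  for (i in seq_len(dim(x)[1])) { x[i, ] <- cumsum(x[i, ]) }
  x
}
colCumSums <- function(x) {
  for (i in seq_len(dim(x)[2])) { x[, i] <- cumsum(x[, i]) }
  x
}

foo <- matrix(1:6, ncol = 3)   # same matrix as in the earlier example

rowCumSums(foo)
#      [,1] [,2] [,3]
# [1,]    1    4    9
# [2,]    2    6   12

colCumSums(foo)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    3    7   11

# The loop version agrees with the apply() approach
agree <- all(rowCumSums(foo) == t(apply(foo, 1, cumsum)))
agree
# [1] TRUE
```

Note that these loops assume x is a matrix. On a data frame, x[i, ] is a one-row data frame, so cumsum() works column by column and leaves the row unchanged.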

This can be sped up further by running a single plain cumsum over the whole matrix and subtracting off the running total at the end of each column. For row cumulative sums, one needs to transpose twice.

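colCumSums2 <- function(x) {
  # Append -colSums(x) as an extra row so the running total resets to 0 after each column
  padded <- cumsum(rbind(x, -colSums(x)))
  # Reshape, then drop the extra row; drop = FALSE keeps a matrix even for one row or column
  matrix(padded, ncol = ncol(x))[seq_len(nrow(x)), , drop = FALSE]
}
rowCumSums2 <- function(x) {
  # Transpose, take column-wise cumulative sums, transpose back
  t(colCumSums2(t(x)))
}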
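To make the reset trick easier to follow, here is a step-by-step trace on a small matrix. The intermediate names padded, running and res are only for illustration.

```r
x <- matrix(1:6, ncol = 3)        # columns are (1, 2), (3, 4), (5, 6)

padded <- rbind(x, -colSums(x))   # extra row holds minus each column total
as.vector(padded)                 # 1 2 -3 3 4 -7 5 6 -11

running <- cumsum(padded)         # cumsum() walks the matrix column by column
running                           # 1 3 0 3 7 0 5 11 0: back to 0 after each column

# Reshape and drop the extra row of zeros
res <- matrix(running, ncol = ncol(x))[seq_len(nrow(x)), , drop = FALSE]
res
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    3    7   11
```

With non-integer data the reset to zero may only hold up to floating-point rounding, and any residue carries over into the following columns.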

That's really a hack though. Don't do it.
