How to create a pivot table in R with multiple (3+) variables
The request: "The values of c2 variable must be aggregated by sum for each level" can be decomposed into an aggregation step that use tapply
to create a 3-dimensional table object which can then be presented as flattened (with ftable
) with 2 dimensions on the rows and one dimension for columns:
ftable(with(mydata, tapply(c2, list(c1,c3,c4), sum) ) )
A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
E 124 NA 667812.84 NA NA
201 47619.32 NA NA NA
202 NA NA 116443.56 NA
203 73412.28 NA NA NA
L 124 NA 471730.20 NA 81146.25
201 11376.55 NA NA NA
202 NA NA NA NA
203 NA NA NA NA
This can also be easily produced by the pivottabler
package - using either the one-line quick-pivot function or the more verbose syntax:
df <- read.csv(text="c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header=TRUE)
# quick pivot syntax
library(pivottabler)
qhpvt(df, c("c1","c3"), "c4", "sum(c2)", totals="NONE")
# verbose syntax
library(pivottabler)
pt <- PivotTable$new()
pt$addData(df)
pt$addColumnDataGroups("c4", addTotal=FALSE)
pt$addRowDataGroups("c1", addTotal=FALSE)
pt$addRowDataGroups("c3", addTotal=FALSE)
pt$defineCalculation(calculationName="calc1", summariseExpression="sum(c2)")
pt$renderPivot()
Output:
More info about the pivottabler
package at:
http://pivottabler.org.uk/articles/v01-introduction.html
NB: I am the package author.
You can do this with dcast from the reshape2
package:
dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)
For example:
library(reshape2)
# reproducible version of your data
mydata = read.csv(text="c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header=TRUE)
result = dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)
produces:
c1 c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
1 E 124 0.00 667812.8 0.0 0.00
2 E 201 47619.32 0.0 0.0 0.00
3 E 202 0.00 0.0 116443.6 0.00
4 E 203 73412.28 0.0 0.0 0.00
5 L 124 0.00 471730.2 0.0 81146.25
6 L 201 11376.55 0.0 0.0 0.00
Here are a few more options, two in base R and one using the more recent "dplyr" and "tidyr" packages.
Base R's reshape
can't handle aggregation, so you need to resort to other functions (for example, aggregate
) before you can do the reshaping.
reshape(
aggregate(c2 ~ c1 + c3 + c4, mydata, sum),
direction = "wide", idvar = c("c1", "c3"), timevar = "c4")
# c1 c3 c2.A la vista c2.Montoxv_a186d c2.Montoxv_a60d c2.Montoxv_a90d
# 1 E 201 47619.32 NA NA NA
# 2 L 201 11376.55 NA NA NA
# 3 E 203 73412.28 NA NA NA
# 4 E 124 NA 667812.8 NA NA
# 5 L 124 NA 471730.2 NA 81146.25
# 6 E 202 NA NA 116443.6 NA
If your aggregation only involves sum, you can also use xtabs
to do the aggregation. Since you have multiple values on the RHS of the formula, you'll end up with a multi-dimensional array
, but that can easily be coerced into a rectangular form by using ftable
(as was done by @BondedDust in his answer). Note that the output using ftable
is a little different from others in that it returns all combinations of grouping variables by default, even where there would otherwise be fully empty rows.
ftable(xtabs(c2 ~ c1 + c3 + c4, mydata))
# c4 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# c1 c3
# E 124 0.00 667812.84 0.00 0.00
# 201 47619.32 0.00 0.00 0.00
# 202 0.00 0.00 116443.56 0.00
# 203 73412.28 0.00 0.00 0.00
# L 124 0.00 471730.20 0.00 81146.25
# 201 11376.55 0.00 0.00 0.00
# 202 0.00 0.00 0.00 0.00
# 203 0.00 0.00 0.00 0.00
Finally, you can also use the functions in "tidyr" and "dplyr" which offer similar functionality to the tools in "reshape" and "reshape2", but with a slightly different "grammar".
library(tidyr)
library(dplyr)
mydata %>% ## The source dataset
group_by(c1, c3, c4) %>% ## Grouping variables
summarise(c2 = sum(c2)) %>% ## aggregation of the c2 column
ungroup() %>% ## spread doesn't seem to like groups
spread(c4, c2) ## spread makes the data wide
# Source: local data frame [6 x 6]
#
# c1 c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# 1 E 124 NA 667812.8 NA NA
# 2 E 201 47619.32 NA NA NA
# 3 E 202 NA NA 116443.6 NA
# 4 E 203 73412.28 NA NA NA
# 5 L 124 NA 471730.2 NA 81146.25
# 6 L 201 11376.55 NA NA NA