Creating a contingency table using multiple columns in a data frame in R
One way using dplyr would be:
library(dplyr)
df %>%
  # group by the variable cl
  group_by(cl) %>%
  # sum every column within each group
  # (older dplyr: summarize_each(funs(sum)), now deprecated)
  summarise(across(everything(), sum)) %>%
  # select the three needed columns
  select(ab, bc, de) %>%
  # transpose the result
  t()
Output:
   [,1] [,2] [,3]
ab    1    3    2
bc    2    3    1
de    2    3    1
Your data is in a half-long, half-wide format, and you want it in a fully wide format. This is easiest if we first convert it to a fully long format:
library(reshape2)
df_long = melt(df, id.vars = "cl")
head(df_long)
#   cl variable value
# 1  1       ab     0
# 2  2       ab     1
# 3  3       ab     1
# 4  1       ab     1
# 5  2       ab     1
# 6  3       ab     0
Then we can turn it into a wide format, using sum as the aggregating function:
dcast(df_long, variable ~ cl, fun.aggregate = sum)
#   variable 1 2 3
# 1       ab 1 3 2
# 2       bc 2 3 1
# 3       de 2 3 1
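If you would rather not depend on reshape2, the same long-then-wide idea can probably be sketched in base R with stack() and xtabs(). The data frame below is a made-up toy example (not the data from the question), and the names vars, df_long2 and tab are only illustrative:

```r
# Hypothetical toy data: three 0/1 indicator columns plus a class column
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0),
  bc = c(1, 1, 0, 1, 0, 1),
  de = c(1, 0, 1, 0, 1, 1),
  cl = c(1, 2, 3, 1, 2, 3)
)

vars <- c("ab", "bc", "de")

# Long format: stack() returns the columns 'values' and 'ind';
# cl is repeated once per stacked column so the rows stay aligned
df_long2 <- data.frame(
  cl = rep(df$cl, times = length(vars)),
  stack(df[vars])
)

# Wide format: xtabs() sums 'values' for each ind-by-cl combination
tab <- xtabs(values ~ ind + cl, data = df_long2)
tab
# for this toy data the ab row holds the sums 1, 2, 1 for cl = 1, 2, 3
```

Since xtabs() returns a table object, this is arguably the closest thing to a literal contingency table; as.data.frame.matrix(tab) should convert it to a plain data frame if that is what you need.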
In base R (here the data frame is called data, with ab, bc and de in columns 1 to 3 and cl in column 4):
t(sapply(data[, 1:3], function(x) tapply(x, data[, 4], sum)))
#    1 2 3
# ab 1 3 2
# bc 2 3 1
# de 2 3 1
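A slightly shorter base R variant, as a sketch: rowsum() computes the per-group column sums in one call, so the sapply()/tapply() pair is not needed. The data frame here is again a made-up toy example rather than the data from the question, and res_rowsum and res_tapply are just illustrative names:

```r
# Hypothetical toy data: indicators in columns 1 to 3, class in column 4
data <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0),
  bc = c(1, 1, 0, 1, 0, 1),
  de = c(1, 0, 1, 0, 1, 1),
  cl = c(1, 2, 3, 1, 2, 3)
)

# rowsum() sums each column within the groups given by 'group';
# t() then puts the variables in rows and the classes in columns
res_rowsum <- t(rowsum(data[, 1:3], group = data[, 4]))

# The sapply()/tapply() version from above, for comparison
res_tapply <- t(sapply(data[, 1:3], function(x) tapply(x, data[, 4], sum)))

res_rowsum
# for this toy data the ab row holds the sums 1, 2, 1 for cl = 1, 2, 3
```

Both results should be numeric matrices with ab, bc, de as row names and the class labels as column names.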