Collapsing rows by user with dplyr
I would approach this by converting your data to long form first, doing the aggregation there, and then converting back to wide form if you need that for display.
So, using tidyr:
df %>% gather(rating, count, -User) %>%
group_by(User, rating) %>%
summarise(count = max(count)) %>%
spread(rating, count)
The first gather converts to long form (using p instead of + in the column names):
> df <- read.table(header=TRUE, text='User p1 p2 p3 p4 p5
A 1 0 0 0 0
A 0 1 0 0 0
A 0 0 0 0 1
B 0 0 1 0 0
B 0 0 0 1 0
')
> df %>% gather(rating, count, -User)
User rating count
1 A p1 1
2 A p1 0
3 A p1 0
4 B p1 0
5 B p1 0
6 A p2 0
...
And the remaining steps perform the aggregation, then transform back to wide format.
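For reference, here is a minimal sketch of the same long/aggregate/wide pipeline written with pivot_longer and pivot_wider, which newer tidyr releases provide as replacements for gather and spread. It assumes tidyr 1.0 or later and dplyr 1.0 or later (for the .groups argument); the name result is only for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the example above
df <- read.table(header = TRUE, text = "User p1 p2 p3 p4 p5
A 1 0 0 0 0
A 0 1 0 0 0
A 0 0 0 0 1
B 0 0 1 0 0
B 0 0 0 1 0
")

result <- df %>%
  # wide -> long: one row per User/rating pair
  pivot_longer(-User, names_to = "rating", values_to = "count") %>%
  group_by(User, rating) %>%
  # collapse duplicates within each User/rating group
  summarise(count = max(count), .groups = "drop") %>%
  # long -> wide: one column per rating again
  pivot_wider(names_from = rating, values_from = count)

print(result)
```

This should give one row per User, with a 1 wherever any of that user's original rows had a 1.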
Looks like you can use summarise_all:
df %>% group_by(User) %>% summarise_all(sum)
Edit note: replaced summarise_each, which is now deprecated, with summarise_all.
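If you are on a recent dplyr, the same per-user column sums can probably be written with across() instead of the scoped summarise_* verbs. This is a sketch assuming dplyr 1.0 or later; the name totals is only for illustration.

```r
library(dplyr)

# Sample data from the example above
df <- read.table(header = TRUE, text = "User p1 p2 p3 p4 p5
A 1 0 0 0 0
A 0 1 0 0 0
A 0 0 0 0 1
B 0 0 1 0 0
B 0 0 0 1 0
")

# across(everything(), sum) applies sum to every non-grouping column
totals <- df %>%
  group_by(User) %>%
  summarise(across(everything(), sum))

print(totals)
```

Inside a grouped summarise, everything() skips the grouping column, so User is left alone and only p1 to p5 are summed.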
Here's an alternative dplyr solution:
df %>% group_by(User) %>% do(as.data.frame(as.list(colSums(.[-1]))))
Or a possible data.table implementation:
library(data.table)
setDT(df)[, lapply(.SD, sum), User]
Or
setDT(df)[, as.list(colSums(.SD)), User]
Or, even simpler, with base R:
aggregate(. ~ User, df, sum)
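As a quick sanity check, here is the base R version run end to end on the sample data from the first answer. The name agg is only for illustration.

```r
# Sample data from the example above
df <- read.table(header = TRUE, text = "User p1 p2 p3 p4 p5
A 1 0 0 0 0
A 0 1 0 0 0
A 0 0 0 0 1
B 0 0 1 0 0
B 0 0 0 1 0
")

# Sum every other column within each User
agg <- aggregate(. ~ User, df, sum)
print(agg)
# Expected sums for p1..p5: A -> 1 1 0 0 1, B -> 0 0 1 1 0
```

Since each rating column only holds 0 or 1 here and no user has the same rating twice, sum and max give the same collapsed row per user.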