Summarize over all other values per group in dplyr
You can do:
df %>%
  # self-join: pair every row with every row of the same group
  inner_join(df, by = "group_id") %>%
  # drop the pairs where a person is matched with themselves
  filter(person_id.x != person_id.y) %>%
  group_by(group_id, person_id = person_id.x) %>%
  summarise(decision = first(decision.x),
            others_decison = sum(decision.y))
  group_id person_id decision others_decison
     <int>     <int>    <int>          <int>
1        1         1        3             13
2        1         2        8              8
3        1         3        5             11
4        2         1        9             11
5        2         2       10             10
6        2         3        1             19
7        3         1        6             15
8        3         2        9             12
9        3         3        6             15
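Depending on the size of your actual dataset, this can become computationally demanding: the self-join pairs every row with every row of the same group, so the intermediate table grows with the square of each group's size. Recent dplyr versions (1.1.0 and later) also warn about the many-to-many relationship unless you pass relationship = "many-to-many" to inner_join().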
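To get a feel for that cost before running the join, the size of the joined table can be estimated from the group sizes alone: a group with n members contributes n^2 rows. A rough sketch, assuming df looks like the data behind the output above (the values below are reconstructed from that printed output, and n_members and joined_rows are just illustrative column names):

```r
library(dplyr)

# Reconstructed from the printed output above; the real df may differ.
df <- tibble(
  group_id  = rep(1:3, each = 3),
  person_id = rep(1:3, times = 3),
  decision  = c(3L, 8L, 5L, 9L, 10L, 1L, 6L, 9L, 6L)
)

# A self-join on group_id pairs every member with every member of the
# same group, so a group of size n yields n^2 rows before filtering.
join_size <- df %>%
  count(group_id, name = "n_members") %>%
  mutate(joined_rows = n_members^2)

# Total number of rows the self-join would create
total_rows <- sum(join_size$joined_rows)
```

With three groups of three this is harmless, but a single group of 10,000 members would already produce 100 million joined rows.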
Another possibility not involving an inner join could be:
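df %>%
  group_by(group_id) %>%
  # store the whole group's decisions in every row, plus the row's position
  mutate(others_decison = list(decision),
         rowid = row_number()) %>%
  ungroup() %>%
  rowwise() %>%
  # sum the group's decisions with the row's own entry dropped
  mutate(others_decison = sum(unlist(others_decison)[-rowid])) %>%
  ungroup() %>%
  select(-rowid)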
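If the summary you need is specifically a sum, as in this example, there is a shortcut worth considering: the sum of everyone else's values is the group total minus the row's own value, so neither a join nor a row-wise step is needed. A minimal sketch under that assumption (it does not generalise to functions such as max, it assumes decision contains no NA values, and the df below is reconstructed from the printed output earlier):

```r
library(dplyr)

# Reconstructed from the printed output above; the real df may differ.
df <- tibble(
  group_id  = rep(1:3, each = 3),
  person_id = rep(1:3, times = 3),
  decision  = c(3L, 8L, 5L, 9L, 10L, 1L, 6L, 9L, 6L)
)

res <- df %>%
  group_by(group_id) %>%
  # group total minus own value = sum over all other group members
  mutate(others_decison = sum(decision) - decision) %>%
  ungroup()
```

The same idea gives the mean of the others as (sum(decision) - decision) / (n() - 1), provided every group has more than one member.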
This can be accomplished fairly simply with a helper that takes a function as an argument and applies it to the vector with each observation removed in turn.
library(dplyr)
# Apply FUN to x with each element left out in turn;
# extra arguments in ... are passed on to FUN.
my_summarise <- function(x, FUN, ...) {
  sapply(seq_along(x), function(y)
    FUN(x[-y], ...))
}
df %>%
  group_by(group_id) %>%
  mutate(dsum = my_summarise(decision, sum),
         dmean = my_summarise(decision, mean),
         dmax = my_summarise(decision, max))
# A tibble: 9 x 6
# Groups: group_id [3]
  group_id person_id decision  dsum dmean  dmax
     <int>     <int>    <int> <int> <dbl> <int>
1        1         1        3    13   6.5     8
2        1         2        8     8   4       5
3        1         3        5    11   5.5     8
4        2         1        9    11   5.5    10
5        2         2       10    10   5       9
6        2         3        1    19   9.5    10
7        3         1        6    15   7.5     9
8        3         2        9    12   6       6
9        3         3        6    15   7.5     9
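Because my_summarise forwards ... to FUN, extra arguments such as na.rm can be passed straight through. A small illustration on a made-up vector (the values are for demonstration only and are not from the question's data):

```r
# Same helper as above, repeated so the snippet runs on its own.
my_summarise <- function(x, FUN, ...) {
  sapply(seq_along(x), function(y) FUN(x[-y], ...))
}

x <- c(3, NA, 5)  # made-up values for illustration

# Without na.rm, every leave-one-out subset that still contains the NA gives NA.
without_narm <- my_summarise(x, sum)

# With na.rm = TRUE forwarded to sum(), the NA is ignored in every subset.
with_narm <- my_summarise(x, sum, na.rm = TRUE)
```

One caveat to keep in mind: in a group with a single member, x[-y] is empty, so sum returns 0, mean returns NaN, and max returns -Inf with a warning.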
Here are a few data.table methods:
library(data.table)
dt <- as.data.table(df)
# don't update original dt
dt[dt, on = .(group_id), allow.cartesian = TRUE
   ][person_id != i.person_id,
     .(decision = first(i.decision), others = sum(decision)),
     by = .(group_id, person_id = i.person_id)]
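# update the original dt, way 1
# (relies on every person having at least one other group member,
#  so that the join returns exactly one value per row of dt)
dt[,
   others_decision := .SD[.SD, on = .(group_id), allow.cartesian = TRUE
                          ][person_id != i.person_id, sum(decision), by = .(group_id, i.person_id)]$V1
   ]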
# update the original dt, way 2
dt[,
   others_decision := dt[group_id == .BY[[1]] & person_id != .BY[[2]], sum(decision)],
   by = .(group_id, person_id)]
The first two are more or less @tmfmnk's approach, but via data.table. The last is more intuitive to me but is likely the slowest.