Summing Multiple Groups of Columns
Always it is easier to work on data in long format. Hence, change class_df
to long format using tidyr:gather
and join with class_fg
. Perform analysis in long format on your data. Finally, spread in wide-format to match expected result.
library(tidyverse)
class_df %>% gather(key, value, -id) %>%
inner_join(class_fg, by=c("key" = "class")) %>%
group_by(id, fg) %>%
summarise(value = sum(value)) %>%
spread(fg, value) %>%
inner_join(class_df, by="id") %>% as.data.frame()
# id X Y Z A B C D E F
# 1 1 0.0 0.4 0.6 0.20 0.3 0.10 0.15 0.25 0.0
# 2 2 0.4 0.4 0.2 0.05 0.1 0.05 0.30 0.10 0.4
# 3 3 0.3 0.4 0.3 0.10 0.1 0.10 0.20 0.20 0.3
Data:
class_fg <- read.table(text =
"class fg
A Z
B Z
C Z
D Y
E Y
F X",
header = TRUE, stringsAsFactors = FALSE)
class_df <- read.table(text =
"id A B C D E F
1 0.20 0.30 0.10 0.15 0.25 0.00
2 0.05 0.10 0.05 0.30 0.10 0.40
3 0.10 0.10 0.10 0.20 0.20 0.30",
header = TRUE, stringsAsFactors = FALSE)
We could split
the 'class_df' by 'class', loop through the list
elements with map
, select
the columns of 'class_df' and get the rowSums
library(tidyverse)
class_fg %>%
split(.$fg) %>%
map_df(~ class_df %>%
select(one_of(.x$class)) %>%
rowSums) %>%
bind_cols(class_df, .)
# id A B C D E F X Y Z
#1 1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2 2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3 3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
Or do a group by nest
ing, and then do the rowSums
by map
ping over the list
elements
class_fg %>%
group_by(fg) %>%
nest %>%
mutate(out = map(data, ~ class_df %>%
select(one_of(.x$class)) %>%
rowSums)) %>%
select(-data) %>%
unnest %>%
unstack(., out ~ fg) %>%
bind_cols(class_df, .)