How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs
Here is one option with purrr. We get the unique prefixes of the dataset's column names ('nm1'), use map (from purrr) to loop through those prefixes, select the columns that match each prefix value in 'nm1', add those columns together row by row using reduce, and then bind the resulting columns (bind_cols) to the original dataset.
library(tidyverse)
nm1 <- names(df) %>%
    substr(1, 1) %>%
    unique()
nm1 %>%
    map(~ df %>%
            select(matches(.x)) %>%
            reduce(`+`)) %>%
    set_names(paste0("sum_", nm1)) %>%
    bind_cols(df, .)
# a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1 1 4 10 9 3 15 10 7 25
#2 2 5 11 10 4 16 12 9 27
#3 3 6 12 11 5 17 14 11 29
#4 4 7 13 12 6 18 16 13 31
#5 5 8 14 13 7 19 18 15 33
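To see what the select/reduce step does on its own, here is a minimal sketch for the "a" prefix only. The data frame is an assumption, reconstructed from the printed output above, and starts_with is swapped in for matches so that the prefix is only matched at the start of a column name.

```r
library(dplyr)
library(purrr)

# Assumed sample data, reconstructed from the printed output.
df <- data.frame(
  a1 = 1:5, b1 = 4:8, c1 = 10:14,
  a2 = 9:13, b2 = 3:7, c2 = 15:19
)

# A data frame is a list of columns, so reduce(`+`) adds the selected
# columns element-wise and returns one value per row.
sum_a <- df %>%
  select(starts_with("a")) %>%
  reduce(`+`)

sum_a
```

With this data, sum_a should equal a1 + a2, which is the sum_a column shown in the output above.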
df %>%
  mutate(sum_a = pmap_dbl(select(., starts_with("a")), sum),
         sum_b = pmap_dbl(select(., starts_with("b")), sum),
         sum_c = pmap_dbl(select(., starts_with("c")), sum))
a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1 1 4 10 9 3 15 10 7 25
2 2 5 11 10 4 16 12 9 27
3 3 6 12 11 5 17 14 11 29
4 4 7 13 12 6 18 16 13 31
5 5 8 14 13 7 19 18 15 33
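As a rough illustration of why pmap_dbl gives row-wise sums here: it walks the supplied columns in parallel and calls the function once per row, passing that row's values as arguments. The small list below is hypothetical and stands in for the selected columns.

```r
library(purrr)

# Hypothetical input: each list element plays the role of one column.
cols <- list(a1 = c(1, 2, 3), a2 = c(9, 10, 11))

# The first call is sum(a1 = 1, a2 = 9), the second sum(a1 = 2, a2 = 10),
# and so on; pmap_dbl() collects the results into a double vector.
row_totals <- pmap_dbl(cols, sum)

row_totals
```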
EDIT:
If there are many columns and you wish to apply this programmatically:
# x is a column-name prefix given as a string, e.g. "a"
row_sums <- function(x) {
  transmute(df, !!paste0("sum_", x) := pmap_dbl(select(df, starts_with(x)), sum))
}
newdf <- map_dfc(letters[1:3], row_sums)
newdf
sum_a sum_b sum_c
1 10 7 25
2 12 9 27
3 14 11 29
4 16 13 31
5 18 15 33
And if needed you can tack on the original variables with:
bind_cols(df, newdf)
a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1 1 4 10 9 3 15 10 7 25
2 2 5 11 10 4 16 12 9 27
3 3 6 12 11 5 17 14 11 29
4 4 7 13 12 6 18 16 13 31
5 5 8 14 13 7 19 18 15 33
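One possible variation, sketched under assumptions: the helper below takes the data as an argument instead of reading df from the enclosing environment, and the prefixes are derived from the column names rather than hard-coded. The name row_sums_for is hypothetical, and the data frame is reconstructed from the printed output.

```r
library(dplyr)
library(purrr)

# Assumed sample data, reconstructed from the printed output.
df <- data.frame(
  a1 = 1:5, b1 = 4:8, c1 = 10:14,
  a2 = 9:13, b2 = 3:7, c2 = 15:19
)

# Hypothetical helper: one-column data frame of row sums for a prefix.
row_sums_for <- function(data, prefix) {
  transmute(
    data,
    !!paste0("sum_", prefix) := pmap_dbl(select(data, starts_with(prefix)), sum)
  )
}

# Derive the prefixes by dropping the trailing digits from each name.
prefixes <- unique(sub("\\d+$", "", names(df)))

# Build one sum column per prefix and attach them to the original data.
sums <- map(prefixes, ~ row_sums_for(df, .x)) %>% bind_cols()
result <- bind_cols(df, sums)

result
```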
If you would like to consider a base R approach, here is how you could do it:
cbind(df, lapply(split.default(df, substr(names(df), 1, 1)), rowSums))
# a1 b1 c1 a2 b2 c2 a b c
#1 1 4 10 9 3 15 10 7 25
#2 2 5 11 10 4 16 12 9 27
#3 3 6 12 11 5 17 14 11 29
#4 4 7 13 12 6 18 16 13 31
#5 5 8 14 13 7 19 18 15 33
It splits the data column-wise into a list, based on the first letter of each column name (either a, b, or c), and then applies rowSums to each element of that list.
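To make the intermediate list visible, here is a small sketch of the split step by itself. The data frame is an assumption, reconstructed from the printed output above.

```r
# Assumed sample data, reconstructed from the printed output.
df <- data.frame(
  a1 = 1:5, b1 = 4:8, c1 = 10:14,
  a2 = 9:13, b2 = 3:7, c2 = 15:19
)

# split.default() treats the data frame as a list of columns and groups
# them by the first letter of each column name.
groups <- split.default(df, substr(names(df), 1, 1))

names(groups)    # one element per prefix
names(groups$a)  # the columns that ended up in the "a" group

# rowSums() is then applied to each group of columns.
sums <- lapply(groups, rowSums)
```

Each element of groups is itself a data frame, which is why rowSums can be applied to it directly.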
If you have a large number of columns and need to differentiate between all characters except the numbers at the end of each column name, you could modify the approach to:
cbind(df, lapply(split.default(df, sub("\\d+$", "", names(df))), rowSums))
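As a quick check of the regex-based grouping, the sketch below uses hypothetical column names with multi-character prefixes (len and wid are made up for illustration and are not part of the original data).

```r
# Hypothetical data with multi-character prefixes.
df2 <- data.frame(len1 = 1:3, wid1 = 4:6, len2 = 7:9, wid2 = 10:12)

# Strip the trailing digits so that the whole prefix defines the group.
prefix <- sub("\\d+$", "", names(df2))

# Same pattern as above: split by prefix, sum each group row-wise,
# and bind the sums to the original columns.
res <- cbind(df2, lapply(split.default(df2, prefix), rowSums))

res
```

Grouping on the first letter alone would still work for these names, but it would merge any two prefixes that start with the same letter, which is the situation the regex version avoids.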