How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

Here is one option with purrr. We get the unique prefix of the names of the dataset ('nm1'), use map (from purrr) to loop through the unique names, select the column that matches the prefix value of 'nm1', add the rows using reduce and the bind the columns (bind_cols) with the original dataset

Click to copy

library(tidyverse)
nm1 <- names(df) %>% 
          substr(1, 1) %>%
          unique 
nm1 %>% 
     map(~ df %>% 
            select(matches(.x)) %>%
            reduce(`+`)) %>%
            set_names(paste0("sum_", nm1)) %>%
     bind_cols(df, .)
#    a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1  1  4 10  9  3 15    10     7    25
#2  2  5 11 10  4 16    12     9    27
#3  3  6 12 11  5 17    14    11    29
#4  4  7 13 12  6 18    16    13    31
#5  5  8 14 13  7 19    18    15    33

Click to copy

df %>% 
  mutate(sum_a = pmap_dbl(select(., starts_with("a")), sum), 
         sum_b = pmap_dbl(select(., starts_with("b")), sum),
         sum_c = pmap_dbl(select(., starts_with("c")), sum))

  a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  1  4 10  9  3 15    10     7    25
2  2  5 11 10  4 16    12     9    27
3  3  6 12 11  5 17    14    11    29
4  4  7 13 12  6 18    16    13    31
5  5  8 14 13  7 19    18    15    33

EDIT:

In the case there are many columns, and you wish to apply it programmatically:

Click to copy

row_sums <- function(x) {
  transmute(df, !! paste0("sum_", quo_name(x)) := pmap_dbl(select(df, starts_with(x)), sum))
}

newdf <- map_dfc(letters[1:3], row_sums)
newdf

  sum_a sum_b sum_c
1    10     7    25
2    12     9    27
3    14    11    29
4    16    13    31
5    18    15    33

And if needed you can tack on the original variables with:

Click to copy

bind_cols(df, dfnew)

  a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  1  4 10  9  3 15    10     7    25
2  2  5 11 10  4 16    12     9    27
3  3  6 12 11  5 17    14    11    29
4  4  7 13 12  6 18    16    13    31
5  5  8 14 13  7 19    18    15    33

In case you like to consider a base R approach, here's how you could do it:

Click to copy

cbind(df, lapply(split.default(df, substr(names(df), 0,1)), rowSums))
#  a1 b1 c1 a2 b2 c2  a  b  c
#1  1  4 10  9  3 15 10  7 25
#2  2  5 11 10  4 16 12  9 27
#3  3  6 12 11  5 17 14 11 29
#4  4  7 13 12  6 18 16 13 31
#5  5  8 14 13  7 19 18 15 33

It splits the data column-wise into a list, based on the first letter of each column name (either a, b, or c).

If you have a large number of columns and need to differentiate between all characters except the numbers at the end of each column name, you could modify the approach to:

Click to copy

cbind(df, lapply(split.default(df, sub("\\d+$", "", names(df))), rowSums))

How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

Tags:

R

Dplyr

Mutate

Purrr

Related

Recent Posts