Performing dplyr mutate on subset of columns
`tidyr::nest()` understands the same selector syntax as `dplyr::select()`, so one approach is to consolidate the columns of interest into a single column of data frames, perform the necessary operations on that column, and then unnest to get back a flat data frame:
```r
library(tidyverse)

dd %>%
  nest(X2, X4, .key = "Slice") %>%
  mutate(evensum  = map(Slice, rowSums),
         evenmean = map(Slice, rowMeans),
         evensd   = map(Slice, pmap_dbl, lift_vd(sd))) %>%
  unnest()
#   id    X1    X3    X5 evensum evenmean evensd    X2    X4
# 1  a 0.602 0.687 0.447   0.876    0.438 0.100  0.367 0.509
# 2  b 0.195 0.831 0.965   1.70     0.848 0.200  0.989 0.707
# 3  c 0.966 0.105 0.141   1.68     0.839 0.0333 0.815 0.862
# 4  d 0.651 0.646 0.777   1.10     0.548 0.416  0.254 0.842
```
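The code above uses the older tidyr interface: tidyr 1.0 and later deprecate passing bare columns plus `.key` to `nest()` and expect `unnest()` to be told which columns to expand, and purrr 1.0.0 deprecates `lift_vd()`. Below is a minimal sketch of the same nest/compute/unnest idea in the newer style, assuming tidyr >= 1.0. It uses `map_dbl()` so the summaries come out as plain numeric columns instead of list columns. Since the question's data is not reproduced here, `dd` is rebuilt from the values printed in the other answers (rounded to 7 decimals). Each `Slice` is a one-row tibble only because `id`, `X1`, `X3`, `X5` are unique per row; rows sharing those values would be nested together.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Reconstruction of the question's `dd` from the printed output (an assumption).
dd <- data.frame(
  id = c("a", "b", "c", "d"),
  X1 = c(0.6021140, 0.1950439, 0.9664587, 0.6509055),
  X2 = c(0.3670719, 0.9888592, 0.8151934, 0.2539684),
  X3 = c(0.6872308, 0.8314290, 0.1046694, 0.6461509),
  X4 = c(0.5090904, 0.7066286, 0.8623137, 0.8417851),
  X5 = c(0.4474437, 0.9646670, 0.1411871, 0.7767125)
)

res <- dd %>%
  # tidyr >= 1.0 syntax: name the new list column and list its members with c()
  nest(Slice = c(X2, X4)) %>%
  mutate(
    # each .x is a one-row tibble holding X2 and X4; unlist() turns it into a vector
    evensum  = map_dbl(Slice, ~ sum(unlist(.x))),
    evenmean = map_dbl(Slice, ~ mean(unlist(.x))),
    evensd   = map_dbl(Slice, ~ sd(unlist(.x)))
  ) %>%
  # expand only the nested column back into X2 and X4
  unnest(cols = Slice)

res
```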
Since data frames are essentially lists, this approach is naturally suited to applying arbitrary functions (such as `sd()` above) to an arbitrary set of columns with the `purrr::pmap()` family of functions.
Side note: because `sd()` expects a single vector rather than one argument per column, we use `purrr::lift_vd()` to convert its interface into one suitable for `pmap()`:
```r
sd(c(0.367, 0.509))         # 0.100
lift_vd(sd)(0.367, 0.509)   # 0.100
```
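Because purrr 1.0.0 and later deprecate the `lift_*()` helpers, a plain variadic wrapper can stand in for `lift_vd(sd)`: it gathers the values `pmap()` passes for one row into a vector and hands that vector to `sd()`. The sketch below also skips the nesting step and calls `pmap_dbl()` directly on the selected columns. The helper name `sd_of_args` is hypothetical, and `dd` is rebuilt from the printed values (an assumption, since the question's data is not shown here).

```r
library(dplyr)
library(purrr)

# Reconstruction of the question's `dd` from the printed output (an assumption).
dd <- data.frame(
  id = c("a", "b", "c", "d"),
  X1 = c(0.6021140, 0.1950439, 0.9664587, 0.6509055),
  X2 = c(0.3670719, 0.9888592, 0.8151934, 0.2539684),
  X3 = c(0.6872308, 0.8314290, 0.1046694, 0.6461509),
  X4 = c(0.5090904, 0.7066286, 0.8623137, 0.8417851),
  X5 = c(0.4474437, 0.9646670, 0.1411871, 0.7767125)
)

# Hypothetical helper: collect all arguments (one per column) into a vector,
# then call sd() on it. This mirrors what lift_vd(sd) produced.
sd_of_args <- function(...) sd(c(...))

sd_of_args(0.367, 0.509)  # same as sd(c(0.367, 0.509)), about 0.100

# pmap_dbl() walks the rows of the selected columns, passing X2 and X4 as arguments.
res <- dd %>%
  mutate(evensd = pmap_dbl(select(., X2, X4), sd_of_args))

res
```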
Am I missing something, or would this work as expected:
```r
cols <- paste0("X", c(2, 4))
dd %>% mutate(evensum = rowSums(.[cols]), evenmean = rowMeans(.[cols]))
#   id        X1        X2        X3        X4        X5   evensum  evenmean
# 1  a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623 0.4380811
# 2  b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878 0.8477439
# 3  c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071 0.8387535
# 4  d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535 0.5478768
```
Or are you specifically looking for a custom function to do this?
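If the goal is to apply arbitrary per-row functions (such as `sd()`) rather than only the built-in `rowSums()`/`rowMeans()`, one option is dplyr's row-wise tools. The sketch below assumes dplyr >= 1.0.0, which introduced `c_across()`; it reuses the `cols` vector from above, and `dd` is rebuilt from the printed values (an assumption). Row-wise evaluation calls each function once per row, so it can be considerably slower than `rowSums()` on large data.

```r
library(dplyr)

# Reconstruction of the question's `dd` from the printed output (an assumption).
dd <- data.frame(
  id = c("a", "b", "c", "d"),
  X1 = c(0.6021140, 0.1950439, 0.9664587, 0.6509055),
  X2 = c(0.3670719, 0.9888592, 0.8151934, 0.2539684),
  X3 = c(0.6872308, 0.8314290, 0.1046694, 0.6461509),
  X4 = c(0.5090904, 0.7066286, 0.8623137, 0.8417851),
  X5 = c(0.4474437, 0.9646670, 0.1411871, 0.7767125)
)

cols <- paste0("X", c(2, 4))

res <- dd %>%
  rowwise() %>%  # evaluate each expression separately for every row
  mutate(
    # c_across() combines the selected columns of the current row into a vector
    evensum  = sum(c_across(all_of(cols))),
    evenmean = mean(c_across(all_of(cols))),
    evensd   = sd(c_across(all_of(cols)))
  ) %>%
  ungroup()      # drop the row-wise grouping again

res
```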
Not exactly what you are looking for, but if you want to do it inside a pipe, you could use `select()` explicitly inside `mutate()`, like this:
```r
dd %>% mutate(xy = select(., num_range("X", c(2, 4))) %>% rowSums)
#   id        X1        X2        X3        X4        X5        xy
# 1  a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623
# 2  b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878
# 3  c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071
# 4  d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535
```
However, it gets more complicated if you want to apply several functions. You could use a helper function along these lines (not thoroughly tested):
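```r
# Combine several results into an n-column matrix, one column per result.
# Length-1 results (e.g. a single overall max) are recycled to n rows.
f <- function(x, ...) {
  n <- nrow(x)
  x <- lapply(list(...), function(y) if (length(y) == 1L) rep(y, n) else y)
  matrix(unlist(x), nrow = n, byrow = FALSE)
}
```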
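And then apply it like this (note that `max(.)` returns a single value, the maximum over all selected cells, which `f()` recycles to every row):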
```r
dd %>% mutate(xy = select(., num_range("X", c(2, 4))) %>% f(., rowSums(.), max(.)))
#   id        X1        X2        X3        X4        X5      xy.1      xy.2
# 1  a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623 0.9888592
# 2  b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878 0.9888592
# 3  c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071 0.9888592
# 4  d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535 0.9888592
```
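With dplyr >= 1.0.0, `mutate()` can also create several columns at once from an unnamed expression that returns a data frame. That avoids the matrix column (`xy.1`, `xy.2`) and gives each result its own name. A sketch under that assumption, where `even_stats()` is a hypothetical helper and, unlike `max(.)` above, every statistic is computed per row; `dd` is rebuilt from the printed values:

```r
library(dplyr)

# Reconstruction of the question's `dd` from the printed output (an assumption).
dd <- data.frame(
  id = c("a", "b", "c", "d"),
  X1 = c(0.6021140, 0.1950439, 0.9664587, 0.6509055),
  X2 = c(0.3670719, 0.9888592, 0.8151934, 0.2539684),
  X3 = c(0.6872308, 0.8314290, 0.1046694, 0.6461509),
  X4 = c(0.5090904, 0.7066286, 0.8623137, 0.8417851),
  X5 = c(0.4474437, 0.9646670, 0.1411871, 0.7767125)
)

# Hypothetical helper: one output column per statistic, named as desired.
even_stats <- function(x) {
  data.frame(
    evensum  = rowSums(x),
    evenmean = rowMeans(x),
    evensd   = apply(x, 1, sd)  # per-row standard deviation
  )
}

# The data-frame result is unnamed, so mutate() unpacks it into separate columns.
res <- dd %>% mutate(even_stats(select(., num_range("X", c(2, 4)))))

res
```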
A dplyr approach that does not depend on the number of columns (after dropping `id`, it takes every even-positioned column, however many there are):
```r
dd %>%
  select(-id) %>%
  # `.` is the data without `id`; keep columns 2, 4, ... (drop = FALSE keeps a data frame)
  mutate(evensum  = rowSums(.[, seq_len(ncol(.)) %% 2 == 0, drop = FALSE]),
         evenmean = rowMeans(.[, seq_len(ncol(.)) %% 2 == 0, drop = FALSE])) %>%
  cbind(id = dd[, 1], .)
#   id        X1        X2        X3        X4        X5   evensum  evenmean
# 1  a 0.6021140 0.3670719 0.6872308 0.5090904 0.4474437 0.8761623 0.4380812
# 2  b 0.1950439 0.9888592 0.8314290 0.7066286 0.9646670 1.6954878 0.8477439
# 3  c 0.9664587 0.8151934 0.1046694 0.8623137 0.1411871 1.6775071 0.8387535
# 4  d 0.6509055 0.2539684 0.6461509 0.8417851 0.7767125 1.0957535 0.5478767
```
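The approach above is agnostic to the number of columns but depends on their position. If "even" should instead refer to the number in the column name, a `select()` helper can pick the columns by name and leave `id` in place, so no `cbind()` is needed. A sketch, assuming names of the form `X<number>`; the regular expression encodes that naming assumption, and `dd` is rebuilt from the printed values:

```r
library(dplyr)

# Reconstruction of the question's `dd` from the printed output (an assumption).
dd <- data.frame(
  id = c("a", "b", "c", "d"),
  X1 = c(0.6021140, 0.1950439, 0.9664587, 0.6509055),
  X2 = c(0.3670719, 0.9888592, 0.8151934, 0.2539684),
  X3 = c(0.6872308, 0.8314290, 0.1046694, 0.6461509),
  X4 = c(0.5090904, 0.7066286, 0.8623137, 0.8417851),
  X5 = c(0.4474437, 0.9646670, 0.1411871, 0.7767125)
)

# "X" followed by a number whose last digit is even: X2, X4, X10, X12, ...
even_cols <- "^X[0-9]*[02468]$"

res <- dd %>%
  mutate(
    # `.` is the piped data, so select() sees all original columns
    evensum  = rowSums(select(., matches(even_cols))),
    evenmean = rowMeans(select(., matches(even_cols)))
  )

res
```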