Is there a function like switch which works inside of dplyr::mutate?
Eons too late for the OP, but in case this shows up in a search ...
dplyr v0.5 has recode(), a vectorized version of switch(), so
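library(dplyr)

# tibble() replaces data_frame(), which is deprecated in current releases
tibble(
  x = sample(1:4, 10, replace = TRUE),
  y1 = rnorm(n = 10, mean = 7, sd = 2),
  y2 = rnorm(n = 10, mean = 5, sd = 2),
  y3 = rnorm(n = 10, mean = 7, sd = 1),
  y4 = rnorm(n = 10, mean = 5, sd = 1)
) %>%
  # for integer x, recode() matches replacements by position: 1 -> y1, 2 -> y2, ...
  mutate(y = recode(x, y1, y2, y3, y4))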
produces, as anticipated:
# A tibble: 10 x 6
x y1 y2 y3 y4 y
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 6.950106 6.986780 7.826778 6.317968 6.986780
2 1 5.776381 7.706869 7.982543 5.048649 5.776381
3 2 7.315477 2.213855 6.079149 6.070598 2.213855
4 3 7.461220 5.100436 7.085912 4.440829 7.085912
5 3 5.780493 4.562824 8.311047 5.612913 8.311047
6 3 5.373197 7.657016 7.049352 4.470906 7.049352
7 2 6.604175 9.905151 8.359549 6.430572 9.905151
8 3 11.363914 4.721148 7.670825 5.317243 7.670825
9 3 10.123626 7.140874 6.718351 5.508875 6.718351
10 4 5.407502 4.650987 5.845482 4.797659 4.797659
(Also works with named args, including character and factor x's.)
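As a minimal sketch of the named-argument form, the example below recodes a character column and its factor twin. The toy data and the names fruit, key, key_fct and labelled are made up for illustration, and scalar replacements are used to keep the result easy to check. Note that recode() is marked as superseded in recent dplyr releases, so treat this as illustrative rather than the recommended modern spelling.

```r
library(dplyr)

# Hypothetical toy data: a character key and the same key as a factor.
fruit <- tibble(
  key     = c("a", "b", "c", "a"),
  key_fct = factor(key)
)

labelled <- fruit %>%
  mutate(
    # Named arguments map old values to new ones; "c" has no match
    # and is left unchanged because the replacements are also character.
    label     = recode(key, a = "apple", b = "banana"),
    # For a factor, recode() rewrites the levels and returns a factor.
    label_fct = recode(key_fct, a = "apple", b = "banana")
  )

labelled
```

With a character or factor x, every replacement has to be named, since positional matching only applies to numeric x.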
Do the operation by each value of x. This is the data.table version; I assume something similar can be done in dplyr:
library(data.table)
dt = data.table(x = c(1,1,2,2), a = 1:4, b = 4:7)
dt[, newcol := switch(as.character(x), '1' = a, '2' = b, NA), by = x]
dt
# x a b newcol
#1: 1 1 4 1
#2: 1 2 5 2
#3: 2 3 6 6
#4: 2 4 7 7
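One way the same per-group idea might look in dplyr is sketched below, assuming group_by() followed by mutate(). Inside a grouped mutate(), x is the whole group's vector rather than a single value, so the sketch takes its first element before passing it to switch(). The names df and res are hypothetical.

```r
library(dplyr)

# Same toy data as the data.table example above.
df <- tibble(x = c(1, 1, 2, 2), a = 1:4, b = 4:7)

res <- df %>%
  group_by(x) %>%
  # switch() needs a length-1 selector; x is constant within a group,
  # so x[1] identifies the group. The fallback is an integer NA so the
  # column type is the same in every group.
  mutate(newcol = switch(as.character(x[1]), "1" = a, "2" = b, NA_integer_)) %>%
  ungroup()

res
```

Because switch() is evaluated once per group, this only makes sense when the selector is also the grouping variable.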
You can now use dplyr's case_when() function with mutate().
To follow your example in generating the data:
library(dplyr)

# as_tibble() replaces the deprecated tbl_df()
df.faithful <- as_tibble(faithful)
df.faithful$x <- sample(1:4, 272, replace = TRUE)
df.faithful$y1 <- rnorm(n = 272, mean = 7, sd = 2)
df.faithful$y2 <- rnorm(n = 272, mean = 5, sd = 2)
df.faithful$y3 <- rnorm(n = 272, mean = 7, sd = 1)
df.faithful$y4 <- rnorm(n = 272, mean = 5, sd = 1)
Now we define a new pick function, pick2(), using case_when():
pick2 <- function(x, v1, v2, v3, v4) {
  # element-wise choice: row i takes its value from the vector selected by x[i]
  out <- case_when(
    x == 1 ~ v1,
    x == 2 ~ v2,
    x == 3 ~ v3,
    x == 4 ~ v4
  )
  return(out)
}
As you can see, it works perfectly well within mutate():
df.faithful %>%
  mutate(y = pick2(x, y1, y2, y3, y4))
And the output is:
# A tibble: 272 x 8
eruptions waiting x y1 y2 y3 y4 y
<dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3.6 79 3 8.73 7.23 8.89 4.04 8.89
2 1.8 54 3 9.97 4.31 7.06 5.05 7.06
3 3.33 74 1 6.65 7.23 4.46 6.49 6.65
4 2.28 62 1 6.40 4.39 5.41 3.49 6.40
5 4.53 85 4 3.96 8.85 7.43 6.51 6.51
6 2.88 55 4 6.36 8.08 5.82 5.06 5.06
7 4.7 88 1 5.91 6.47 6.43 5.88 5.91
8 3.6 85 1 7.77 4.55 6.56 5.05 7.77
9 1.95 51 4 5.74 6.46 6.95 4.26 4.26
10 4.35 85 1 7.04 1.73 5.71 2.53 7.04
# ... with 262 more rows
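If a helper function is not needed, the same case_when() call can sit directly inside mutate(). The sketch below uses small, deterministic toy data (the names toy and picked are hypothetical) and adds a catch-all branch, which is an assumption beyond the answer above, so that an x outside 1 to 4 yields an explicit NA.

```r
library(dplyr)

# Hypothetical toy data, small and deterministic so the result is easy to check.
toy <- tibble(
  x  = c(1L, 2L, 3L, 4L, 5L),
  y1 = c(10, 11, 12, 13, 14),
  y2 = c(20, 21, 22, 23, 24),
  y3 = c(30, 31, 32, 33, 34),
  y4 = c(40, 41, 42, 43, 44)
)

picked <- toy %>%
  mutate(y = case_when(
    x == 1 ~ y1,
    x == 2 ~ y2,
    x == 3 ~ y3,
    x == 4 ~ y4,
    TRUE   ~ NA_real_  # catch-all for any x outside 1..4
  ))

picked
```

Conditions are checked in order and the first match wins, so the catch-all has to come last.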