data.table alternative for dplyr case_when

1) If the conditions are mutually exclusive, with a default for rows where every condition is false, then this works:

library(data.table)
DT <- as.data.table(df) # df is from question

DT[, new := c("e", "c", "d")[1 +
                             1 * (a == "a" & b == "b") + 
                             2 * (a == "b" & b == "a")]
]

giving:

> DT
   a b new
1: a b   c
2: b a   d
3: a a   e
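
To see why the indexing trick works, it may help to look at the intermediate index on its own. The following is a minimal sketch in base R; the vectors a and b are an assumed stand-in for the question's df, reconstructed from the output printed above, and idx and labels are names introduced only for illustration.

```r
# Assumed stand-in for the question's data (reconstructed from the printed output)
a <- c("a", "b", "a")
b <- c("b", "a", "a")

# Each condition adds its own offset to the base index 1. Because the
# conditions are mutually exclusive, at most one offset is added per row.
idx <- 1 + 1 * (a == "a" & b == "b") + 2 * (a == "b" & b == "a")
idx
# [1] 2 3 1

# Position 1 holds the default, positions 2 and 3 the condition results.
labels <- c("e", "c", "d")
labels[idx]
# [1] "c" "d" "e"
```

If two conditions could be TRUE for the same row, the offsets would add up and point at the wrong (or a non-existent) position, which is why mutual exclusivity matters here.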

2) If the results of the conditions are numeric then it is even easier. For example, suppose that instead of c and d we want 10 and 17, with a default of 3. Then:

library(data.table)
DT <- as.data.table(df) # df is from question

DT[, new := 3 + 
            (10 - 3) * (a == "a" & b == "b") + 
            (17 - 3) * (a == "b" & b == "a")]
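
As a quick check of the arithmetic, here is the same expression run on a small table. This is only a sketch: the data.table below is an assumed reconstruction of the question's df, based on the output shown for the first approach.

```r
library(data.table)

# Assumed stand-in for the question's df
DT <- data.table(a = c("a", "b", "a"), b = c("b", "a", "a"))

# Start from the default (3) and add the difference to the target value
# for whichever condition holds; a FALSE condition contributes 0.
DT[, new := 3 +
            (10 - 3) * (a == "a" & b == "b") +
            (17 - 3) * (a == "b" & b == "a")]
DT$new
# [1] 10 17  3
```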

3) Alternatively, a one-line helper function is sufficient to implement this. It assumes that at least one leg is TRUE for each row; if several legs are TRUE, the first one wins.

when <- function(...) names(match.call()[-1])[apply(cbind(...), 1, which.max)]

# test
DT[, new := when(c = a == 'a' & b == 'b', 
                 d = a == 'b' & b == 'a', 
                 e = TRUE)]
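
The assumption about having at least one TRUE leg is worth illustrating, because violating it fails silently. The sketch below uses plain vectors (an assumed stand-in for the question's columns) and the result names no_default and with_default are introduced only for this example.

```r
when <- function(...) names(match.call()[-1])[apply(cbind(...), 1, which.max)]

# Assumed stand-in for the question's columns
a <- c("a", "b", "a")
b <- c("b", "a", "a")

# Without a catch-all leg, row 3 matches nothing. which.max() returns 1 for
# an all-FALSE row, so the first label ("c") comes back instead of NA.
no_default <- when(c = a == "a" & b == "b",
                   d = a == "b" & b == "a")
no_default
# [1] "c" "d" "c"

# A trailing TRUE leg acts as the default and avoids the problem.
with_default <- when(c = a == "a" & b == "b",
                     d = a == "b" & b == "a",
                     e = TRUE)
with_default
# [1] "c" "d" "e"
```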

This is not really an answer, but a bit too long for a comment. If deemed inappropriate I'm happy to remove the post.

There exists an interesting post on RStudio Community that discusses options to use dplyr::case_when without the usual tidyverse dependencies.

To summarise, three alternatives seem to exist:

Stefan Fleck isolated case_when from dplyr and built a new package, lest, that depends only on base.
yonicd developed noplyr, which "provides basic dplyr and tidyr functionality without the tidyverse dependencies".
Bob Rudis (hrbrmstr) is the creator of freebase, "A 'usethis'-like Package for Base R Pseudo-equivalents of 'tidyverse' Code", which might also be worth checking out.

If it is only case_when that you're after, I imagine lest might be an attractive & minimal option in combination with data.table.

Update [29 October 2019]

Tyson Barrett recently made the package tidyfast available (currently as version 0.1.0) on GitHub, which provides the function "dt_case_when for dplyr::case_when() syntax with the speed of data.table::fifelse()".

Update [25 February 2020]

There is also dtplyr, authored by Lionel Henry and maintained by Hadley Wickham, which "provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code."

FYI, a more recent answer for those coming across this post after 2019: data.table versions 1.13.0 and later have the fcase function. Note that it is not a drop-in replacement for dplyr::case_when, as the syntax is different, but it is a "native" data.table way of doing the calculation.

# Lazy evaluation
x = 1:10
data.table::fcase(
    x < 5L, 1L,
    x >= 5L, 3L,
    x == 5L, stop("provided value is an unexpected one!")
)
# [1] 1 1 1 1 3 3 3 3 3 3

dplyr::case_when(
    x < 5L ~ 1L,
    x >= 5L ~ 3L,
    x == 5L ~ stop("provided value is an unexpected one!")
)
# Error in eval_tidy(pair$rhs, env = default_env) :
#  provided value is an unexpected one!

# Benchmark
x = sample(1:100, 3e7, replace = TRUE) # 114 MB
microbenchmark::microbenchmark(
dplyr::case_when(
  x < 10L ~ 0L,
  x < 20L ~ 10L,
  x < 30L ~ 20L,
  x < 40L ~ 30L,
  x < 50L ~ 40L,
  x < 60L ~ 50L,
  x > 60L ~ 60L
),
data.table::fcase(
  x < 10L, 0L,
  x < 20L, 10L,
  x < 30L, 20L,
  x < 40L, 30L,
  x < 50L, 40L,
  x < 60L, 50L,
  x > 60L, 60L
),
times = 5L,
unit = "s")
# Unit: seconds
#               expr   min    lq  mean   median    uq    max neval
# dplyr::case_when   11.57 11.71 12.22    11.82 12.00  14.02     5
# data.table::fcase   1.49  1.55  1.67     1.71  1.73   1.86     5

Source: data.table NEWS for 1.13.0 (released 24 Jul 2020).
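
For completeness, this is roughly how fcase might be applied to the example from the question. It is a sketch under assumptions: the data.table below is reconstructed from the output shown in the first answer, and the fall-through value is supplied through fcase's default argument.

```r
library(data.table)  # fcase() requires data.table 1.13.0 or later

# Assumed stand-in for the question's df
DT <- data.table(a = c("a", "b", "a"), b = c("b", "a", "a"))

# Conditions and values alternate as plain arguments (no formula syntax);
# rows that match no condition receive the default.
DT[, new := fcase(a == "a" & b == "b", "c",
                  a == "b" & b == "a", "d",
                  default = "e")]
DT$new
# [1] "c" "d" "e"
```

Unlike the indexing trick, this does not require the conditions to be mutually exclusive: the first matching condition determines the value.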
