Is `if` faster than `ifelse`?
This is more of an extended comment building on Roman's answer, but I need the code utilities to expound:
Roman is correct that `if` is faster than `ifelse`, but I am under the impression that the speed boost of `if` isn't particularly interesting since it isn't something that can easily be harnessed through vectorization. That is to say, `if` is only advantageous over `ifelse` when the `cond`/`test` argument is of length 1.
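To illustrate the length-1 point, here is a small sketch (the vector `cond` is made up for illustration). As far as I know, R 4.2.0 and later raise an error when `if` gets a condition of length greater than one, while earlier versions used only the first element and issued a warning; the `tryCatch` below is written to cope with either behaviour.

```r
cond <- c(TRUE, FALSE, TRUE)  # hypothetical test vector

# `if` wants a single TRUE/FALSE: depending on the R version, a longer
# condition is either an error or a warning, and both are caught here.
scalar_result <- tryCatch(
  if (cond) "yes" else "no",
  error   = function(e) "error",
  warning = function(w) "warning"
)

# ifelse() handles the whole vector element by element.
vector_result <- ifelse(cond, "yes", "no")

print(scalar_result)
print(vector_result)
```

So `if` never gets to do any work across the vector; only `ifelse` returns one value per element.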
Consider the following function, which is an admittedly weak attempt at vectorizing `if` without having the side effect of evaluating both the `yes` and `no` conditions as `ifelse` does.
# Loop over the elements and call `if` on each one.
ifelse2 <- function(test, yes, no){
  result <- rep(NA, length(test))
  for (i in seq_along(test)){
    result[i] <- `if`(test[i], yes[i], no[i])
  }
  result
}
# Same idea, with sapply() in place of the explicit loop.
ifelse2a <- function(test, yes, no){
  sapply(seq_along(test),
         function(i) `if`(test[i], yes[i], no[i]))
}
# Vectorized: fill the result by logical indexing.
ifelse3 <- function(test, yes, no){
  result <- rep(NA, length(test))
  logic <- test
  result[logic] <- yes[logic]
  result[!logic] <- no[!logic]
  result
}
set.seed(pi)
x <- rnorm(1000)
library(microbenchmark)
microbenchmark(
  standard = ifelse(x < 0, x^2, x),
  modified = ifelse2(x < 0, x^2, x),
  modified_apply = ifelse2a(x < 0, x^2, x),
  third = ifelse3(x < 0, x^2, x),
  fourth = c(x, x^2)[1L + ( x < 0 )],
  fourth_modified = c(x, x^2)[seq_along(x) + length(x) * (x < 0)]
)
Unit: microseconds
expr                min      lq      mean  median       uq      max neval cld
standard         52.198  56.011  97.54633  58.357  68.7675 1707.291   100 ab
modified         91.787  93.254 131.34023  94.133  98.3850 3601.967   100 b
modified_apply  645.146 653.797 718.20309 661.568 676.0840 3703.138   100 c
third            20.528  22.873  76.29753  25.513  27.4190 3294.350   100 ab
fourth           15.249  16.129  19.10237  16.715  20.9675   43.695   100 a
fourth_modified  19.061  19.941  22.66834  20.528  22.4335   40.468   100 a
SOME EDITS: Thanks to Frank and Richard Scriven for noticing my shortcomings.
As you can see, breaking up the vector so that each element can be passed to `if` is time-consuming and ends up being slower than just running `ifelse` (which is probably why no one has bothered to implement my solution).
If you're really desperate for an increase in speed, you can use the `ifelse3` approach above. Or better yet, Frank's less obvious* but brilliant solution.
* By 'less obvious' I mean it took me two seconds to realize what he did. And per nicola's comment below, please note that this works only when `yes` and `no` have length 1; otherwise you'll want to stick with `ifelse3`.
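The length-1 caveat can be checked directly. The sketch below is only a sanity check, not part of the benchmark; the variable names are made up here, and it rebuilds the same `x` as above so it runs on its own.

```r
set.seed(pi)
x <- rnorm(1000)
test <- x < 0

reference <- ifelse(test, x^2, x)

# Scalar `yes`/`no`: indexing a two-element vector works.
scalar_ok <- identical(c("b", "a")[1L + test], ifelse(test, "a", "b"))

# Vector `yes`/`no`: the same trick only ever picks x[1] or x[2] ...
short_index <- c(x, x^2)[1L + test]
# ... while offsetting by position picks x[i] or x[i]^2.
offset_index <- c(x, x^2)[seq_along(x) + length(x) * test]

print(scalar_ok)
print(isTRUE(all.equal(short_index, reference)))
print(isTRUE(all.equal(offset_index, reference)))
```

The first and third comparisons should agree with `ifelse`, while the second should not, which is why the two-element index is fast but not a general replacement.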
`if` is a primitive (compiled) function called through the `.Primitive` interface, while `ifelse` is R bytecode, so it seems that `if` will be faster. Running some quick benchmarks:
> microbenchmark(`if`(TRUE, "a", "b"), ifelse(TRUE, "a", "b"))
Unit: nanoseconds
expr                    min   lq    mean median     uq   max neval cld
if (TRUE) "a" else "b"   46   54  372.59   60.0   68.0 30007   100 a
ifelse(TRUE, "a", "b") 1212 1327 1581.62 1442.5 1617.5 11743   100 b
> microbenchmark(`if`(FALSE, "a", "b"), ifelse(FALSE, "a", "b"))
Unit: nanoseconds
expr                     min   lq    mean median   uq   max neval cld
if (FALSE) "a" else "b"   47   55   91.64   61.5   73  2550   100 a
ifelse(FALSE, "a", "b") 1256 1346 1688.78 1460.0 1677 17260   100 b
It seems that, not taking into account the code in the actual branches, `if` is at least 20x faster than `ifelse`. However, note that this doesn't account for the complexity of the expression being tested and possible optimizations on that.
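On the point about the code in the branches: `if` evaluates only the branch it selects, whereas `ifelse` called with a test that contains both `TRUE` and `FALSE` evaluates both `yes` and `no`. A minimal sketch, with a made-up counter and made-up helper names:

```r
calls <- c(yes = 0, no = 0)

# Hypothetical helpers that record when they are evaluated.
yes_branch <- function() { calls["yes"] <<- calls["yes"] + 1; "a" }
no_branch  <- function() { calls["no"]  <<- calls["no"]  + 1; "b" }

# `if` evaluates only the selected branch.
if (TRUE) yes_branch() else no_branch()
after_if <- calls

# ifelse() with a mixed test evaluates both arguments.
calls[] <- 0
ifelse(c(TRUE, FALSE), yes_branch(), no_branch())
after_ifelse <- calls

print(after_if)
print(after_ifelse)
```

With expensive branches, this difference can matter more than the call overhead measured above.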
Update: Please note that this quick benchmark represents a very simplified and somewhat biased use case of `if` vs `ifelse` (as pointed out in the comments). While it is correct, it underrepresents the `ifelse` use cases; for those, Benjamin's answer seems to provide a fairer comparison.