Applying a function to every row of a table using dplyr?
The idiomatic approach will be to create an appropriately vectorised function.
R
provide pmax
which is suitable here, however it also provides Vectorize
as a wrapper for mapply
to allow you to create a vectorised arbitrary version of an arbitrary function.
library(dplyr)
# use base R pmax (vectorized in C)
iris %>% mutate(max.len = pmax(Sepal.Length, Petal.Length))
# use vectorize to create your own function
# for example, a horribly inefficient get first non-Na value function
# a version that is not vectorized
coalesce <- function(a,b) {r <- c(a[1],b[1]); r[!is.na(r)][1]}
# a vectorized version
Coalesce <- Vectorize(coalesce, vectorize.args = c('a','b'))
# some example data
df <- data.frame(a = c(1:5,NA,7:10), b = c(1:3,NA,NA,6,NA,10:8))
df %>% mutate(ab =Coalesce(a,b))
Note that implementing the vectorization in C / C++ will be faster, but there isn't a magicPony
package that will write the function for you.
As of dplyr 0.2 (I think) rowwise()
is implemented, so the answer to this problem becomes:
iris %>%
rowwise() %>%
mutate(Max.Len= max(Sepal.Length,Petal.Length))
Non rowwise
alternative
Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise
is increasingly not recommended, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's Row-oriented workflows in R with the tidyverse material to get a good handle on this topic.
The most straightforward way I have found is based on one of Hadley's examples using pmap
:
iris %>%
mutate(Max.Len= purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))
Using this approach, you can give an arbitrary number of arguments to the function (.f
) inside pmap
.
pmap
is a good conceptual approach because it reflects the fact that when you're doing row wise operations you're actually working with tuples from a list of vectors (the columns in a dataframe).