Calculate row-wise maximum
I won't vouch for its speed, but this at least avoids coercing to a matrix:
data[,mymax:=do.call(pmax,.SD)]
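To make the mechanics of that one-liner easier to see, here is a minimal sketch on a toy table. The table `dt` and its column names `a`, `b`, `c` are made up for illustration and are not from the question. The point is only that `.SD` is a list of columns, so `do.call()` hands them to `pmax()` as separate arguments.

```r
library(data.table)

# Hypothetical toy data; the names dt, a, b, c are illustrative only.
dt <- data.table(a = c(1, 5, 3), b = c(4, 2, 6), c = c(0, 9, 3))

# .SD is a list of columns, so do.call() passes each column to pmax()
# as its own argument. This is the same as writing pmax(a, b, c).
dt[, mymax := do.call(pmax, .SD)]

# The hand-written call gives the same vector.
by_hand <- pmax(dt$a, dt$b, dt$c)
print(dt)
```

If only some columns should take part, `.SDcols` can restrict what ends up in `.SD`.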
Use by=1:nrow(data) to "group" by rows (which makes each row its own group):
data[, max_value:=max(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), by=1:nrow(data)]
data
Sepal.Length Sepal.Width Petal.Length Petal.Width max_value
1: 5.1 3.5 1.4 0.2 5.1
2: 4.9 3.0 1.4 0.2 4.9
3: 4.7 3.2 1.3 0.2 4.7
4: 4.6 3.1 1.5 0.2 4.6
5: 5.0 3.6 1.4 0.2 5.0
6: 5.4 3.9 1.7 0.4 5.4
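As a variation on the same idea, the sketch below avoids typing out every column name by taking the maximum over `.SD`. This is an assumption about what might be convenient, not something the answer above proposes, and the toy table `dt` with columns `a`, `b`, `c` is hypothetical.

```r
library(data.table)

# Hypothetical toy data; the names dt, a, b, c are illustrative only.
dt <- data.table(a = c(1, 5, 3), b = c(4, 2, 6), c = c(0, 9, 3))

# by = 1:nrow(dt) puts every row in its own group, so inside j the
# subset .SD has exactly one row. unlist() flattens that row into a
# plain vector that max() can reduce to a single value.
dt[, max_value := max(unlist(.SD)), by = 1:nrow(dt)]

print(dt)
```

Because j is evaluated once per row, this is likely to be slow on large tables.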
Using dplyr, you could do:
library(data.table)  # for setDF()
library(dplyr)
setDF(data) %>%
  rowwise() %>%
  mutate(max = max(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))
#Source: local data frame [10 x 5]
#Groups: <by row>
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width max
#1 5.1 3.5 1.4 0.2 5.1
#2 4.9 3.0 1.4 0.2 4.9
#3 4.7 3.2 1.3 0.2 4.7
#4 4.6 3.1 1.5 0.2 4.6
#5 5.0 3.6 1.4 0.2 5.0
#6 5.4 3.9 1.7 0.4 5.4
Or as @akrun suggested:
setDF(data) %>% mutate(max=pmax(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))
This is much faster than the rowwise() approach, by roughly a factor of 17 in the benchmark below:
library(data.table); library(dplyr); library(microbenchmark)
n <- 10e6; nc <- 4; DT <- data.table(replicate(nc, rnorm(n)))
mbm <- microbenchmark(
  steven = DT %>% rowwise() %>% mutate(V5 = max(V1, V2, V3, V4)),
  frank = DT[, c(.SD, list(do.call(pmax, .SD)))],
  akrun = DT %>% mutate(V5 = pmax(V1, V2, V3, V4)),
  times = 25, unit = "relative")
mbm
#Unit: relative
# expr min lq mean median uq max neval cld
# steven 17.93647 18.024734 17.535764 17.42948 17.484920 16.446384 25 b
# frank 1.00000 1.000000 1.000000 1.00000 1.000000 1.000000 25 a
# akrun 1.00220 1.002281 1.013604 1.00240 1.003089 1.001262 25 a
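The gap in the timings is consistent with how the two functions work: with rowwise(), max() is called once per row, while pmax() compares whole columns element-wise in a single vectorised call. A small sketch to confirm that both give the same values, using a hypothetical data frame `df` with made-up columns `a`, `b`, `c`:

```r
library(dplyr)

# Hypothetical toy data; the names df, a, b, c are illustrative only.
df <- data.frame(a = c(1, 5, 3), b = c(4, 2, 6), c = c(0, 9, 3))

# Row by row: max() is evaluated separately for each row.
by_row <- df %>%
  rowwise() %>%
  mutate(max = max(a, b, c)) %>%
  ungroup()

# Vectorised: pmax() compares the three columns element-wise at once.
vectorised <- df %>% mutate(max = pmax(a, b, c))

# Both approaches produce the same column of row maxima.
same_result <- identical(by_row$max, vectorised$max)
print(same_result)
```

Since the results match, the pmax() form can be substituted wherever the row-wise maximum is taken over a fixed set of columns.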