Get first value that matches condition (loop too slow)
Here is an attempt using match(), which reduces the time for the r = 30000 example in the original post by about 25%.
sapply(m1[, 1] * 1.1, function(x) match(TRUE, m1[, 2] > x))
[1] 3 1 NA 3 1 6 3 2 1 2
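To make the match() trick concrete: match(TRUE, v) returns the index of the first TRUE in a logical vector, or NA when there is none. That is exactly "first row of column 2 that exceeds the threshold, else NA". A minimal sketch on a made-up 3-row matrix (m_demo is hypothetical, not the question's data):

```r
# Hypothetical 3-row matrix: the question's m1 is not reproduced here
m_demo <- cbind(c(1.0, 2.0, 5.0),   # column 1: values that get scaled by 1.1
                c(0.5, 1.5, 2.5))   # column 2: values to search

# match(TRUE, v) gives the position of the first TRUE in v, or NA if v has none
res <- sapply(m_demo[, 1] * 1.1, function(x) match(TRUE, m_demo[, 2] > x))
res  # 2 3 NA
```

The thresholds are 1.1, 2.2 and 5.5, so the first value above them is in row 2, row 3, and nowhere, respectively.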
There are some shortcuts you can take here. You are looking for the first value in column 2 that is higher than some other value. This means it is never worth looking at a value that is not higher than the largest value already seen in column 2: if that earlier maximum did not exceed the threshold, a smaller value cannot either.
In your example with 10 rows, that would be as follows:
> cummax(m1[, 2])
[1] 1.393902 1.474218 1.891222 1.891222 1.891222 1.911469 1.911469 1.911469 1.911469 1.911469
> which(cummax(m1[, 2]) == m1[, 2])
[1] 1 2 3 6
As you can see, these row numbers (1, 2, 3 and 6) are the only values, apart from NA, that appear in your result vector.
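Taken further, each threshold only needs to be compared against these "record" rows, and because their values never decrease, that comparison can be a binary search instead of a scan. The sketch below is one way to do it; it swaps in base R's findInterval(), which is not what this answer's own code uses, and the name first_above_records and its factor argument are hypothetical. Like the match() version, it looks for a strictly greater value:

```r
# Hypothetical helper: first row of column 2 strictly greater than column 1 * factor
first_above_records <- function(m1, factor = 1.1) {
  col2 <- m1[, 2]
  thresholds <- m1[, 1] * factor

  # Rows where column 2 reaches its running maximum; their values are non-decreasing
  rec_rows <- which(col2 == cummax(col2))
  rec_vals <- col2[rec_rows]

  # findInterval(x, v) returns the last position i with v[i] <= x (0 if none),
  # so position i + 1 is the first record strictly greater than x
  k <- findInterval(thresholds, rec_vals) + 1L

  # k past the last record means no value in column 2 exceeds the threshold
  out <- rep(NA_integer_, length(thresholds))
  ok <- k <= length(rec_rows)
  out[ok] <- rec_rows[k[ok]]
  out
}
```

This returns the same indices as match(TRUE, m1[, 2] > x) for every row, at the cost of one cummax() pass plus one binary search per row instead of a full scan per row.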
A second optimisation is to sort by the first column. If you handle the lowest threshold first and work your way up, you don't have to scan the second column from the top each time: you only advance to the next row of column 2 once the current row no longer matches the next threshold from column 1.
This does bear the cost of sorting the first column, but afterwards the result can be found with a single pass through both columns.
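dostuff <- function(m1){
  # Visit column 1 from the smallest threshold to the largest
  orderColumn1 <- order(m1[, 1])
  plus.10 <- m1[, 1] * 1.1
  results <- rep(NA, length(plus.10))
  IndexColumn1 <- 1
  IndexColumn2 <- 1
  # Start at -Inf (not 0) so that negative values in column 2 are still considered
  row2CurrentMax <- -Inf
  # Stop early once every row of column 1 has been assigned a match
  while(IndexColumn2 <= nrow(m1) && IndexColumn1 <= nrow(m1)){
    row2Current <- m1[IndexColumn2, 2]
    # Only a new running maximum in column 2 can be the first match for any threshold
    if(row2Current > row2CurrentMax){
      row2CurrentMax <- row2Current
      # Assign this row to every remaining threshold it reaches; the bounds check
      # stops the loop after the last row instead of evaluating if() on NA
      while(IndexColumn1 <= nrow(m1) &&
            plus.10[orderColumn1[IndexColumn1]] <= row2CurrentMax){
        results[orderColumn1[IndexColumn1]] <- IndexColumn2
        IndexColumn1 <- IndexColumn1 + 1
      }
    }
    IndexColumn2 <- IndexColumn2 + 1
  }
  results
}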
With 30000 rows:
> start_time <- Sys.time()
> result <- dostuff(m1)
> end_time <- Sys.time()
> a <- end_time - start_time
> a
Time difference of 0.0600059 secs
I don't imagine this is the fastest way, but it should be somewhat faster than the for loop in the question.
plus.10 <- m1[, 1] * 1.1
m2 <- m1[, 2]
result <- sapply(plus.10, function(x) which.min(m2 < x))
result[plus.10 > max(m2)] <- NA
result
[1] 3 1 NA 3 1 6 3 2 1 2
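which.min() works on a logical vector because FALSE counts as 0 and TRUE as 1, so it returns the first FALSE, i.e. the first m2 >= x. If every element is TRUE it still returns 1, which is why the NA fix-up line is needed. Note that this is >= whereas the match() answer uses a strict >, so the two can disagree when a value in column 2 equals a threshold exactly. A small sketch on made-up values (v and x are hypothetical):

```r
v <- c(0.5, 1.5, 2.5)
x <- 1.5

# First FALSE in v < x, i.e. first element with v >= x (here the tie at 1.5)
a <- which.min(v < x)    # 2
# All TRUE: which.min() falls back to the first position rather than NA
b <- which.min(v < 10)   # 1

# match() with a strict comparison skips the tie and returns NA when nothing qualifies
cc <- match(TRUE, v > x)  # 3
d <- match(TRUE, v > 10)  # NA
```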
Edit: As requested by Ronak, microbenchmark results of the solutions proposed so far on 10000 rows:
Unit: milliseconds
  expr        min        lq       mean      median          uq         max neval cld
    h1 335.342689 337.35915 361.320461  341.804840  347.856556  516.230972    25   b
sindri 672.587291 688.78673 758.445467  713.240778  811.298608 1049.109844    25   d
    op 865.567412 884.99514 993.066179 1006.694036 1026.434344 1424.755409    25   e
  loco 675.809092 682.98591 731.256313  693.672064  807.007358  821.893865    25   d
dmitry 420.869493 427.56492 454.439806  433.656519  438.367480  607.030825    25   c
   jad   4.369628   4.41044   4.735393    4.503657    4.556527    7.488471    25   a
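For anyone who wants to rerun a rough comparison without the microbenchmark package, here is a base R sketch using system.time(). The data is a hypothetical uniform random stand-in (the question's m1 isn't reproduced here), n is kept small so it runs quickly, the helper names by_match and by_which_min are made up, and only the two sapply() approaches from this page are included. Timings will depend on the machine.

```r
# Hypothetical stand-in data: uniform random values, since the question's m1 is not shown
set.seed(42)
n <- 2000
m_bench <- matrix(runif(2 * n, min = 1, max = 2), ncol = 2)

# The match() approach shown earlier
by_match <- function(m) {
  sapply(m[, 1] * 1.1, function(x) match(TRUE, m[, 2] > x))
}

# The which.min() approach, including its fix-up for thresholds nothing reaches
by_which_min <- function(m) {
  plus.10 <- m[, 1] * 1.1
  res <- sapply(plus.10, function(x) which.min(m[, 2] < x))
  res[plus.10 > max(m[, 2])] <- NA
  res
}

# Elapsed wall-clock seconds for each approach
t_match <- system.time(r_match <- by_match(m_bench))[["elapsed"]]
t_which <- system.time(r_which <- by_which_min(m_bench))[["elapsed"]]
print(c(match = t_match, which.min = t_which))

# The two only differ on exact ties (> versus >=), which this random data should not produce
stopifnot(identical(r_match, r_which))
```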