randomized SVD singular values
Calculation
I reckon that your algorithm is a modification of the algorithm of Martinsson et al.. If I understood it correctly, this is especially meant for approximations for low rank matrices. I might be wrong though.
The difference is easily explained by the huge difference between the actual rank of A (500) and the values of k (10) and p (5). Plus, Martinsson et al mention that the value for p should actually be larger than the chosen value for k.
So if we apply your solution taking their considerations into account, using :
set.seed(10)
A <- matrix(rnorm(500*500),500,500) # rank 500
B <- matrix(rnorm(500*50),500,500) # rank 50
We find for the timings that the use of a larger p value still results in a huge speed-up compared to the original svd algorithm.
> system.time(t1 <- svd(A)$d[1:5])
user system elapsed
0.8 0.0 0.8
> system.time(t2 <- rsvd(A,10,5)$d[1:5])
user system elapsed
0.01 0.00 0.02
> system.time(t3 <- rsvd(A,10,30)$d[1:5])
user system elapsed
0.04 0.00 0.03
> system.time(t4 <- svd(B)$d[1:5] )
user system elapsed
0.55 0.00 0.55
> system.time(t5 <-rsvd(B,10,5)$d[1:5] )
user system elapsed
0.02 0.00 0.02
> system.time(t6 <-rsvd(B,10,30)$d[1:5] )
user system elapsed
0.05 0.00 0.05
> system.time(t7 <-rsvd(B,25,30)$d[1:5] )
user system elapsed
0.06 0.00 0.06
But we see that using a higher p for a lower rank matrix indeed gives a better approximation. If we let k also approach the rank a bit closer, the difference between the real solution and the approximation becomes appx. 0, while the speed gain is still substantial.
> round(mean(t2/t1),2)
[1] 0.77
> round(mean(t3/t1),2)
[1] 0.82
> round(mean(t5/t4),2)
[1] 0.92
> round(mean(t6/t4),2)
[1] 0.97
> round(mean(t7/t4),2)
[1] 1
So in general I believe that one could conclude that :
- p should be chosen so p > k (Martinsson calls it
l
if I'm right) - k shouldn't be too much different from rank(A)
- For low rank matrices the result is generally better.
Optimalization:
As far as I'm concerned, it's a neat way of doing it. I couldn't really find a more optimal way actually. The only thing I could say is that the construct t(q) %*% A
is advised against. One should use crossprod(q,A)
for that, which is supposed to be a tiny bit faster. But in your example the difference was nonexistent.
The paper by Halko, Martinsson and Tropp also recommends to do a couple of power iterations before computing the QR. We do 3 power iterations by default in the implementation in scikit-learn and we found it to work very well in practice.