What is the fastest way to calculate the first two principal components in R?
I tried the pcaMethods package's implementation of the NIPALS algorithm. By default it calculates the first 2 principal components. It turns out to be slower than the other suggested methods.
set.seed(42); N <- 10; M <- matrix(rnorm(N*N), N, N)
library(pcaMethods)
library(rbenchmark)
m1 <- pca(M, method="nipals", nPcs=2)
benchmark(pca(M, method="nipals"),
          eigen(M), svd(M,2,0), prcomp(M), princomp(M), order="relative")
                       test replications elapsed relative user.self sys.self
3              svd(M, 2, 0)          100    0.02      1.0      0.02        0
2                  eigen(M)          100    0.03      1.5      0.03        0
4                 prcomp(M)          100    0.03      1.5      0.03        0
5               princomp(M)          100    0.05      2.5      0.05        0
1 pca(M, method = "nipals")          100    0.23     11.5      0.24        0
You sometimes get access to so-called 'economical' decompositions, which allow you to cap the number of eigenvalues / eigenvectors. It looks like eigen() and prcomp() do not offer this, but svd() allows you to specify the maximum number of singular vectors to compute (its nu and nv arguments).
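To make the meaning of those two arguments concrete, here is a small sketch using only base R. The object names (s_full, s_small) are just for illustration. It suggests that svd(M, 2, 0) limits the singular vectors that are returned, while the vector of singular values still comes back in full:

```r
# Minimal sketch: what the nu / nv arguments of base svd() control.
set.seed(42)
M <- matrix(rnorm(10 * 10), 10, 10)

s_full  <- svd(M)                  # all left and right singular vectors
s_small <- svd(M, nu = 2, nv = 0)  # two left singular vectors, no right ones

length(s_small$d)  # the full set of singular values is still returned
dim(s_small$u)     # only two columns of left singular vectors
```

So any time saved by base svd() comes from not forming the extra vectors, not from skipping singular values.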
On small matrices, the gains seem modest:
R> set.seed(42); N <- 10; M <- matrix(rnorm(N*N), N, N)
R> library(rbenchmark)
R> benchmark(eigen(M), svd(M,2,0), prcomp(M), princomp(M), order="relative")
          test replications elapsed relative user.self sys.self user.child
2 svd(M, 2, 0)          100   0.021  1.00000      0.02        0          0
3    prcomp(M)          100   0.043  2.04762      0.04        0          0
1     eigen(M)          100   0.050  2.38095      0.05        0          0
4  princomp(M)          100   0.065  3.09524      0.06        0          0
R>
but the factor of three relative to princomp() may make it worth your while to reconstruct princomp() from svd(), as svd() allows you to stop after two singular vectors.
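A rough sketch of that reconstruction, using base R only. The data matrix X and the names loadings, scores and sdev are made up for illustration, and the scaling follows the prcomp() convention (divisor n - 1) rather than princomp() (divisor n):

```r
# Sketch: first two principal components from a truncated call to svd().
set.seed(42)
X <- matrix(rnorm(200 * 10), 200, 10)  # hypothetical data: 200 observations, 10 variables

# Centre the columns, as prcomp() does by default.
Xc <- scale(X, center = TRUE, scale = FALSE)

# Ask only for the first two right singular vectors (the loadings).
s <- svd(Xc, nu = 0, nv = 2)

loadings <- s$v                           # 10 x 2
scores   <- Xc %*% loadings               # 200 x 2, the first two components
sdev     <- s$d[1:2] / sqrt(nrow(X) - 1)  # standard deviations of those components

# Cross-check against prcomp(); the sign of each component is arbitrary,
# so compare absolute values.
p <- prcomp(X)
all.equal(abs(unname(p$rotation[, 1:2])), abs(loadings))
all.equal(p$sdev[1:2], sdev)
```

To match princomp() instead, divide by sqrt(nrow(X)) when computing sdev.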
The 'svd' package provides routines for truncated SVD / eigendecomposition via the Lanczos algorithm. You can use it to calculate just the first two principal components.
Here I have:
> library(svd)
> set.seed(42); N <- 1000; M <- matrix(rnorm(N*N), N, N)
> system.time(svd(M, 2, 0))
   user  system elapsed
  7.355   0.069   7.501
> system.time(princomp(M))
   user  system elapsed
  5.985   0.055   6.085
> system.time(prcomp(M))
   user  system elapsed
  9.267   0.060   9.368
> system.time(trlan.svd(M, neig = 2))
   user  system elapsed
  0.606   0.004   0.614
> system.time(trlan.svd(M, neig = 20))
   user  system elapsed
  1.894   0.009   1.910
> system.time(propack.svd(M, neig = 20))
   user  system elapsed
  1.072   0.011   1.087
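The timings above only run the decomposition. To go from there to actual principal component scores, something like the following sketch might work. It assumes the svd package is installed and that propack.svd() returns the right singular vectors in a component named v (worth checking against ?propack.svd). The data matrix X and the names tr and scores_tr are hypothetical:

```r
# Sketch: first two principal components via the truncated SVD in the 'svd' package.
# Assumption: propack.svd() returns a list with d and v, as its help page describes.
if (requireNamespace("svd", quietly = TRUE)) {
  set.seed(42)
  X  <- matrix(rnorm(500 * 50), 500, 50)  # hypothetical data
  Xc <- sweep(X, 2, colMeans(X))          # centre the columns

  tr <- svd::propack.svd(Xc, neig = 2)    # two leading singular triplets only

  scores_tr <- Xc %*% tr$v                    # 500 x 2 matrix of scores
  sdev_tr   <- tr$d / sqrt(nrow(X) - 1)       # prcomp() scaling convention
}
```

The centring step matters: without it the leading singular vectors describe the raw matrix rather than its principal components.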