How to speed up GLM estimation?
Assuming your design matrix is not sparse, you can also consider my package `parglm`. See this vignette for a comparison of computation times and further details; I also show a timing comparison in my answer to a related question.
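As a quick sketch of how `parglm` is used (assuming its API mirrors `glm`, with the thread count set through `parglm.control`; the data here are made up for illustration):

```r
library(parglm)

# simulated logistic-regression data
n <- 1e5
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.25 * dat$x2))

# drop-in replacement for glm(), fit with 4 threads
fit <- parglm(y ~ x1 + x2, family = binomial(), data = dat,
              control = parglm.control(nthreads = 4L))
summary(fit)
```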
One of the methods in the `parglm` function works like the `bam` function in `mgcv`. The method is described in detail in

Wood, S.N., Goude, Y. & Shaw, S. (2015). Generalized additive models for large datasets. Journal of the Royal Statistical Society, Series C, 64(1): 139-155.

One advantage of the method is that it can be implemented with a non-concurrent QR implementation and still perform the computation in parallel. Another advantage is a potentially lower memory footprint. The method is used in `mgcv`'s `bam` function and could also be implemented here with a setup like the one in `speedglm`'s `shglm` function.
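For comparison, here is a sketch of calling `mgcv::bam` directly on the same kind of data (a plain parametric formula works as well as one with smooth terms; `discrete` and `nthreads` are `bam`'s options for discretized covariates and parallel fitting):

```r
library(mgcv)

# simulated logistic-regression data
n <- 1e5
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(dat$x1))

# bam() is designed for large data sets; discrete = TRUE uses the
# discretized-covariate method, nthreads enables parallel computation
fit <- bam(y ~ x1 + s(x2), family = binomial(), data = dat,
           discrete = TRUE, nthreads = 2)
summary(fit)
```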
There are a couple of packages that speed up `glm` fitting. `fastglm` has benchmarks showing it to be even faster than `speedglm`.
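A minimal sketch of `fastglm`, assuming its documented interface, which takes a model matrix and response vector rather than a formula:

```r
library(fastglm)

# simulated data: model matrix with intercept plus two covariates
n <- 1e5
X <- cbind(1, matrix(rnorm(n * 2), n, 2))
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1, -1)))

# method selects the decomposition; 2 = LLT Cholesky, typically the
# fastest option (at some cost in numerical stability)
fit <- fastglm(x = X, y = y, family = binomial(), method = 2)
coef(fit)
```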
You could also install a more performant BLAS library on your computer (as Ben Bolker suggests in the comments), which will speed up any of these methods.
Although a bit late, I can only second dickoa's suggestion: generate a sparse model matrix using the Matrix package and then feed it to the speedglm.wfit function. That works great ;-) This way, I was able to run a logistic regression on a 1e6 x 3500 model matrix in less than 3 minutes.
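A sketch of that sparse-matrix route (assuming `speedglm.wfit`'s `sparse` argument accepts the sparse representation; the factor covariate here is made up to produce a naturally sparse model matrix):

```r
library(Matrix)
library(speedglm)

# simulated data with a high-cardinality factor, which yields a
# mostly-zero model matrix after dummy coding
n <- 1e5
dat <- data.frame(g = factor(sample(letters, n, replace = TRUE)),
                  x = rnorm(n))
dat$y <- rbinom(n, 1, 0.3)

# sparse.model.matrix() builds the design matrix in sparse format
X <- sparse.model.matrix(~ g + x, data = dat)

# speedglm.wfit takes the response and model matrix directly;
# sparse = TRUE keeps the computations in the sparse representation
fit <- speedglm.wfit(y = dat$y, X = X, family = binomial(),
                     sparse = TRUE)
head(coef(fit))
```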