How to reduce a fully-connected (`"InnerProduct"`) layer using truncated SVD
Some linear-algebra background
Singular Value Decomposition (SVD) is a decomposition of any matrix W into three matrices:
W = U S V*
where U and V are orthonormal matrices, V* is the conjugate transpose of V (the plain transpose for real matrices), and S is diagonal with non-negative elements in decreasing order on the diagonal.
One of the interesting properties of SVD is that it makes it easy to approximate W with a lower-rank matrix. Suppose you truncate S to keep only its k leading elements (instead of all the elements on the diagonal); then
W_app = U S_trunc V*
is a rank-k approximation of W.
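The truncation can be tried on a small matrix. This is a minimal NumPy sketch, not part of the original recipe: the 6 x 8 random matrix and k = 3 are arbitrary stand-ins. It relies on the fact that, in the Frobenius norm, the error of the rank-k truncation equals the root of the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 8))   # arbitrary stand-in for a weight matrix
k = 3                             # target rank (arbitrary)

# Thin SVD: U is (6, 6), s holds the 6 singular values, Vh is V* with shape (6, 8).
U, s, Vh = np.linalg.svd(W, full_matrices=False)

# Keep the k leading singular values with the matching columns of U and rows of V*.
W_app = np.dot(U[:, :k] * s[:k], Vh[:k, :])

rank = np.linalg.matrix_rank(W_app)
err = np.linalg.norm(W - W_app)            # Frobenius norm of the residual
discarded = np.sqrt(np.sum(s[k:] ** 2))    # what the dropped singular values carry
print(rank, err, discarded)
```

Note that np.linalg.svd returns V* directly (named Vh here), so no extra transpose is needed.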
Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer:
# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...
Furthermore, we have trained_weights_full.caffemodel, the trained parameters for the deploy_full.prototxt model.
Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:
layer {
  name: "fc_svd_U"
  type: "InnerProduct"
  bottom: "in"  # same input
  top: "svd_interim"
  inner_product_param {
    num_output: 20  # approximate with k = 20 rank matrix
    bias_term: false
    # more params...
  }
  # some more...
}
# NO activation layer here!
layer {
  name: "fc_svd_V"
  type: "InnerProduct"
  bottom: "svd_interim"
  top: "out"  # same output
  inner_product_param {
    num_output: 1000  # original number of outputs
    # more params...
  }
  # some more...
}
In Python, a little net surgery:
import caffe
import numpy as np

orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# get the original weight matrix, shape (num_output, num_input)
W = np.array(orig_net.params['fc_orig'][0].data)
# SVD decomposition; note that np.linalg.svd returns V already transposed
k = 20  # same as num_output of fc_svd_U
U, s, V = np.linalg.svd(W, full_matrices=False)
S = np.diag(s[:k])  # taking only leading k singular values, on the diagonal
# assign weights to svd net
# the first layer maps the input to k values, so it takes the k leading rows of V: shape (k, num_input)
svd_net.params['fc_svd_U'][0].data[...] = V[:k, :]
# the second layer maps the k values to the outputs, so it takes U S: shape (num_output, k)
svd_net.params['fc_svd_V'][0].data[...] = np.dot(U[:, :k], S)
svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias
# save the new weights
svd_net.save('trained_weights_svd.caffemodel')
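The weight shapes are easy to get wrong, so it can help to check the factorization in plain NumPy before touching Caffe. This is a minimal sketch, assuming an InnerProduct layer computes y = W x + b with W of shape (num_output, num_input). The helper factor_layer and the layer sizes are made up for illustration.

```python
import numpy as np

def factor_layer(W, k):
    """Split W (num_output x num_input) into the weights of two rank-k layers."""
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    W_first = Vh[:k, :]            # (k, num_input): applied to the input, no bias
    W_second = U[:, :k] * s[:k]    # (num_output, k): applied to the k interim values
    return W_first, W_second

rng = np.random.default_rng(0)
num_input, num_output = 64, 10     # hypothetical layer sizes
W = rng.standard_normal((num_output, num_input))
b = rng.standard_normal(num_output)
x = rng.standard_normal(num_input)
y_orig = np.dot(W, x) + b

# With k equal to the full rank, the two layers reproduce the original up to rounding.
W_first, W_second = factor_layer(W, k=10)
y_exact = np.dot(W_second, np.dot(W_first, x)) + b

# With a smaller k, the output is only an approximation.
W_first4, W_second4 = factor_layer(W, k=4)
y_approx = np.dot(W_second4, np.dot(W_first4, x)) + b
rel_err = np.linalg.norm(y_orig - y_approx) / np.linalg.norm(y_orig)
print(W_first4.shape, W_second4.shape, rel_err)
```

The bias is added only once, after the second matrix, which is why the first of the two layers has bias_term: false.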
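Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights: a layer with n inputs and m outputs needs m*n weights, while the two replacement layers together need only k*(n + m).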
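To put rough numbers on the saving, here is a small sketch that counts weights (equivalently, multiplications per input vector) before and after, ignoring biases. The 1000 outputs and k = 20 come from the example above; num_input = 4096 is an assumed value, and the function names are hypothetical.

```python
def fc_weight_count(num_input, num_output):
    """Weights in a single fully connected layer (bias ignored)."""
    return num_input * num_output

def svd_weight_count(num_input, num_output, k):
    """Weights in the two layers that replace it (bias ignored)."""
    return k * num_input + num_output * k

num_input, num_output, k = 4096, 1000, 20   # num_input is an assumption
full = fc_weight_count(num_input, num_output)
reduced = svd_weight_count(num_input, num_output, k)
print(full, reduced, full / float(reduced))
```

The factorization only pays off while k is below n*m / (n + m), which for these sizes is a little over 800.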
Actually, Ross Girshick's py-faster-rcnn repo includes an implementation of the SVD step: compress_net.py.
BTW, you usually need to fine-tune the compressed model to recover the accuracy (or compress in a more sophisticated way; see for example "Accelerating Very Deep Convolutional Networks for Classification and Detection", Zhang et al.).
Also, for me scipy.linalg.svd worked faster than numpy's svd.