How to store scaling parameters for later use

I think that the best way is to pickle it post fit, as this is the most generic option. Perhaps you'll later create a pipeline composed of both a feature extractor and scaler. By pickling a (possibly compound) stage, you're making things more generic. The sklearn documentation on model persistence discusses how to do this.

Having said that, you can query sklearn.preprocessing.StandardScaler for the fit parameters:

scale_ : ndarray, shape (n_features,) Per feature relative scaling of the data. New in version 0.17: scale_ is recommended instead of deprecated std_. mean_ : array of floats with shape [n_features] The mean value for each feature in the training set.

The following short snippet illustrates this:

from sklearn import preprocessing
import numpy as np

s = preprocessing.StandardScaler()
s.fit(np.array([[1., 2, 3, 4]]).T)
>>> s.mean_, s.scale_
(array([ 2.5]), array([ 1.11803399]))

Scale with standard scaler

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(data)
scaled_data = scaler.transform(data)

save mean_ and var_ for later use

means = scaler.mean_ 
vars = scaler.var_

(you can print and copy paste means and vars or save to disk with np.save....)

Later use of saved parameters

def scale_data(array,means=means,stds=vars **0.5):
    return (array-means)/stds

scale_new_data = scale_data(new_data)

How to store scaling parameters for later use

Tags:

Python

Normalization

Standardized

Scikit Learn

Related

Recent Posts