Apply StandardScaler to parts of a data set
scikit-learn v0.20 introduced ColumnTransformer, which applies transformers to a specified set of columns of an array or pandas DataFrame.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
col_names = ['Name', 'Age', 'Weight']
features = data[col_names]

# Scale only 'Age' and 'Weight'; remainder='passthrough' keeps 'Name' unchanged
# and appends it after the transformed columns.
ct = ColumnTransformer([
    ('somename', StandardScaler(), ['Age', 'Weight'])
], remainder='passthrough')
ct.fit_transform(features)
NB: Like Pipeline, it also has a shorthand version, make_column_transformer, which doesn't require naming the transformers.
Output
-1.41100443, 1.20270298, 3.
0.62304092, 0.04295368, 4.
0.78796352, -1.24565666, 6.
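The make_column_transformer shorthand mentioned in the NB can be sketched roughly as follows. This assumes a recent scikit-learn release, where each tuple is ordered (transformer, columns); the variable name result is only illustrative.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})

# No step names are passed here; make_column_transformer generates them
# automatically from the transformer types.
# Assumption: tuples are ordered (transformer, columns), as in recent releases.
ct = make_column_transformer(
    (StandardScaler(), ['Age', 'Weight']),
    remainder='passthrough',
)
result = ct.fit_transform(data)

# Scaled 'Age' and 'Weight' come first, the passed-through 'Name' column last.
print(result)
```

The result should match the ColumnTransformer version; only the way the transformer is declared differs.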
Update:
Currently the best way to handle this is to use ColumnTransformer, as explained above.
To do it manually instead, first create a copy of your dataframe:
scaled_features = data.copy()
Don't include the Name column in the transformation:
col_names = ['Age', 'Weight']
features = scaled_features[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
Now, don't create a new dataframe but assign the result to those two columns:
scaled_features[col_names] = features
print(scaled_features)
        Age  Name    Weight
0 -1.411004     3  1.202703
1  0.623041     4  0.042954
2  0.787964     6 -1.245657
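If you need this in more than one place, the copy-and-assign steps could be wrapped in a small helper. This is only a sketch: scale_columns is a hypothetical name, not part of pandas or scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_columns(df, columns):
    """Return a copy of df with only `columns` standardized (hypothetical helper)."""
    scaled = df.copy()
    # Fit and transform on the raw values of the selected columns only,
    # then write the result back into those same columns of the copy.
    scaled[columns] = StandardScaler().fit_transform(df[columns].values)
    return scaled


data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
scaled_features = scale_columns(data, ['Age', 'Weight'])

# 'Name' is left untouched, and the original frame is not modified.
print(scaled_features)
```

Working on a copy keeps the unscaled data available for later steps.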