How to pass a parameter to only one step of a Pipeline object in scikit-learn?
From the documentation:
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.
So you can simply insert model__ in front of whatever fit parameter kwargs you want to pass to your 'model' step:
m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
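For anyone who wants to try this end to end, here is a minimal sketch. The toy data from make_classification, the smaller pipeline settings, the f_classif score function, and the weight values are assumptions for illustration, not taken from the question; it also assumes scikit-learn's default configuration, where fit parameters are routed by the step-name prefix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Toy data (assumed for illustration): 40 samples, 10 features, 2 classes.
X, y = make_classification(n_samples=40, n_features=10, random_state=0)

m = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=5)),
    ("model", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# sample_weight needs one entry per training sample.
sample_weight = np.ones(len(y))
sample_weight[y == 1] = 3.0  # arbitrary choice: weight class 1 more heavily

# The "model__" prefix sends sample_weight only to the fit() of the step
# named "model"; the feature_selection step does not receive it.
m.fit(X, y, model__sample_weight=sample_weight)
predictions = m.predict(X)
```

The prefix has to match the step name given in the Pipeline constructor, so a step called 'clf' would need clf__sample_weight instead.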
You can also use the method set_params and prepend the name of the step.
import numpy as np
import sklearn.ensemble
import sklearn.feature_selection
import sklearn.pipeline

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0,
        oob_score=True,
        n_estimators=500,
        min_samples_leaf=5,
        max_depth=10))])
m.set_params(model__sample_weight=np.array([3,4,2,3]))
I would have left this as a comment on @rovyko's answer above, but I don't have enough Stack Overflow reputation to comment yet, so I am posting it as a separate answer.
You cannot use:
Pipeline.set_params(model__sample_weight=np.array([3,4,2,3]))
to set parameters for the RandomForestClassifier.fit() method. Pipeline.set_params(), as its source code indicates, only sets initialization parameters for individual steps in the Pipeline. RandomForestClassifier has no initialization parameter called sample_weight (see its __init__() method). sample_weight is actually an input parameter to RandomForestClassifier's fit() method and can therefore only be set by the method presented in the answer marked as correct, by @ali_m, which is:
m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
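The difference between the two calls can be checked with a short sketch. The toy data, the small pipeline, and the weight values below are assumptions for illustration; the behaviour relied on is that set_params raises ValueError for a name that is not a constructor parameter of the step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Toy data (assumed for illustration): 30 samples, 6 features.
X, y = make_classification(n_samples=30, n_features=6, random_state=0)
weights = np.linspace(1.0, 2.0, num=len(y))  # arbitrary per-sample weights

m = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=3)),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Constructor (initialization) parameters can be changed via set_params.
m.set_params(model__n_estimators=25)

# sample_weight is not a constructor parameter of RandomForestClassifier,
# so set_params rejects it with a ValueError.
try:
    m.set_params(model__sample_weight=weights)
    set_params_error = None
except ValueError as exc:
    set_params_error = exc

# Passing it as a prefixed fit parameter is accepted.
m.fit(X, y, model__sample_weight=weights)
```

In short, set_params is for arguments of a step's __init__(), while prefixed keyword arguments to Pipeline.fit() are for arguments of that step's fit().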