How to pass a parameter to only one part of a pipeline object in scikit-learn?

From the documentation:

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.

So you can simply prefix any fit keyword argument with model__ to pass it to your 'model' step only:

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
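For a version that can be run end to end, here is a minimal sketch. The synthetic data from make_classification, the f_classif score function, and the small k and n_estimators values are my own assumptions chosen to keep the example fast; they are not from the question. Only the 'model' step receives sample_weight, because the argument name carries the model__ prefix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Toy data (assumption): 40 samples, 10 features, binary target.
X, y = make_classification(n_samples=40, n_features=10, random_state=0)

m = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=5)),
    ("model", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# One weight per training sample; class 1 is weighted three times as much.
sample_weight = np.ones(len(y))
sample_weight[y == 1] = 3.0

# The "model__" prefix routes sample_weight to the 'model' step's fit() only.
m.fit(X, y, model__sample_weight=sample_weight)
predictions = m.predict(X)
```

Note that the weight array must have one entry per row of X, so the four-element array above only works with four training samples.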

You can also use the method set_params and prepend the name of the step.

import sklearn.ensemble
import sklearn.feature_selection
import sklearn.pipeline

m = sklearn.pipeline.Pipeline([
    ('feature_selection', sklearn.feature_selection.SelectKBest(
        score_func=sklearn.feature_selection.f_regression,
        k=25)),
    ('model', sklearn.ensemble.RandomForestClassifier(
        random_state=0,
        oob_score=True,
        n_estimators=500,
        min_samples_leaf=5,
        max_depth=10))])

m.set_params(model__sample_weight=np.array([3,4,2,3]))

This is a reply to @rovyko's answer above. I would have left it as a comment, but I don't have enough Stack Overflow reputation to comment yet, so I am posting it as a separate answer.

You cannot use:

Pipeline.set_params(model__sample_weight=np.array([3,4,2,3]))

to set parameters for the RandomForestClassifier.fit() method. As its source code shows, Pipeline.set_params() only sets initialization parameters of the individual steps in the Pipeline. RandomForestClassifier has no initialization parameter called sample_weight (see its __init__() method). sample_weight is an input parameter to RandomForestClassifier's fit() method, so it can only be passed the way the answer by @ali_m (the one marked as correct) shows, which is:

m.fit(X, y, model__sample_weight=np.array([3,4,2,3]))
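The difference is easy to check. The sketch below uses a hypothetical one-step pipeline named pipe and toy data from make_classification (both are my own assumptions, not from the answers above). With the scikit-learn versions I am aware of, set_params() accepts a constructor parameter such as n_estimators and raises ValueError for sample_weight, while fit() accepts the prefixed sample_weight.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Toy data and a one-step pipeline (both hypothetical, for illustration).
X, y = make_classification(n_samples=40, n_features=10, random_state=0)
pipe = Pipeline([
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# n_estimators is a constructor parameter, so set_params() accepts it.
pipe.set_params(model__n_estimators=25)

# sample_weight is not a constructor parameter, so set_params() rejects it.
try:
    pipe.set_params(model__sample_weight=np.ones(len(y)))
    set_params_failed = False
except ValueError:
    set_params_failed = True

# sample_weight is an argument of fit(), so it is passed at fit time.
pipe.fit(X, y, model__sample_weight=np.ones(len(y)))
```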
