sklearn: use Pipeline in a RandomizedSearchCV?

Both RandomizedSearchCV and GridSearchCV support pipelines. In fact, they do not depend on how the estimator is implemented, and a pipeline is designed to behave like an ordinary classifier.

The key is to ask which parameters the search should run over. A pipeline consists of several objects (one or more transformers plus a classifier), and you may want to tune parameters of both the classifier and the transformers. So you need a way to say which object each parameter is read from or set on.

In other words, you are not looking for a value of some abstract gamma (the pipeline itself has no such parameter), but for the gamma of the pipeline's classifier, which in your case is named rbf_svm (this is also why pipeline steps need names). You express this with the double-underscore syntax that sklearn uses for nested estimators:

param_dist = {
    'rbf_svm__C': [1, 10, 100, 1000],
    'rbf_svm__gamma': [0.001, 0.0001],
    'rbf_svm__kernel': ['rbf', 'linear'],
}
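
For context, here is a minimal sketch of how such a param_dist could be passed to RandomizedSearchCV. The original pipeline is not shown in the question, so the StandardScaler step, the step name 'scaler', the synthetic data, and the n_iter and cv values are all assumptions made for illustration. Only the step name rbf_svm and the parameter values come from the answer above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, used only so the example runs end to end (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical pipeline: the 'scaler' step is an assumption; 'rbf_svm' is
# the classifier step name that the parameter keys refer to.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('rbf_svm', SVC()),
])

# Keys have the form <step name>__<parameter name>.
param_dist = {
    'rbf_svm__C': [1, 10, 100, 1000],
    'rbf_svm__gamma': [0.001, 0.0001],
    'rbf_svm__kernel': ['rbf', 'linear'],
}

# Sample 5 of the 16 possible combinations, with 3-fold cross-validation.
search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)

# The best parameters are reported with the same prefixed keys.
print(search.best_params_)
```

The same keys work unchanged with GridSearchCV (passed as param_grid), and transformer parameters can be searched the same way, for example 'scaler__with_mean' in this sketch.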

I think this is what you need (section 3).

Call pipeline.get_params().keys() and make sure the keys of your parameter grid match the names it returns.
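
That check can be automated. The sketch below compares the search keys against get_params() for a hypothetical two-step pipeline. The 'scaler' step is an assumption; substitute your own pipeline object.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical pipeline; only the step name 'rbf_svm' comes from the question.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('rbf_svm', SVC()),
])

param_dist = {
    'rbf_svm__C': [1, 10, 100, 1000],
    'rbf_svm__gamma': [0.001, 0.0001],
    'rbf_svm__kernel': ['rbf', 'linear'],
}

# Every name the pipeline accepts, including nested <step>__<param> names.
valid_keys = set(pipeline.get_params().keys())

# Any key listed here would be rejected when the search sets parameters.
unknown = sorted(set(param_dist) - valid_keys)
print(unknown)

# An unprefixed name such as 'gamma' is not a parameter of the pipeline.
print('gamma' in valid_keys, 'rbf_svm__gamma' in valid_keys)
```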
