Is there an easy way to grid search without cross-validation in Python?
I would really advise against using OOB to evaluate a model, but it is useful to know how to run a grid search outside of GridSearchCV()
(I frequently do this so I can save the CV predictions from the best grid for easy model stacking). I think the easiest way is to create your grid of parameters via ParameterGrid()
and then just loop through every set of params. For example, assuming you have a grid dict named "grid" and an RF model object named "rf", you can do something like this:
from sklearn.model_selection import ParameterGrid

best_score = -float("inf")  # rf must have been created with oob_score=True
best_grid = None

for g in ParameterGrid(grid):
    rf.set_params(**g)
    rf.fit(X, y)
    # save if best
    if rf.oob_score_ > best_score:
        best_score = rf.oob_score_
        best_grid = g

print("OOB: %0.5f" % best_score)
print("Grid:", best_grid)
Although the question was solved years ago, I just found a more natural way if you insist on using GridSearchCV() instead of other means (ParameterGrid(), etc.):
- Create a sklearn.model_selection.PredefinedSplit(). It takes a parameter called test_fold, an array-like with one entry per sample in your input data. Set the entry to -1 for every sample that belongs to the training set and to 0 for every sample that belongs to the validation set.
- Create a GridSearchCV object and pass the PredefinedSplit object you just created as its cv argument.
Then, GridSearchCV will generate only one train-validation split, as defined by test_fold.
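A minimal sketch of that setup, assuming X and y are already loaded, the grid dict from above, and that the last 20% of samples serve as the validation set:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# -1 marks samples that are always in training; 0 marks the single validation fold
n_val = len(X) // 5
test_fold = np.full(len(X), -1)
test_fold[-n_val:] = 0

ps = PredefinedSplit(test_fold)
search = GridSearchCV(RandomForestClassifier(), param_grid=grid, cv=ps)
search.fit(X, y)
print(search.best_params_)

Because the -1 entries are never placed in a test fold, you get exactly one split instead of K folds.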
One method is to use ParameterGrid to make an iterator of the parameter combinations you want and loop over it.
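As a quick illustration of what ParameterGrid yields (the grid dict here is hypothetical):

from sklearn.model_selection import ParameterGrid

grid = {"n_estimators": [100, 300], "max_depth": [None, 5]}  # hypothetical grid
for params in ParameterGrid(grid):
    print(params)  # e.g. {'max_depth': None, 'n_estimators': 100}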
Another thing you could do is configure GridSearchCV itself to do what you want. I wouldn't particularly recommend this, because it's unnecessarily complicated.
What you would need to do is:
- Use the cv arg from the docs and give it a generator which yields a tuple with all indices (so that train and test are the same).
- Change the scoring arg to use the OOB score given out by the random forest (see the sketch below).
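Here is a minimal sketch of that approach, assuming X, y, and the grid dict from above; no_cv_split and oob_scorer are hypothetical helper names:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Yield a single "split" where train and test are the same indices.
def no_cv_split(X):
    indices = np.arange(len(X))
    yield indices, indices

# Scorer that ignores the held-out data and returns the fitted forest's OOB score.
def oob_scorer(estimator, X, y):
    return estimator.oob_score_

search = GridSearchCV(
    RandomForestClassifier(oob_score=True),
    param_grid=grid,
    cv=no_cv_split(X),
    scoring=oob_scorer,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)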