Size of sample in Random Forest Regression
I agree with you that it is odd that we cannot specify the subsample/bootstrap size in RandomForestRegressor. A possible workaround is to use BaggingRegressor instead: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html#sklearn.ensemble.BaggingRegressor

RandomForestRegressor is essentially a special case of BaggingRegressor (both use bootstrap samples to reduce the variance of a set of low-bias, high-variance estimators). In RandomForestRegressor the base estimator is fixed to a decision tree, whereas in BaggingRegressor you are free to choose the base_estimator. More importantly, you can set a custom subsample size: for example, max_samples=0.5 draws random subsamples half the size of the training set. You can also train each estimator on a subset of the features by setting max_features and bootstrap_features.
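As a minimal sketch of that workaround (the data here is synthetic and the specific parameter values are illustrative, not prescriptive): BaggingRegressor with its default decision-tree base estimator, where max_samples=0.5 makes each tree train on a bootstrap subsample of half the training set.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 4))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

# The default base estimator is a decision tree, so this behaves much
# like a random forest, but with a controllable subsample size.
model = BaggingRegressor(
    n_estimators=50,
    max_samples=0.5,        # each tree sees half of the training set
    max_features=0.75,      # each tree sees 75% of the features (example value)
    bootstrap_features=False,
    random_state=0,
)
model.fit(X, y)
print(model.predict(X[:3]))
```

Passing a float to max_samples/max_features means a fraction; passing an int means an absolute count of samples/features.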
The sample size for bootstrap is always the number of samples.
You are not missing anything; the same question was asked on the mailing list for RandomForestClassifier:

The bootstrap sample size is always the same as the input sample size. If you feel up to it, a pull request updating the documentation would probably be quite welcome.
In version 0.22 of scikit-learn, the max_samples option was added, which does what you asked: see the documentation of the class.
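For completeness, a short sketch of that newer option (requires scikit-learn >= 0.22; the data and parameter values are illustrative): max_samples on RandomForestRegressor itself controls the bootstrap sample size per tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)

# max_samples=0.5: each tree's bootstrap sample is half the training set.
# An int would instead give an absolute sample count.
rf = RandomForestRegressor(n_estimators=50, max_samples=0.5, random_state=0)
rf.fit(X, y)
print(rf.predict(X[:3]))
```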