Understanding max_features parameter in RandomForestRegressor
@lynnyi, max_features is the number of features considered at each split, not for the construction of the whole decision tree. To be clearer: while building each decision tree, RF can still draw on all of the features (n_features), but at each node it only considers max_features of them as split candidates. Those max_features features are selected at random from the full feature set, and a new subset is drawn at every node. You can confirm this by plotting one decision tree from a RF with max_features=1 and checking all the nodes of that tree to count the number of distinct features involved.
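The check described above can also be done without plotting. The following is a minimal sketch, assuming a synthetic dataset from make_regression and arbitrary sizes (200 samples, 5 features, 10 trees) that are not part of the original question. It reads the split feature of every node in the first tree and lists the distinct ones.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration): 200 samples, 5 features.
X, y = make_regression(n_samples=200, n_features=5, n_informative=5,
                       noise=0.1, random_state=0)

# Only one randomly chosen feature is evaluated at each split.
forest = RandomForestRegressor(n_estimators=10, max_features=1, random_state=0)
forest.fit(X, y)

first_tree = forest.estimators_[0]
# tree_.feature holds one entry per node: the split feature index,
# or a negative placeholder for leaf nodes.
split_features = first_tree.tree_.feature
used = np.unique(split_features[split_features >= 0])

print("features evaluated per split:", first_tree.max_features_)
print("distinct features used in the first tree:", used)
```

If the tree as a whole uses more than one feature even though max_features=1, the limit applies per split rather than per tree.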
Straight from the documentation:
[max_features] is the size of the random subsets of features to consider when splitting a node.
So max_features is what you call m. When max_features="auto", m = p and no feature subset selection is performed in the trees, so the "random forest" is actually a bagged ensemble of ordinary regression trees. The docs go on to say that
Empirical good default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks.
By setting max_features differently, you'll get a "true" random forest.
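To see which m a given setting resolves to, one option is to read the max_features_ attribute of a fitted tree. This is a rough sketch, assuming synthetic data with p = 9 features (chosen only so that sqrt(p) is a whole number). It uses max_features=None, which the documentation defines as max_features=n_features, instead of the "auto" string, since newer scikit-learn releases may no longer accept "auto".

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration): p = 9 features.
X, y = make_regression(n_samples=200, n_features=9, noise=0.1, random_state=0)

per_split = {}
for setting in (None, "sqrt"):
    forest = RandomForestRegressor(n_estimators=10, max_features=setting,
                                   random_state=0).fit(X, y)
    # max_features_ is the resolved integer m stored on each fitted tree.
    per_split[setting] = forest.estimators_[0].max_features_

print(per_split)
```

With None, every split considers all p features (bagged trees); with "sqrt", each split only sees a random subset of size m < p.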