Pruning Decision Trees

Directly restricting the lowest value (number of occurences of a particular class) of a leaf cannot be done with min_impurity_decrease or any other built-in stopping criteria.

I think the only way you can accomplish this without changing the source code of scikit-learn is to post-prune your tree. To accomplish this, you can just traverse the tree and remove all children of the nodes with minimum class count less that 5 (or any other condition you can think of). I will continue your example:

from sklearn.tree._tree import TREE_LEAF

def prune_index(inner_tree, index, threshold):
    if inner_tree.value[index].min() < threshold:
        # turn node into a leaf by "unlinking" its children
        inner_tree.children_left[index] = TREE_LEAF
        inner_tree.children_right[index] = TREE_LEAF
    # if there are shildren, visit them as well
    if inner_tree.children_left[index] != TREE_LEAF:
        prune_index(inner_tree, inner_tree.children_left[index], threshold)
        prune_index(inner_tree, inner_tree.children_right[index], threshold)

print(sum(dt.tree_.children_left < 0))
# start pruning from the root
prune_index(dt.tree_, 0, 5)
sum(dt.tree_.children_left < 0)

this code will print first 74, and then 91. It means that the code has created 17 new leaf nodes (by practically removing links to their ancestors). The tree, which has looked before like

enter image description here

now looks like

enter image description here

so you can see that is indeed has decreased a lot.


Edit : This is not correct as @SBylemans and @Viktor point out in the comments. I'm not deleting the answer since someone else may also think this is the solution.

Set min_samples_leaf to 5.

min_samples_leaf :

The minimum number of samples required to be at a leaf node:

Update : I think it cannot be done with min_impurity_decrease. Think of the following scenario :

      11/9
   /         \
  6/4       5/5
 /   \     /   \
6/0  0/4  2/2  3/3

According to your rule, you do not want to split node 6/4 since 4 is less than 5 but you want to split 5/5 node. However, splitting 6/4 node has 0.48 information gain and splitting 5/5 has 0 information gain.