How to set a threshold for a scikit-learn random forest model
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train)
threshold = 0.4
predicted = random_forest.predict_proba(X_test)
predicted[:,0] = (predicted[:,0] < threshold).astype('int')
predicted[:,1] = (predicted[:,1] >= threshold).astype('int')
accuracy = accuracy_score(y_test, predicted)
print(round(accuracy, 4) * 100, "%")
This raises an error at the final accuracy_score call: "ValueError: Can't handle mix of binary and multilabel-indicator"
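Assuming you are doing binary classification, it's quite easy. predict_proba returns an array of shape (n_samples, 2), one column per class, so thresholding both columns leaves you with a 2D array that accuracy_score treats as a multilabel target. Threshold only the positive-class column to get a 1D vector of labels: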
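threshold = 0.4
# predict_proba returns one column per class, ordered as in random_forest.classes_
predicted_proba = random_forest.predict_proba(X_test)
# Threshold only the second column (index 1, the positive class when labels are 0/1)
predicted = (predicted_proba[:, 1] >= threshold).astype('int')
accuracy = accuracy_score(y_test, predicted)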
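For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The synthetic dataset from make_classification, the train/test split, and the random_state values are assumptions added only so the snippet runs on its own; they are not from the question. It also looks up the positive-class column through classes_ instead of hard-coding index 1, which assumes the positive label is 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption, used only to make the example runnable)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=0)
random_forest.fit(X_train, y_train)

threshold = 0.4
predicted_proba = random_forest.predict_proba(X_test)  # shape (n_samples, 2)

# Column order follows random_forest.classes_, so find the column for label 1
positive_col = list(random_forest.classes_).index(1)

# 1D array of 0/1 labels: predict 1 when P(class 1) >= threshold
predicted = (predicted_proba[:, positive_col] >= threshold).astype(int)

accuracy = accuracy_score(y_test, predicted)
print(f"{accuracy * 100:.2f} %")
```

Because predicted is now one-dimensional and has the same shape as y_test, accuracy_score no longer sees a mix of binary and multilabel-indicator targets.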