Determining the most contributing features for SVM classifier in sklearn
This can be done in only one line of code once the model is trained. First, fit an SVM model with a linear kernel:
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100., kernel='linear')
clf.fit(X, y)  # X: training features, y: class labels
and then produce the plot in a single line:
import pandas as pd
pd.Series(abs(clf.coef_[0]), index=features.columns).nlargest(10).plot(kind='barh')
The result will be:
[Plot: the most contributing features of the SVM model, by absolute coefficient value]
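If you want to reproduce this end to end, here is a minimal self-contained sketch; the synthetic dataset and the feat_* column names are placeholders for your own data, not part of the original answer.

import pandas as pd
from sklearn import svm
from sklearn.datasets import make_classification

# Placeholder data standing in for your own feature matrix and labels (assumption).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)
features = pd.DataFrame(X, columns=[f'feat_{i}' for i in range(8)])

clf = svm.SVC(gamma=0.001, C=100., kernel='linear')
clf.fit(features, y)

# Ten largest coefficients by absolute value, as a horizontal bar chart.
pd.Series(abs(clf.coef_[0]), index=features.columns).nlargest(10).plot(kind='barh')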
If you're using the rbf (radial basis function) kernel, you can use sklearn.inspection.permutation_importance as follows to get feature importances. [doc]
from sklearn.svm import SVC
from sklearn.inspection import permutation_importance
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Fit an SVM with a non-linear kernel.
svc = SVC(kernel='rbf', C=2)
svc.fit(X_train, y_train)

# Permute each feature on the held-out set and measure the drop in score.
perm_importance = permutation_importance(svc, X_test, y_test)

feature_names = ['feature1', 'feature2', 'feature3', ...... ]
features = np.array(feature_names)

# Plot the features sorted by mean importance.
sorted_idx = perm_importance.importances_mean.argsort()
plt.barh(features[sorted_idx], perm_importance.importances_mean[sorted_idx])
plt.xlabel("Permutation Importance")
Yes, there is the attribute coef_ for the SVM classifier, but it only works for an SVM with a linear kernel. For other kernels it is not possible, because the data are transformed by the kernel method into another space which is not related to the input space; check the explanation. With a linear kernel you can do, for example:
from matplotlib import pyplot as plt
from sklearn import svm

def f_importances(coef, names):
    # Sort the features by coefficient value and plot a horizontal bar chart.
    imp, names = zip(*sorted(zip(coef, names)))
    plt.barh(range(len(names)), imp, align='center')
    plt.yticks(range(len(names)), names)
    plt.show()

features_names = ['input1', 'input2']
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)  # X, Y: your training data
f_importances(clf.coef_[0], features_names)  # coef_ is 2-D; take the first row
And the output of the function looks like this:
[Plot: horizontal bar chart of the feature importances for input1 and input2]
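As a quick check of the kernel restriction described above, here is a minimal sketch (on made-up synthetic data, an assumption for illustration) showing that accessing coef_ on a non-linear SVC raises an AttributeError:

from sklearn import svm
from sklearn.datasets import make_classification

# Made-up data purely for illustration.
X, Y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

rbf_clf = svm.SVC(kernel='rbf').fit(X, Y)
try:
    rbf_clf.coef_  # only defined when kernel='linear'
except AttributeError as e:
    print(e)  # the message explains that coef_ requires a linear kernel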