Sklearn kmeans equivalent of elbow method
you can use the inertia attribute of Kmeans class.
Assuming X is your dataset:
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
X = # <your_data>
distorsions = []
for k in range(2, 20):
kmeans = KMeans(n_clusters=k)
kmeans.fit(X)
distorsions.append(kmeans.inertia_)
fig = plt.figure(figsize=(15, 5))
plt.plot(range(2, 20), distorsions)
plt.grid(True)
plt.title('Elbow curve')
You can also use euclidean distance between the each data with the cluster center distance to evaluate how many clusters to choose. Here is the code example.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
iris = load_iris()
x = iris.data
res = list()
n_cluster = range(2,20)
for n in n_cluster:
kmeans = KMeans(n_clusters=n)
kmeans.fit(x)
res.append(np.average(np.min(cdist(x, kmeans.cluster_centers_, 'euclidean'), axis=1)))
plt.plot(n_cluster, res)
plt.title('elbow curve')
plt.show()
You had some syntax problems in the code. They should be fixed now:
Ks = range(1, 10)
km = [KMeans(n_clusters=i) for i in Ks]
score = [km[i].fit(my_matrix).score(my_matrix) for i in range(len(km))]
The fit
method just returns a self
object. In this line in the original code
cluster_array = [km[i].fit(my_matrix)]
the cluster_array
would end up having the same contents as km
.
You can use the score
method to get the estimate for how well the clustering fits. To see the score for each cluster simply run plot(Ks, score)
.