ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score
The error is produced because you loop over different numbers of clusters n. During the first iteration, n_clusters is 1, and this leads to all(km.labels_ == 0) being True.
In other words, you have only one cluster, with label 0 (thus, np.unique(km.labels_) prints array([0], dtype=int32)).
silhouette_score requires at least 2 distinct cluster labels, so calling it with a single label raises the error. The error message says exactly that.
Example:
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np
iris = datasets.load_iris()
X = iris.data
y = iris.target
km = KMeans(n_clusters=3)
km.fit(X)  # KMeans is unsupervised; a y argument would be ignored
# check how many unique labels you have
np.unique(km.labels_)
#array([0, 1, 2], dtype=int32)
We have 3 different clusters/cluster labels.
silhouette_score(X, km.labels_, metric='euclidean')
0.38788915189699597
The function works fine.
Now, let's cause the error:
km2 = KMeans(n_clusters=1)
km2.fit(X)  # a single cluster, so every sample gets label 0
silhouette_score(X, km2.labels_, metric='euclidean')
ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive)
From the documentation:
"Note that Silhouette Coefficient is only defined if number of labels is 2 <= n_labels <= n_samples - 1."
So one way to solve this problem is, instead of using for k in range(1,15), to start the iteration from k = 2, that is, for k in range(2,15). That works for me.
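For reference, here is a minimal sketch of what the corrected loop could look like on the iris data used above. The names scores and best_k, and the n_init and random_state settings, are my own assumptions for illustration and are not taken from your code.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = datasets.load_iris().data

# Hypothetical container: maps each k to its silhouette score.
scores = {}

# Start at k = 2: silhouette_score is undefined for a single label.
for k in range(2, 15):
    # n_init and random_state are assumed here only to make runs repeatable.
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric='euclidean')

# The k with the highest silhouette score among those tried.
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

If you really need to keep k = 1 in the loop for some other metric (inertia, for example), you can simply skip the silhouette_score call for that iteration.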