Hierarchical clustering code examples
Example 1: What is hierarchical clustering?
Hierarchical clustering is an unsupervised learning algorithm that
groups similar objects into clusters and arranges those clusters in a
nested hierarchy.
There are two ways to build the cluster hierarchy:
1. Agglomerative (bottom-up approach)
2. Divisive (top-down approach)
Agglomerative clustering starts by treating each data point
as a separate cluster.
The algorithm then finds the two most similar clusters and merges
them into a new cluster.
It repeats this process until all the clusters have been merged into a
single cluster.
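To make the merge loop concrete, here is a minimal, deliberately naive sketch in base R. It assumes Euclidean distance and single linkage (both described below), and the function name `naive_agglomerative` is hypothetical. For real data, the built-in `hclust()` does the same job far more efficiently.

```r
# Naive agglomerative clustering, for illustration only.
# Assumptions: Euclidean distance between points, single linkage between clusters.
naive_agglomerative <- function(x) {
  d <- as.matrix(dist(x))                 # pairwise Euclidean distances between points
  clusters <- as.list(seq_len(nrow(d)))   # every point starts as its own cluster
  merges <- list()                        # merge history: members and merge height
  while (length(clusters) > 1) {
    best <- c(NA, NA)
    best_dist <- Inf
    # find the two closest clusters
    for (i in 1:(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        # single linkage: distance between the closest members of the two clusters
        dij <- min(d[clusters[[i]], clusters[[j]]])
        if (dij < best_dist) {
          best_dist <- dij
          best <- c(i, j)
        }
      }
    }
    merged <- c(clusters[[best[1]]], clusters[[best[2]]])
    merges[[length(merges) + 1]] <- list(members = sort(merged), height = best_dist)
    # drop the two old clusters and add the merged one
    clusters <- c(clusters[-best], list(merged))
  }
  merges
}

# Five made-up points on a line
x <- matrix(c(0, 1, 5, 7, 20), ncol = 1)
sapply(naive_agglomerative(x), `[[`, "height")   # same heights as hclust(dist(x), method = "single")$height
```

Each iteration rescans every pair of clusters, which is fine for a toy example but scales poorly; that is why library implementations are preferred in practice.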
The metric used to measure the similarity between data points is
usually a distance, such as the Euclidean or Manhattan distance.
There is no single correct choice of metric, but the resulting
hierarchy can differ from one metric to another, so choose the one
that helps you gain more insight into your data.
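A small illustration of why the choice matters, assuming three made-up points A, B and C. `dist()` computes the pairwise distances. The first step of agglomerative clustering always merges the closest pair, so with these points the two metrics already disagree at step one. The helper `first_pair` is hypothetical.

```r
# Three hypothetical points: A = (0, 0), B = (2, 3), C = (4.5, 0)
x <- rbind(A = c(0, 0), B = c(2, 3), C = c(4.5, 0))

d_euc <- dist(x, method = "euclidean")  # AB = 3.61, AC = 4.5, BC = 3.91
d_man <- dist(x, method = "manhattan")  # AB = 5,    AC = 4.5, BC = 5.5

# Labels of the two points joined in the first merge (the closest pair)
first_pair <- function(d) {
  hc <- hclust(d)
  sort(hc$labels[-hc$merge[1, ]])  # negative entries in $merge are single observations
}

first_pair(d_euc)  # "A" "B"
first_pair(d_man)  # "A" "C"
```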
Common ways to measure the distance between two clusters when
deciding which ones to merge (the linkage) use the distance between:
1. The centroids of the clusters, i.e. the mean point of each cluster
(centroid linkage)
2. The closest points of the two clusters (single linkage)
3. The farthest points of the two clusters (complete linkage)
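The three linkage criteria can be computed by hand for two tiny clusters. This is a sketch; the coordinates are made up so the numbers are easy to check.

```r
# Two small hypothetical clusters (one point per row)
A <- rbind(c(0, 0), c(1, 0))
B <- rbind(c(4, 0), c(6, 0))

# Euclidean distances from each point of A (rows) to each point of B (columns)
cross <- as.matrix(dist(rbind(A, B)))[1:2, 3:4]

single   <- min(cross)                                # closest points:  1 to 4 = 3
complete <- max(cross)                                # farthest points: 0 to 6 = 6
centroid <- sqrt(sum((colMeans(A) - colMeans(B))^2))  # (0.5, 0) to (5, 0) = 4.5
c(single = single, complete = complete, centroid = centroid)
```

In R's `hclust()`, these criteria correspond to `method = "single"`, `"complete"` and `"centroid"`; the examples on the `hclust` help page pair centroid linkage with squared Euclidean distances.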
Divisive clustering works in the opposite direction to agglomerative
clustering. You start with one cluster containing all the data points
and split it into the two most dissimilar clusters.
You repeat this process until each data point is in its own cluster.
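Below is a simplified divisive sketch in base R. It is an illustration built on an assumed splitting rule, not a standard algorithm such as DIANA (which the `diana()` function in the `cluster` package implements). Each cluster is split around its two most distant members, every other point joins the nearer of the two, and splitting continues until every point stands alone. The name `naive_divisive` is hypothetical.

```r
# Simplified divisive clustering, for illustration only (NOT DIANA).
# Assumed splitting rule: seed the split with the two most distant members,
# then assign each point to the nearer seed.
naive_divisive <- function(x) {
  d <- as.matrix(dist(x))             # pairwise Euclidean distances
  pending <- list(seq_len(nrow(d)))   # start with one cluster holding every point
  splits <- list()                    # record of every split made
  while (length(pending) > 0) {
    members <- pending[[1]]
    pending <- pending[-1]
    if (length(members) < 2) next     # singletons cannot be split further
    sub <- d[members, members, drop = FALSE]
    if (max(sub) == 0) next           # identical points cannot be separated
    seeds <- which(sub == max(sub), arr.ind = TRUE)[1, ]  # most dissimilar pair
    # each point joins the nearer seed (ties go to the first seed)
    to_first <- sub[, seeds[1]] <= sub[, seeds[2]]
    left <- members[to_first]
    right <- members[!to_first]
    splits[[length(splits) + 1]] <- list(left = left, right = right)
    pending <- c(pending, list(left, right))
  }
  splits
}

# Five made-up points on a line: the first split separates the outlier at 20
s <- naive_divisive(matrix(c(0, 1, 5, 7, 20), ncol = 1))
s[[1]]
```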
Example 2: Agglomerative hierarchical clustering
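data("USArrests")        # built-in data: violent crime statistics for the 50 US states
df <- scale(USArrests)   # standardize each variable to mean 0 and standard deviation 1
head(df, n = 6)          # show the first 6 rows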
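One way the example might continue, as a sketch: compute Euclidean distances between the scaled rows, build the tree with `hclust()`, draw the dendrogram and cut it into groups. Complete linkage and `k = 4` are assumptions for illustration, not values the original example specifies.

```r
data("USArrests")
df <- scale(USArrests)                    # standardized variables, as above

d  <- dist(df, method = "euclidean")      # pairwise distances between the 50 states
hc <- hclust(d, method = "complete")      # agglomerative clustering, complete linkage
plot(hc, cex = 0.6, hang = -1)            # dendrogram; labels aligned at the bottom
groups <- cutree(hc, k = 4)               # cut the tree into 4 clusters (assumed k)
table(groups)                             # number of states in each cluster
```

Changing `method` to `"single"` or `"average"`, or `k` to another value, is a quick way to see how sensitive the grouping is to those choices.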