Hierarchical clustering comprises two types of methods:
- Agglomerative: bottom-up approach (from leaves to root)
- Divisive: top-down approach (from root to leaves)
Agglomerative clustering
Agglomerative clustering is a sequential process:
- First, all data points are considered as separate clusters.
- Then, step by step, the clusters that are closest to each other are merged.
The first two steps of an example are displayed below:

The whole merging process can be visualized using a dendrogram, where the y-axis is the distance between two points/clusters and the x-axis shows the data point index.

Note: In the case of a large sample, it is possible to display only the last merges of the algorithm (see the level truncation mode in scipy).
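As a sketch of this truncation (assuming scipy is available, on illustrative random data), the level truncation mode collapses all but the last few merge levels of the dendrogram:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs (illustrative data)
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(5, 0.5, size=(20, 2))])

# Z encodes the full sequence of merges (one row per merge)
Z = linkage(X, method="single", metric="euclidean")

# Show only the last 3 levels of merges instead of all 40 leaves.
# no_plot=True returns the plotting data; drop it to draw the figure
# with matplotlib.
info = dendrogram(Z, truncate_mode="level", p=3, no_plot=True)
```

With 40 points, Z has 39 rows (one per merge); the truncated dendrogram groups early merges into single leaves.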
There are three main parameters to choose:
- Distance metric: the metric used to compute the distance between two data points.
- Distance function (linkage): the function used to compute the distance between two clusters; it is defined in terms of the distance metric.
- Number of clusters
Distance functions
The most common distance functions are:
Minimum distance (also called single-linkage clustering):
\[D(A,B) = \min\{d(x,y): x \in A,~y \in B\}\]
Average distance:
\[D(A,B) = \frac{1}{|A| |B|} \sum_{x \in A,~y \in B} d(x,y)\]
Maximum distance (also called complete-linkage clustering):
\[D(A,B) = \max\{d(x,y): x \in A,~y \in B\}\]
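The three distance functions above can be computed directly from the pairwise point distances. A minimal sketch on two small hand-picked clusters (assuming scipy for the pairwise distances):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two toy clusters A and B on a line
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 0.0]])

# d[i, j] = Euclidean distance d(x, y) for x in A, y in B
d = cdist(A, B)

single   = d.min()   # minimum distance (single linkage)   -> 3.0
average  = d.mean()  # average distance                    -> 4.0
complete = d.max()   # maximum distance (complete linkage) -> 5.0
```

The pairwise distances here are {4, 5, 3, 4}, so the three rules give 3, 4, and 5 respectively.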
Number of clusters
To select the number of clusters, we can look at the dendrogram and focus on groups that are linked by long lines. Since the height of a link represents the distance between the two merged points/clusters, clusters that are far apart are joined by long vertical lines.
In the dendrogram below, it is quite clear that 3 clusters are well separated:

In the example below (with the associated dataset), we could select either 3 or 5 clusters.
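Once a number of clusters has been read off the dendrogram, the tree can be cut at that count. A sketch with scipy's fcluster on three illustrative blobs:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Three illustrative blobs centred at 0, 4 and 8
X = np.vstack([rng.normal(c, 0.3, size=(15, 2)) for c in (0.0, 4.0, 8.0)])

Z = linkage(X, method="average")

# Cut the tree so that exactly 3 clusters remain, i.e. where the
# dendrogram shows long links separating the groups
labels = fcluster(Z, t=3, criterion="maxclust")
```

The same linkage matrix can be cut at different values of t to compare, for instance, the 3-cluster and 5-cluster solutions.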

