Read a Dendrogram

Charles Mortished_21141
Altair Employee

Hierarchical clustering is a great tool for understanding how the behaviour of a model varies across a design space, for example the deformation shapes of components in a crash, as seen here. One of the key features of hierarchical clustering is that, well, it has a hierarchy, which can help you understand how the behaviours of different designs relate to each other. This hierarchy can be shown in a plot called a dendrogram, an example of which is shown below.

[Figure: example dendrogram]

The dendrogram visualizes two key pieces of information: which points are grouped together, and how similar the members of each group are. Groups are shown as horizontal lines, and the Y position of each horizontal line shows the distance between the members of the groups it joins. The value of this distance depends on both the distance function and the linkage method used.

There are many different linkage methods, but the one I find most intuitive is the “average” or UPGMA (Unweighted Pair Group Method with Arithmetic Mean) method. It calculates the distance between two groups as the average of the distances between each point in the first group and each point in the second group.
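As a minimal sketch of this definition, the Python below computes the average of the distances between every member of one group and every member of another. The helper name `average_linkage` is hypothetical, and the sketch assumes the five 1D points used in the example that follows, with the absolute difference between values as the distance function.

```python
from itertools import product

# The example's 1D points (repeated here so the sketch is self-contained)
points = {"A": 1, "B": 2, "C": 4, "D": 9, "E": 12}


def dist(p, q):
    """Distance between two named points: absolute difference of their values."""
    return abs(points[p] - points[q])


def average_linkage(group1, group2):
    """Hypothetical helper: UPGMA distance between two groups, i.e. the mean of
    the distances between every member of group1 and every member of group2."""
    pairs = list(product(group1, group2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


print(average_linkage(["A", "B"], ["C"]))  # 2.5
print(average_linkage(["A", "B", "C"], ["D", "E"]))
```

Note that every pair counts equally, which is what "unweighted" refers to: a large group does not get diluted into a single representative point.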

To understand how to read the dendrogram, it helps to understand how the clustering is performed. Consider the following five points in 1D space, chosen for easy conceptualisation. Clustering these points with the average linkage method produces the dendrogram shown above.

| Point | Value |
|-------|-------|
| A     | 1     |
| B     | 2     |
| C     | 4     |
| D     | 9     |
| E     | 12    |

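The first step in hierarchical clustering is to create a distance matrix between all points. Here the distance is simply the absolute difference between the values (linear distance), but higher-dimensional problems need other distance functions, such as Euclidean distance or correlation. The points above give the following distance matrix:

|   | A  | B  | C | D | E  |
|---|----|----|---|---|----|
| A | 0  | 1  | 3 | 8 | 11 |
| B | 1  | 0  | 2 | 7 | 10 |
| C | 3  | 2  | 0 | 5 | 8  |
| D | 8  | 7  | 5 | 0 | 3  |
| E | 11 | 10 | 8 | 3 | 0  |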
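The distance matrix for the example points can be built with a nested list comprehension. This is a sketch assuming the absolute difference as the distance function; for higher-dimensional data a different function would be swapped in.

```python
points = {"A": 1, "B": 2, "C": 4, "D": 9, "E": 12}
labels = list(points)

# Pairwise distance matrix: absolute difference between the 1D values
matrix = [[abs(points[r] - points[c]) for c in labels] for r in labels]

# Print the matrix with row and column labels
print("   " + "".join(f"{c:>4}" for c in labels))
for r, row in zip(labels, matrix):
    print(f"{r:<3}" + "".join(f"{v:>4}" for v in row))
```

The matrix is symmetric with zeros on the diagonal, so in practice only the upper or lower triangle needs to be stored.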

The smallest distance is between points A & B, with a distance of 1. This means that the points A and B can be collapsed into the group AB.
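Finding the pair to merge amounts to a search for the smallest off-diagonal entry. A minimal sketch, using a hypothetical helper `closest_pair` and the same absolute-difference distance:

```python
from itertools import combinations

points = {"A": 1, "B": 2, "C": 4, "D": 9, "E": 12}


def closest_pair(items, dist):
    """Hypothetical helper: return (a, b, distance) for the closest pair of items."""
    return min(
        ((a, b, dist(a, b)) for a, b in combinations(items, 2)),
        key=lambda triple: triple[2],
    )


a, b, d = closest_pair(list(points), lambda p, q: abs(points[p] - points[q]))
print(a, b, d)  # A B 1
```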

The distance from this new group to each of the other points is the average of the distances from point A and from point B to that point. For example:

$$d(AB, X) = \frac{d(A, X) + d(B, X)}{2}, \quad \text{e.g.} \quad d(AB, C) = \frac{3 + 2}{2} = 2.5$$
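The same averaging can be reproduced for all three remaining points. This is a sketch assuming the example points and the absolute-difference distance; the helper `d` is introduced only for illustration.

```python
points = {"A": 1, "B": 2, "C": 4, "D": 9, "E": 12}


def d(p, q):
    """Absolute difference between two named points."""
    return abs(points[p] - points[q])


# Distance from the new group AB to each remaining point: mean of d(A, x) and d(B, x)
d_ab = {x: (d("A", x) + d("B", x)) / 2 for x in ["C", "D", "E"]}
print(d_ab)  # {'C': 2.5, 'D': 7.5, 'E': 10.5}
```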

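The next distance matrix has the group AB in place of points A and B:

|    | AB   | C   | D   | E    |
|----|------|-----|-----|------|
| AB | 0    | 2.5 | 7.5 | 10.5 |
| C  | 2.5  | 0   | 5   | 8    |
| D  | 7.5  | 5   | 0   | 3    |
| E  | 10.5 | 8   | 3   | 0    |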


Here the distance between AB and C is the smallest, so they are grouped together into ABC, and the distances to points D and E are now calculated as the average of the distances to A, B and C. The equation now becomes:

$$d(ABC, X) = \frac{d(A, X) + d(B, X) + d(C, X)}{3}, \quad \text{e.g.} \quad d(ABC, D) = \frac{8 + 7 + 5}{3} = \frac{20}{3} \approx 6.67$$
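Averaging directly over A, B and C is not the same as averaging the two entries d(AB, X) and d(C, X). The sketch below checks that the direct average matches the standard size-weighted average-linkage update, where AB counts twice because it holds two points. The function name `distances_to_abc` is hypothetical.

```python
points = {"A": 1, "B": 2, "C": 4, "D": 9, "E": 12}


def d(p, q):
    """Absolute difference between two named points."""
    return abs(points[p] - points[q])


def distances_to_abc(x):
    """Return (direct, weighted) distance from point x to the group ABC."""
    # Direct definition: mean distance from x to every member of ABC
    direct = sum(d(m, x) for m in ["A", "B", "C"]) / 3
    # From the previous matrix: weight AB by its size (2) and C by its size (1)
    d_ab_x = (d("A", x) + d("B", x)) / 2
    weighted = (2 * d_ab_x + 1 * d("C", x)) / 3
    return direct, weighted


for x in ["D", "E"]:
    direct, weighted = distances_to_abc(x)
    print(x, round(direct, 2), round(weighted, 2))
```

A plain unweighted mean of 7.5 and 5 would give 6.25 for D instead of 6.67, which is why the group sizes matter.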

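This gives the following distance matrix:

|     | ABC  | D    | E    |
|-----|------|------|------|
| ABC | 0    | 6.67 | 9.67 |
| D   | 6.67 | 0    | 3    |
| E   | 9.67 | 3    | 0    |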

The distance between D and E is now the smallest, so they are grouped into DE. The final distance between groups ABC and DE is the average of the distances from D to each of A, B and C and from E to each of A, B and C, six distances in all:

$$d(ABC, DE) = \frac{d(A,D) + d(B,D) + d(C,D) + d(A,E) + d(B,E) + d(C,E)}{6} = \frac{8 + 7 + 5 + 11 + 10 + 8}{6} = \frac{49}{6} \approx 8.17$$
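The final merge distance can be checked by enumerating all six cross-group pairs. A short sketch, assuming the example points and the absolute-difference distance:

```python
from itertools import product

points = {"A": 1, "B": 2, "C": 4, "D": 9, "E": 12}

# Every (member of ABC, member of DE) pair: 3 x 2 = 6 pairs
pairs = list(product("ABC", "DE"))
d_abc_de = sum(abs(points[a] - points[b]) for a, b in pairs) / len(pairs)
print(len(pairs), round(d_abc_de, 2))  # 6 8.17
```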

This gives the final distance matrix:

|     | ABC  | DE   |
|-----|------|------|
| ABC | 0    | 8.17 |
| DE  | 8.17 | 0    |


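The points have now been fully clustered, and the relationships between them can be plotted on a dendrogram. For the average (UPGMA) linkage method, the height of each horizontal line is the average distance between the members of the two groups it joins, so the dendrogram can be read as follows: A and B join at a height of 1, C joins AB at 2.5, D and E join at 3, and the two groups ABC and DE join at about 8.17.

[Figure: annotated dendrogram of points A to E]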
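Putting the steps together, the whole procedure fits in a short loop that repeatedly merges the closest pair of groups and records the merge height, which is the Y position of the corresponding horizontal line in the dendrogram. This is a naive, illustrative sketch for 1D points with a hypothetical `upgma` function; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would normally be used instead.

```python
from itertools import combinations


def upgma(points):
    """Hypothetical naive average-linkage (UPGMA) clustering of 1D points.

    Returns the merges in order as (group1, group2, height), where height is
    the average distance between the members of the two groups joined.
    """
    def avg_dist(g1, g2):
        total = sum(abs(points[a] - points[b]) for a in g1 for b in g2)
        return total / (len(g1) * len(g2))

    clusters = [(name,) for name in points]  # start with one group per point
    merges = []
    while len(clusters) > 1:
        # Merge the pair of groups with the smallest average distance
        g1, g2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        name1, name2 = sorted(["".join(g1), "".join(g2)])
        merges.append((name1, name2, avg_dist(g1, g2)))
        clusters.remove(g1)
        clusters.remove(g2)
        clusters.append(tuple(sorted(g1 + g2)))
    return merges


for g1, g2, height in upgma({"A": 1, "B": 2, "C": 4, "D": 9, "E": 12}):
    print(f"{g1} + {g2} at height {height:.2f}")
```

Recomputing every pairwise group distance on each pass is slow for large data sets, but it mirrors the hand calculation step by step.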


Hierarchical clustering can power Expert Emulation. For examples of how this can be used to improve your automotive crash optimization process, check out this blog post.