Workflow: Clustering data with the Hierarchical Clustering block


The Hierarchical Clustering block enables you to apply a hierarchical clustering model to a dataset.

The following procedure demonstrates how to use the Hierarchical Clustering block to split the input basketball_players.csv dataset (which contains observations describing basketball players in a national league) and assign each observation to one of a specified number of clusters:

  1. Import the basketball_players.csv dataset onto a Workflow canvas using the Text File Import block.
  2. Expand the Model Training group in the Workflow palette, then click and drag a Hierarchical Clustering block onto the Workflow canvas.
  3. Click the Output port of the basketball_players dataset block and drag a connection towards the Input port of the Hierarchical Clustering block.
  4. Double-click the Hierarchical Clustering block to display the Clustering view along with the hierarchical clustering Preferences dialog box.
  5. In the hierarchical clustering Preferences dialog box:
    1. In the Unselected Variables list, press and hold CTRL and select the goals_scored, height, and weight variables.
    2. Click Select to move the specified variables to the Selected Independent Variables list.
  6. Click OK to save the configuration and close the hierarchical clustering Preferences dialog box.

    The Clustering view displays the clusters in the model.

  7. Close the Hierarchical Clustering view and save the configuration when prompted.

A green execution status is displayed in the Output port of the Hierarchical Clustering block and in the new Working Dataset. The Working Dataset contains the input dataset with a new variable that identifies the cluster to which each observation is allocated.
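
For readers who want to see what such a block computes, the following Python sketch reproduces the general idea outside the Workflow: it clusters observations on goals_scored, height, and weight, then appends a cluster variable to each observation. It is an illustration only and not the block's actual implementation. The sample rows, the name variable, the cluster variable name, the choice of two clusters, the standardisation step, and the average-linkage method are all assumptions made for this example.

```python
import math

# Hypothetical rows standing in for basketball_players.csv.
# The values (and the "name" variable) are invented for illustration.
players = [
    {"name": "A", "goals_scored": 120, "height": 208, "weight": 110},
    {"name": "B", "goals_scored": 130, "height": 211, "weight": 115},
    {"name": "C", "goals_scored": 110, "height": 206, "weight": 108},
    {"name": "D", "goals_scored": 300, "height": 185, "weight": 82},
    {"name": "E", "goals_scored": 320, "height": 188, "weight": 85},
    {"name": "F", "goals_scored": 290, "height": 183, "weight": 80},
]

# The three variables selected in the Preferences dialog box.
FEATURES = ["goals_scored", "height", "weight"]


def standardise(rows, features):
    """Scale each feature to zero mean and unit variance (an assumed
    preprocessing step so that no single variable dominates the distance)."""
    columns = []
    for feature in features:
        values = [row[feature] for row in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        columns.append([(v - mean) / std for v in values])
    return [list(point) for point in zip(*columns)]


def average_linkage(points, a, b):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hierarchical_clusters(points, n_clusters):
    """Agglomerative clustering: start with one cluster per observation and
    repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (average_linkage(points, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    labels = [0] * len(points)
    for label, members in enumerate(clusters, start=1):
        for index in members:
            labels[index] = label
    return labels


points = standardise(players, FEATURES)
labels = hierarchical_clusters(points, n_clusters=2)

# Analogue of the Working Dataset: the input rows plus a cluster variable.
working_dataset = [dict(row, cluster=label) for row, label in zip(players, labels)]
for row in working_dataset:
    print(row)
```

The pairwise search in this sketch is kept simple for readability and scales poorly with the number of observations; for a dataset of realistic size, a library implementation of hierarchical clustering would be the more practical choice.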