Workflow: Clustering data with the K-Means Clustering block


The K-Means Clustering block enables you to apply a K-Means clustering model to a dataset.

The following example demonstrates how to use the K-Means Clustering block to split the input dataset lib_books.csv into a specified number of clusters and assign each observation to one of them. The dataset contains observations that describe a range of books available from a lending library.

  1. Import the lib_books.csv dataset onto a Workflow canvas using the Text File Import block.
  2. Expand the Model Training group in the Workflow palette, then click and drag a K-Means Clustering block onto the Workflow canvas.
  3. Click the Output port of the lib_books dataset block and drag a connection to the Input port of the K-Means Clustering block.
  4. Double-click the K-Means Clustering block to display the Configure K-Means Clustering dialog box.
  5. In the Configure K-Means Clustering dialog box:
    1. In the Unselected Variables list, press and hold CTRL and select the NumberInStock and Price variables.
    2. Click Select to move the specified variables to the Selected Variables list.
  6. Click OK to save the configuration and close the Configure K-Means Clustering dialog box.
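
For readers who want to see what clustering on the two selected variables involves, the following is a minimal sketch of the standard K-Means procedure (Lloyd's algorithm) in plain Python. It is not the block's implementation: the sample rows, the k_means and nearest function names, the choice of two clusters, and the decision to start from the first k observations are all assumptions made for illustration. The block may initialize, scale, or stop differently.

```python
import math

# Hypothetical (NumberInStock, Price) rows; not taken from lib_books.csv.
rows = [
    (2, 5.0), (3, 6.0), (4, 5.5),
    (20, 25.0), (22, 27.0), (21, 24.0),
]


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm, starting from the first k points so runs are repeatable."""
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: attach every observation to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[i])  # leave an empty cluster where it is
        if updated == centroids:
            break  # centroids stopped moving, so the clustering has converged
        centroids = updated
    # Recompute labels so they always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


centroids, labels = k_means(rows, k=2)
print(labels)
print(centroids)
```

With this sample data, the low-stock, low-price rows and the high-stock, high-price rows end up in separate clusters. Because the distance is Euclidean, a variable with a much larger range than the other would dominate the result unless the variables are scaled first.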

A green execution status is displayed in the Output port of the K-Means Clustering block, and the block outputs the model results, K-Means Clustering Model. The output of the K-Means Clustering block can be used with a Score block to apply the clustering to a dataset.
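
As a rough illustration of what applying a clustering model to a dataset means, the sketch below assigns new observations to the nearest cluster centroid. It assumes that scoring amounts to nearest-centroid assignment by Euclidean distance. The centroid values, the new_books rows, and the score function name are hypothetical and do not come from the Score block or from lib_books.csv.

```python
import math

# Hypothetical centroids as (NumberInStock, Price) pairs, standing in for a trained model.
centroids = [(3.0, 5.5), (21.0, 25.0)]


def score(observations, centroids):
    """Return, for each observation, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(obs, centroids[i]))
        for obs in observations
    ]


# Hypothetical new observations to be assigned to the existing clusters.
new_books = [(5, 7.0), (18, 22.0)]
print(score(new_books, centroids))  # prints [0, 1]
```

Scoring in this sense does not change the model: the centroids stay fixed and only the cluster assignment for each new observation is computed.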