Workflow: Grouping observations using the Binning block


The Binning block enables you to group a variable into discrete categories.

The following example shows how to use the Binning block to categorise the observations in the loan_data.csv dataset by the numerical variable Income. Each observation in the dataset describes a completed loan and the person who took out the loan:

  1. Import the loan_data.csv dataset into a Workflow using the Text File Import block.
  2. Expand the Data Preparation group in the Workflow palette, then click and drag a Binning block onto the Workflow canvas.
  3. Click the Output port of the loan_data dataset block and drag a connection towards the Input port of the Binning block.
  4. Double-click the Binning block to display the Binning editor.
  5. Click Binning preferences to display the Preferences dialog box.
  6. In the Preferences dialog box, specify a Default bin count of 8. Click OK to close the Preferences dialog box.
  7. In the Binning Variables pane:
    1. In the Unselected Variables list, select Income.
    2. Click Select to move the variable to the Selected Variables list.

  8. In the Binning Type pane:
    1. In the Binning Type dropdown list, select Equal Width.
    2. Click Bin Variables.

    The View Bins pane displays eight equal width bins for values of Income.

    The Bin Statistics pane shows the number of observations in each bin and the percentage of the total number of observations they represent.

  9. Close the Binning editor and save the configuration when prompted.
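To illustrate what the Equal Width binning type and the Bin Statistics pane compute, the following is a minimal Python sketch. It is not the Binning block's implementation. It assumes that the range from the minimum to the maximum value is split into eight intervals of equal width, that each interval includes its lower edge, and that the maximum value falls in the last bin. The function name equal_width_bin_statistics and the sample incomes are hypothetical.

```python
import bisect
from collections import Counter


def equal_width_bin_statistics(values, bin_count=8):
    """Split [min, max] into bin_count equal-width intervals and count the values in each.

    Returns one (lower, upper, count, percent) tuple per bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    edges = [lo + i * width for i in range(bin_count + 1)]
    inner_edges = edges[1:-1]
    # Assumption: intervals are closed on the left; bisecting against the interior
    # edges gives a 0-based bin index, and the maximum value lands in the last bin.
    counts = Counter(
        bisect.bisect_right(inner_edges, v) if width > 0 else 0 for v in values
    )
    total = len(values)
    return [
        (edges[i], edges[i + 1], counts[i], 100.0 * counts[i] / total)
        for i in range(bin_count)
    ]


# Hypothetical Income values, used only for illustration.
incomes = [20000, 25000, 31000, 40000, 52000, 58000, 75000, 100000]
for lower, upper, count, pct in equal_width_bin_statistics(incomes):
    print(f"{lower:>9.0f} - {upper:>9.0f}: {count} observations ({pct:.1f}%)")
```

With equal-width bins, every interval spans the same range of Income, but the number of observations in each bin can vary widely. Some bins can even be empty, which is what the Bin Statistics pane makes visible.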

A green execution status is displayed on the Output port of the Binning block. The Binning block output dataset contains the input dataset plus a new variable (Income_bin) that identifies the bin to which each observation belongs.
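The shape of the output dataset (every original variable kept, plus one new bin variable) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Binning block's behaviour. The sketch assumes equal-width bins and stores a 1-based bin number in Income_bin. The values that the Binning block actually writes to Income_bin may be formatted differently. The function add_bin_variable and the id field are hypothetical.

```python
import bisect


def add_bin_variable(rows, variable, bin_count=8):
    """Return copies of rows with a new '<variable>_bin' field holding a 1-based bin number.

    Assumes equal-width bins over [min, max]; the maximum value goes in the last bin.
    """
    values = [row[variable] for row in rows]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    # Interior edges only, so bisect_right returns a 0-based bin index directly.
    inner_edges = [lo + i * width for i in range(1, bin_count)]
    out = []
    for row in rows:
        idx = bisect.bisect_right(inner_edges, row[variable]) if width > 0 else 0
        new_row = dict(row)  # keep every variable from the input observation
        new_row[f"{variable}_bin"] = idx + 1  # hypothetical 1-based label
        out.append(new_row)
    return out


# Hypothetical observations: an id field and an Income value.
loans = [{"id": i, "Income": v} for i, v in enumerate(
    [20000, 25000, 31000, 40000, 52000, 58000, 75000, 100000])]
for row in add_bin_variable(loans, "Income"):
    print(row)
```

The function returns new dictionaries rather than modifying the input rows. This mirrors the workflow, where the Binning block produces a separate output dataset and leaves the imported loan_data dataset unchanged.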