Workflow: Creating dataset statistics using the Aggregate block


The Aggregate block applies a function to the values of a variable and returns a single value for each group of observations, where the groups are defined by other variables in the input dataset. In this example, the block is used to generate basic statistics for the loan_data.csv dataset, in which each observation describes a completed loan and the person who took out the loan.

The following steps demonstrate how to use the Aggregate block to calculate the average income for each group defined by the categorical variable Employment_Type:

  1. Import the loan_data.csv dataset into a Workflow using the Text File Import block.
  2. Expand the Data Preparation group in the Workflow palette, then click and drag an Aggregate block onto the Workflow canvas.
  3. Click the Output port of the loan_data dataset and drag a connection to the Input port of the Aggregate block.
  4. Double-click the Aggregate block to display the Configure Aggregate dialog box.
  5. In the Expressions pane:
    1. In the Variable drop-down list, select Income.
    2. In the Function drop-down list, select Average.
      The New variable entry box is automatically populated with the name Income_AVG.

  6. Click Grouping Variable Selection to display the Grouping Variable Selection pane.
  7. In the Grouping Variable Selection pane:
    1. In the Unselected Grouping Variables list, select Employment_Type.
    2. Click Select to move the variable to the Selected Grouping Variables list.

  8. Click OK to save the configuration and close the Configure Aggregate dialog box.

When the block has run successfully, a green execution status is displayed in the Output port of the Aggregate block. Double-click the Working Dataset created by the Aggregate block to view the results. The mean Income is displayed in the Income_AVG variable for each unique value of the Employment_Type variable.
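
For readers who want to check the result outside the Workflow, the same calculation can be sketched in a few lines of Python. This is only an illustration of the grouping-and-averaging logic, not a description of how the Aggregate block is implemented. The variable names Income, Employment_Type and Income_AVG come from the steps above; the sample observations and the aggregate_average helper are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical stand-in observations. The real loan_data.csv is not
# reproduced here; only the two variable names are taken from the example.
loans = [
    {"Employment_Type": "Salaried", "Income": 40000},
    {"Employment_Type": "Self-employed", "Income": 55000},
    {"Employment_Type": "Salaried", "Income": 50000},
    {"Employment_Type": "Self-employed", "Income": 65000},
    {"Employment_Type": "Retired", "Income": 30000},
]


def aggregate_average(rows, variable, grouping_variable):
    """Return a dict mapping each grouping value to the mean of `variable`.

    Hypothetical helper that mirrors selecting a Variable, the Average
    function, and one grouping variable in the Configure Aggregate dialog box.
    """
    groups = defaultdict(list)
    for row in rows:
        # Collect the values of `variable` under their grouping value.
        groups[row[grouping_variable]].append(row[variable])
    # Reduce each group of values to a single mean.
    return {group: fmean(values) for group, values in groups.items()}


income_avg = aggregate_average(loans, "Income", "Employment_Type")

# One output line per unique Employment_Type value.
for employment_type, mean_income in sorted(income_avg.items()):
    print(f"{employment_type}: Income_AVG = {mean_income:.2f}")
```

As in the Working Dataset, the output contains one entry per unique Employment_Type value, each holding the mean of the Income values in that group.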