Workflow: Creating dataset statistics using the Aggregate block
The Aggregate block applies a function to the values of a variable to produce a single value for each group of observations, where the groups are defined by other variables in the input dataset. In this example, the block generates basic statistics for the loandata.csv dataset, in which each observation describes a completed loan and the person who took out the loan.
The following steps show how to use the Aggregate block to calculate the average of the Income variable for each group defined by the categorical variable Employment_Type:
- Import the loandata.csv dataset into a Workflow using the Text File Import block.
- Expand the Data Preparation group in the Workflow palette, then click and drag an Aggregate block onto the Workflow canvas.
- Click the Output port of the loandata dataset and drag a connection to the Input port of the Aggregate block.
- Double-click the Aggregate block to display the Configure Aggregate dialog box.
- In the Expressions pane:
  - In the Variable drop-down list, select Income.
  - In the Function drop-down list, select Average.
  The New variable entry box is automatically populated with the name Income_AVG.
- Click Grouping Variable Selection to display the Grouping Variable Selection pane.
- In the Grouping Variable Selection pane:
  - In the Unselected Grouping Variables list, select Employment_Type.
  - Click Select to move the variable to the Selected Grouping Variables list.
- Click OK to save the configuration and close the Configure Aggregate dialog box.
A green execution status is displayed in the Output port of the Aggregate block. Double-click the Working Dataset created by the Aggregate block to view the results. The mean Income is displayed for each unique value of the Employment_Type variable.
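For readers who want to see the calculation itself, the following is a minimal Python sketch of the same kind of aggregation: a mean of Income grouped by Employment_Type. It is not the Aggregate block's implementation. The function name aggregate_mean and the sample rows are hypothetical and exist only for illustration; the sample values are made up and are not taken from the loandata dataset.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(rows, value_var, group_var):
    """Return a dict mapping each group value to the mean of value_var.

    Hypothetical helper for illustration; not part of the Workflow product.
    """
    groups = defaultdict(list)
    for row in rows:
        # Collect the values of value_var under their grouping value.
        groups[row[group_var]].append(float(row[value_var]))
    # Reduce each group's values to a single value (the mean).
    return {key: mean(values) for key, values in groups.items()}


# Made-up rows for illustration only; not taken from the loandata dataset.
sample_rows = [
    {"Employment_Type": "Salaried", "Income": 30000},
    {"Employment_Type": "Salaried", "Income": 50000},
    {"Employment_Type": "Self-employed", "Income": 42000},
]

# One mean per unique Employment_Type value, comparable to Income_AVG.
income_avg = aggregate_mean(sample_rows, "Income", "Employment_Type")
for employment_type, avg in income_avg.items():
    print(employment_type, avg)
```

To try the sketch on a real file, the rows could be read with csv.DictReader from the Python standard library, assuming the file has Income and Employment_Type columns as described above.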