Workflow: Reducing the size of a dataset using the Filter block
The Filter block enables you to reduce the total number of observations in a large dataset.
The following steps show how to use the Filter block to reduce the size of an input dataset, loan_data.csv, using the numerical variable Income. Each observation in the dataset describes a completed loan and the person who took out the loan:
- Import the loan_data.csv dataset into a Workflow using a Text File Import block.
- Expand the Data Preparation group in the Workflow palette, then click and drag a Filter block onto the Workflow canvas.
- Click the Output port of the loan_data dataset block and drag a connection to the Input port of the Filter block.
- Double-click on the Filter block to display the Filter Editor view.
- Click the Basic tab.
- In the Variable drop-down list, select Income.
- In the Operator drop-down list, select >= (greater-than or equal-to).
- In the Value box, enter 50000.
- Close the Filter Editor and save the configuration when prompted.
A green execution status is displayed in the Output port of the Filter block. The Filter block output dataset contains observations from the input dataset where the income is at least £50,000.
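For readers who want to check the expected result outside the Workflow, the same filter condition can be sketched in Python. This is only an illustration of the logic, not how the Filter block is implemented. The sample rows, the LoanID column, and the filter_rows helper are hypothetical; only the Income variable and the 50000 threshold come from the steps above.

```python
import csv
import io

# Hypothetical sample standing in for loan_data.csv. Only the Income
# column name comes from the walkthrough; LoanID and the values are made up.
SAMPLE_CSV = """LoanID,Income
1,42000
2,50000
3,75500
4,49999.99
"""


def filter_rows(rows, variable, threshold):
    """Keep the rows whose numeric `variable` is >= threshold."""
    return [row for row in rows if float(row[variable]) >= threshold]


# Read the sample data as a list of dictionaries, one per observation.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Same condition as the Basic tab settings: Income >= 50000.
filtered = filter_rows(rows, "Income", 50000)

print([row["LoanID"] for row in filtered])  # prints ['2', '3']
```

The observation with an income of exactly 50000 is kept, because the operator is greater-than or equal-to rather than strictly greater-than.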