Workflow: Reducing the size of a dataset using the Filter block


The Filter block enables you to reduce the total number of observations in a large dataset by keeping only the observations that meet a condition you specify.

The following steps show how to use the Filter block to reduce the size of an input dataset, loan_data.csv, by filtering on the numeric variable Income. Each observation in loan_data.csv describes a completed loan and the person who took the loan out.

  1. Import the loan_data.csv dataset into a Workflow using a Text File Import block.
  2. Expand the Data Preparation group in the Workflow palette, then click and drag a Filter block onto the Workflow canvas.
  3. Click the Output port of the loan_data dataset block and drag a connection to the Input port of the Filter block.
  4. Double-click on the Filter block to display the Filter Editor view.
  5. Click the Basic tab.
    1. In the Variable drop-down list, select Income.
    2. In the Operator drop-down list, select >= (greater-than or equal-to).
    3. In the Value box, enter 50000.

  6. Close the Filter Editor and save the configuration when prompted.

A green execution status is displayed in the Output port of the Filter block. The Filter block output dataset contains only those observations from the input dataset where Income is at least £50,000.
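The Filter block is configured entirely through the Filter Editor, so no code is required. Purely as an illustration of the row-level test that the Basic-tab condition Income >= 50000 applies, the following Python sketch uses the standard csv module on a small hypothetical in-memory sample standing in for loan_data.csv. The Id column, the sample values and the filter_rows helper are assumptions made for this example and are not part of the product or of the real dataset.

```python
import csv
import io

# Hypothetical sample standing in for loan_data.csv; only the Income
# variable comes from the workflow above, the Id column is illustrative.
SAMPLE_CSV = """Id,Income
1,42000
2,50000
3,78500
4,31000
"""


def filter_rows(rows, variable, threshold):
    """Hypothetical helper: keep rows where row[variable] >= threshold.

    Mirrors the Basic-tab condition (Variable, Operator >=, Value).
    """
    return [row for row in rows if float(row[variable]) >= threshold]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_rows(rows, "Income", 50000)

for row in kept:
    print(row["Id"], row["Income"])
```

Because the operator is inclusive, an observation with an Income of exactly 50000 is retained, which matches the description of the output as income "at least £50,000".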