Workflow: Replacing missing values with the Impute block


The Impute block enables you to replace missing values in a dataset variable with values derived from the non-missing values of that same variable.
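As a rough illustration of what replacing values from the same variable means, the following Python sketch fills `None` entries in a list using only the non-missing entries of that list. The function name `impute` and its `method` parameter are hypothetical. This document does not describe the exact algorithm the block uses for each method, so the `"distribution"` branch assumes random draws, with replacement, from the observed values.

```python
import random
import statistics
from typing import Optional, Sequence


def impute(
    values: Sequence[Optional[float]],
    method: str = "distribution",
    seed: Optional[int] = None,
) -> list:
    """Return a copy of `values` with every None replaced (hypothetical helper)."""
    # Only the non-missing values of the same variable are used as a source.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no non-missing values")

    if method == "mean":
        # Every missing entry receives the same value: the mean of the observed values.
        fill = statistics.fmean(observed)
        return [fill if v is None else v for v in values]
    if method == "distribution":
        # Assumption: each missing entry receives an independent random draw from
        # the observed values, so frequent values are proportionally more likely.
        rng = random.Random(seed)
        return [rng.choice(observed) if v is None else v for v in values]
    raise ValueError(f"unknown method: {method!r}")
```

A mean fill gives every missing entry the same value and so narrows the spread of the variable; drawing from the observed values keeps the spread roughly intact, which is the usual reason for choosing a distribution-based method.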

In the following example, the Impute block is used to replace the missing values in the Price variable of the input dataset lib_books.csv (which contains observations describing a range of books available from a lending library), based on the distribution of the non-missing Price values:

  1. Import the lib_books.csv dataset onto a Workflow canvas using the Text File Import block.
  2. Expand the Data Preparation group in the Workflow palette, then click and drag an Impute block onto the Workflow canvas.
  3. Click the Output port of the lib_books dataset block and drag a connection towards the Input port of the Impute block.
  4. Double-click the Impute block to display the Configure Impute dialog box.
  5. In the Configure Impute dialog box:
    1. In the Variable drop-down list, select Price.
    2. In the Method drop-down list, select Distribution.

  6. Click OK to save the configuration and close the Configure Impute dialog box.
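Outside Workflow, the steps above can be approximated in plain Python: read the CSV file, replace empty Price cells by sampling from the non-empty ones, and write a new dataset. This is a sketch under several assumptions: the file has a header row, missing prices appear as empty cells, Price is numeric, and the Distribution method draws from the observed values. The function name `impute_price_csv` and the sample rows are hypothetical; the rows are made up and are not taken from the real lib_books.csv.

```python
import csv
import io
import random
from typing import Optional, TextIO


def is_missing(cell: Optional[str]) -> bool:
    # Treat absent or blank cells as missing values.
    return cell is None or not cell.strip()


def impute_price_csv(
    src: TextIO, dst: TextIO, column: str = "Price", seed: Optional[int] = None
) -> int:
    """Copy CSV rows from src to dst, filling missing `column` cells; return fill count."""
    reader = csv.DictReader(src)
    rows = list(reader)  # corresponds to importing the dataset
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found")

    # The observed (non-missing) values define the distribution to sample from.
    observed = [float(r[column]) for r in rows if not is_missing(r[column])]
    if not observed:
        raise ValueError(f"column {column!r} has no non-missing values")

    rng = random.Random(seed)
    filled = 0
    for r in rows:
        if is_missing(r[column]):
            r[column] = str(rng.choice(observed))
            filled += 1

    # The written output plays the role of the new Working Dataset.
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return filled


# Usage with made-up rows for illustration only.
sample = io.StringIO("Title,Price\nBook A,12.5\nBook B,\nBook C,8.0\n")
result = io.StringIO()
print(impute_price_csv(sample, result, seed=0))
print(result.getvalue())
```

Passing a fixed `seed` makes the draws repeatable, which is convenient when comparing runs; omit it to get different replacement values each time.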

A green execution status is displayed in the Output ports of the Impute block and of the new Working Dataset. The Working Dataset contains the observations from the input lib_books dataset, with imputed values in place of the missing values in the Price variable.
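To confirm a result like this, a check such as the following compares the values of the variable before and after imputation: no values remain missing, non-missing values are unchanged, and every imputed value is one of the originally observed values. The function name `check_imputation` is hypothetical, and the third check only makes sense under the assumption that the Distribution method draws from the observed values rather than, for example, from a fitted parametric distribution; this document does not specify which.

```python
from typing import Optional, Sequence


def check_imputation(
    before: Sequence[Optional[float]], after: Sequence[Optional[float]]
) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    if len(before) != len(after):
        return [f"row count changed: {len(before)} -> {len(after)}"]

    problems = []
    observed = {v for v in before if v is not None}
    for i, (old, new) in enumerate(zip(before, after)):
        if new is None:
            problems.append(f"row {i}: value still missing")
        elif old is not None and new != old:
            problems.append(f"row {i}: non-missing value changed {old} -> {new}")
        elif old is None and new not in observed:
            # Only meaningful if imputed values are drawn from observed values.
            problems.append(f"row {i}: imputed value {new} was never observed")
    return problems
```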