Workflow: Taking a subset of a dataset using the Sampling block

IanBD
Altair Employee
edited October 2022 in Altair RapidMiner

The Sampling block enables you to take a sample of a dataset. Working with a sample instead of the full dataset can reduce processing times when the dataset is large.

The following steps demonstrate how to use the Sampling block to take a ten percent random sample of the dataset loan_data.csv:

  1. Import the loan_data.csv dataset onto a Workflow canvas using the Text File Import block.
  2. Expand the Data Preparation group in the Workflow palette, then click and drag a Sampling block onto the Workflow canvas.
  3. Click the Output port of the loan_data dataset block and drag a connection to the Input port of the Sampling block.
  4. Double-click the Sampling block to display the Configure Sampling dialog box.
  5. In the Configure Sampling dialog box:
    1. In Sampling Type, select Random.
    2. In Random Sampling, select Percentage of obs and enter 10.
  6. Click OK to save the configuration and close the Configure Sampling dialog box.

A green execution status is displayed in the Output ports of the Sampling block and the new Working Dataset. The output dataset of the Sampling block contains 10% of the observations in the input loan_data dataset.
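
For readers who want to see the idea behind a percentage-based random sample outside the Workflow canvas, the following is a minimal Python sketch using only the standard library. It is not how the Sampling block is implemented; it assumes simple random sampling without replacement and rounds the sample size to the nearest whole row. The function name sample_percentage, the seed parameter, and the 200 made-up rows standing in for loan_data.csv are all hypothetical and used only for illustration.

```python
import csv
import io
import random


def sample_percentage(rows, percent, seed=None):
    """Return a simple random sample, without replacement, of about `percent`% of rows.

    Assumption: the sample size is len(rows) * percent / 100, rounded to the
    nearest whole row. The Sampling block may size its sample differently.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    k = round(len(rows) * percent / 100)
    return rng.sample(rows, k)


# Hypothetical stand-in for loan_data.csv: 200 made-up rows built in memory,
# so the example runs without the real file.
csv_text = "loan_id,amount\n" + "\n".join(f"{i},{1000 + i}" for i in range(200))
rows = list(csv.DictReader(io.StringIO(csv_text)))

sample = sample_percentage(rows, percent=10, seed=42)
print(len(rows), len(sample))  # prints "200 20"
```

To try this on a real file, the in-memory text could be replaced with an open file handle passed to csv.DictReader. Fixing the seed is a design choice for reproducibility; omitting it gives a different sample on each run.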