Workflow: Taking a subset of a dataset using the Sampling block
The Sampling block enables you to take a sample of a dataset. Working with a sample instead of the full dataset can reduce processing times when the dataset is large.
The following steps demonstrate how to use the Sampling block to take a random sample of ten percent of the loan_data.csv dataset:
- Import the loan_data.csv dataset onto a Workflow canvas using the Text File Import block.
- Expand the Data Preparation group in the Workflow palette, then click and drag a Sampling block onto the Workflow canvas.
- Click the Output port of the loan_data dataset block and drag a connection towards the Input port of the Sampling block.
- Double-click the Sampling block to display the Configure Sampling dialog box.
- In the Configure Sampling dialog box:
  - In Sampling Type, select Random.
  - In Random Sampling, select Percentage of obs and enter 10.
- Click OK to save the configuration and close the Configure Sampling dialog box.
A green execution status is displayed in the Output ports of the Sampling block and the new Working Dataset. The Sampling block output dataset contains 10% of the observations in the input loan_data dataset.
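For readers who want to see the same idea outside the Workflow canvas, the following is a minimal Python sketch of a percentage-based random sample. It is not the Sampling block's implementation: the function names sample_percentage and sample_csv, the seed parameter, and the rounding to the nearest whole observation are all assumptions made for illustration, and the block itself may select or round observations differently.

```python
import csv
import random


def sample_percentage(rows, percent, seed=None):
    """Return a simple random sample holding about `percent` % of `rows`.

    Hypothetical helper for illustration; not part of any product API.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: round to the nearest whole observation.
    sample_size = round(len(rows) * percent / 100)
    # A fixed seed makes the sample repeatable between runs.
    rng = random.Random(seed)
    # Sampling without replacement, so no observation is picked twice.
    return rng.sample(rows, sample_size)


def sample_csv(path, percent, seed=None):
    """Read a CSV file with a header row and return (header, sampled_rows)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(reader)
    return header, sample_percentage(rows, percent, seed)
```

Assuming loan_data.csv is in the current directory and has a header row, calling sample_csv("loan_data.csv", 10, seed=42) would return the header and roughly one tenth of the data rows, which mirrors selecting Percentage of obs and entering 10 in the dialog box.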