Workflow: Creating a single dataset from multiple datasets with the Join block

IanBD
Altair Employee
edited October 2022 in Altair RapidMiner

The Join block combines observations from two datasets into a single working dataset by matching the values of a key variable present in both.

The following example uses the Join block to link information in two input datasets:

  • lib_books.csv, which contains observations that describe a range of books available from a lending library.
  • ddn_subjects.csv, which contains observations that link each Dewey Decimal Number to a subject description.

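Both datasets contain the Dewey Decimal Number, stored in the variable Dewey_Decimal_Number in lib_books.csv and in the variable DDN in ddn_subjects.csv. To join the datasets: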

  1. Import the datasets lib_books.csv and ddn_subjects.csv into a Workflow using a Text File Import block for each dataset.
  2. Right-click the lib_books.csv dataset output, click Rename and enter Lib Books.
  3. Right-click the ddn_subjects.csv dataset output, click Rename and enter Book Subjects.
  4. Expand the Data Preparation group in the Workflow palette, then click and drag a Join block onto the Workflow canvas.
  5. Click the Output port of the Lib Books dataset block and drag a connection to the Input port of the Join block. Repeat for the Book Subjects dataset.
  6. Double-click the Join block to display the Join Editor view.
    The view displays a table for each input dataset, listing that dataset's variable names.
  7. From the Lib Books table, click the variable Dewey_Decimal_Number and, holding the left mouse button down, drag across to the DDN variable in the Book Subjects table, then release the left mouse button.
    A connection is drawn between Dewey_Decimal_Number in Lib Books and DDN in Book Subjects.

  8. Close the Join Editor view and save the configuration when prompted.

A green execution status is displayed on the Output ports of the Join block and of the new Working Dataset. The working dataset contains variables from both input datasets, with observations matched on the values of Dewey_Decimal_Number in Lib Books and DDN in Book Subjects.
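To make the matching logic concrete, the following is a minimal Python sketch of a key-based join where the key variable has a different name in each dataset (Dewey_Decimal_Number on one side, DDN on the other). It is not how the Join block is implemented. The sample rows, the Title and Subject columns, and the helper names read_rows and inner_join are hypothetical. The walkthrough does not say which join type the block applies by default, so the sketch assumes an inner join, in which books with no matching DDN are dropped.

```python
import csv
import io

# Hypothetical sample data: only the key column names (Dewey_Decimal_Number, DDN)
# come from the walkthrough; the rows and the Title/Subject columns are made up.
LIB_BOOKS_CSV = """Title,Dewey_Decimal_Number
Book A,520
Book B,523
Book C,611
Book D,999
"""

DDN_SUBJECTS_CSV = """DDN,Subject
520,Astronomy
523,Celestial bodies
611,Human anatomy
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one dict per observation."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, left_key, right_key):
    """Pair each left row with every right row whose key value matches.

    Assumes inner-join semantics: left rows without a match are dropped.
    All variables from both sides are kept; if a non-key column name
    appeared in both datasets, the right-hand value would overwrite the left.
    """
    # Index right-hand rows by key value so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})
    return joined


lib_books = read_rows(LIB_BOOKS_CSV)
book_subjects = read_rows(DDN_SUBJECTS_CSV)
working = inner_join(lib_books, book_subjects, "Dewey_Decimal_Number", "DDN")
for row in working:
    print(row)
```

Because the key columns have different names, the left and right keys are passed separately. This mirrors drawing a connection from Dewey_Decimal_Number to DDN in the Join Editor. Book D (DDN 999) has no subject entry, so it is absent from the result under the inner-join assumption.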