Workflow: Creating a single dataset from multiple datasets with the Join block

Ian Balanzá-Davis
Altair Employee
edited October 2022 in Altair RapidMiner

The Join block enables you to combine observations from two datasets into a single working dataset.

The following example shows how to use the Join block to link information in two input datasets:

  • lib_books.csv, which contains observations that describe a range of books available from a lending library.
  • ddn_subjects.csv, which contains observations that link each Dewey Decimal Number to a subject description.

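Both tables record the Dewey Decimal Number, held in the variable Dewey_Decimal_Number in lib_books.csv and in the variable DDN in ddn_subjects.csv. To join the datasets on this value: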

  1. Import the datasets lib_books.csv and ddn_subjects.csv into a Workflow using a Text File Import block for each dataset.
  2. Right-click the lib_books.csv dataset output, click Rename and enter Lib Books.
  3. Right-click the ddn_subjects.csv dataset output, click Rename and enter Book Subjects.
  4. Expand the Data Preparation group in the Workflow palette, then click and drag a Join block onto the Workflow canvas.
  5. Click the Output port of the Lib Books dataset and drag a connection to the Input port of the Join block. Repeat for the Book Subjects dataset.
  6. Double-click the Join block to display the Join Editor view.
    The view displays a table for each dataset, with each table containing the dataset's variable names.
  7. From the Lib Books table, click the variable Dewey_Decimal_Number and, holding the left mouse button down, drag across to the DDN variable in the Book Subjects table, then release the left mouse button.
    A connection is drawn between Dewey_Decimal_Number in Lib Books and DDN in Book Subjects.

  8. Close the Join Editor view and save the configuration when prompted.

A green execution status is displayed in the Output ports of the Join block and the new Working Dataset. The dataset contains variables from both input datasets matched using the Dewey_Decimal_Number and DDN variables.
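
For readers who want to see the matching logic outside the Workflow canvas, the following is a minimal sketch in plain Python of what a join on Dewey_Decimal_Number and DDN does. It is an illustration only, not the Join block's implementation. It assumes an inner join (only matching observations are kept); the join type the Join block applies by default is not stated in this walkthrough. Only the variable names Dewey_Decimal_Number and DDN come from the steps above; the Title and Subject columns, the sample values, and the helper names read_rows and inner_join are hypothetical.

```python
import csv
import io

# Hypothetical sample data. Only Dewey_Decimal_Number and DDN are taken from
# the walkthrough; the other columns and all values are made up.
LIB_BOOKS_CSV = """Title,Dewey_Decimal_Number
Sample Book A,520
Sample Book B,576
Sample Book C,999
"""

BOOK_SUBJECTS_CSV = """DDN,Subject
520,Astronomy
576,Genetics and evolution
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per observation."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, left_key, right_key):
    """Combine rows whose key values match; unmatched rows are dropped."""
    # Index the right-hand rows by key so each left row needs one lookup.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            # Merge the variables from both datasets into one observation.
            joined.append({**row, **match})
    return joined


lib_books = read_rows(LIB_BOOKS_CSV)
book_subjects = read_rows(BOOK_SUBJECTS_CSV)
working = inner_join(lib_books, book_subjects, "Dewey_Decimal_Number", "DDN")

for row in working:
    print(row["Title"], "|", row["Dewey_Decimal_Number"], "|", row["Subject"])
```

In this sketch, Sample Book C has no matching DDN value, so it does not appear in the result. Whether unmatched observations are kept or dropped in a real Workflow depends on how the Join block is configured.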