Workflow: Selecting rows in a dataset using the Query block

IanBD
IanBD
Altair Employee
edited October 2022 in Altair RapidMiner

The Query block enables you to create SQL code that you can use to join database tables or datasets, to get specific data from one or more database tables or datasets.

The following demonstrates how to use the Query block to select rows from the input dataset lib_books.csv (which contains observations that describe a range of books available from a lending library) that contain the title of a book by a particular author that is no longer in the library.

  1. Import the file lib_books.csv into a Workflow using the Text File Import block.
  2. Expand the Data Preparation group in the Workflow palette,then click and drag a Query block onto the Workflow canvas.
  3. Click the Output port of the Lib Books dataset block and drag a connection towards the Input port of the Query block.
  4. Double-click the Query block to display the Query Editor view.
  5. In the Tables panel, in the Working Dataset, double click Title and then In_stock.
  6. The Title and In_stock rows are added to the Columns panel.

  7. Ensure the columns are included in the query by selecting Select, if necessary.
  8. Enter the following in the Where box:
    author EQ "Rendell, Ruth" AND In_Stock EQ "n"

    This specifies that only those rows where the Author column contains the text Rendell, Ruth and the In_Stock column contains n, are output to the working dataset.
  9. Close the Query Editor view and save the configuration when prompted.

A green execution status is displayed in the Output ports of the Query block and the new Working Dataset. The dataset contains the books written by Ruth Rendell that are no longer in the library collection.