Using the Text Transform block to manipulate a variable in a dataset

Ian Balanzá-Davis
Altair Employee
edited April 23 in Altair RapidMiner

The Text Transform block enables you to define operations that manipulate the contents of variables in an input dataset.

The following steps show how to use the Text Transform block to remove punctuation from a variable in the input dataset lib_books.csv, which contains observations describing books available from a lending library.

  1. Import the lib_books.csv dataset onto a Workflow canvas using the Text File Import block.
  2. Expand the Data Preparation group in the Workflow palette, then click and drag a Text Transform block onto the Workflow canvas.
  3. Click the Output port of the lib_books dataset block and drag a connection towards the Input port of the Text Transform block.
  4. Double-click the Text Transform block to display the Text Transform editor view.
  5. In the Text Transform editor view:
    1. In the Input Variable drop-down list, select LastAccessed.
    2. From the Remove drop-down list, select Character set.
    3. In the character set options, select Punctuation.
    4. Press CTRL+S to save the configuration.
  6. Click OK to save the configuration and close the Text Transform editor view.

A green execution status is displayed in the Output ports of the Text Transform block and the new Working Dataset. The output dataset of the Text Transform block contains the input dataset with punctuation removed from the LastAccessed variable.
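
For readers who want to picture the result outside the Workflow canvas, the following is a minimal Python sketch of the same kind of transformation. It is not how the Text Transform block is implemented. The sample rows, the Title column, and the date-like LastAccessed values are hypothetical, and Python's string.punctuation (ASCII punctuation only) is assumed as a stand-in for the block's Punctuation character set, which may cover a different set of characters.

```python
import csv
import io
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: this approximates the block's "Punctuation" character set.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(value: str) -> str:
    """Return value with all ASCII punctuation characters removed."""
    return value.translate(_PUNCT_TABLE)


def transform_column(rows, column):
    """Return new rows with punctuation removed from one column only."""
    return [{**row, column: remove_punctuation(row[column])} for row in rows]


# Hypothetical sample; the real lib_books.csv columns and values may differ.
SAMPLE_CSV = """Title,LastAccessed
"Emma, Vol. 1",2021-03-14
Dracula,2020/11/02
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = transform_column(rows, "LastAccessed")

for row in cleaned:
    # Only LastAccessed changes; other columns keep their punctuation.
    print(row["Title"], row["LastAccessed"])
```

As in the workflow, only the selected variable is changed: the hypothetical Title column keeps its commas and full stops, and the original rows are left untouched.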