Issue With Loop Files Operator

thapli_64
New Altair Community Member
Updated by Jocelyn

Hi all,

 

I'm new to the forum and RapidMiner so excuse any redundancies or lack of details.

 

I am working with the process from Chapter 14 (Robust Language Identification) of RapidMiner: Data Mining Use Cases and Business Analytics Applications, published by CRC Press. The process was downloaded from here: http://rapidminerbook.com/index.php/chapter-downloads-13-24/chapter-14/

 

Attachment 1 shows a screenshot of the process, and attachment 2 shows the Loop Files sub-process.

 

I successfully loaded the process and downloaded the language corpora from http://corpora.informatik.uni-leipzig.de/download.html

 

I changed the directory of the Loop Files operator to read from the folder where the corpora are stored. There are five files in the directory (German, English, French, Portuguese and Spanish). The Loop Files operator seems to read all of them successfully, but it produces a sixth output that seems nonsensical. Attachment 3 shows the expected output for any language file (English in this case). Attachment 4 shows the nonsensical output. Attachment 5 shows the error thrown, presumably caused by the nonsense output. Could someone tell me why this is happening and how to fix it? Thanks!

    thapli_64
    New Altair Community Member
    OP
    Accepted Answer

    So, I was able to solve the issue (with some debugging help from a colleague; it's always good to have someone to talk things through with)! :D

     

    I set up regex filtering (.*\.txt$) in the Loop Files operator so that it only reads the desired files, in this case the 5 language files ending in .txt.
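
To illustrate what that filter does, here is a minimal Python sketch (not RapidMiner code) that applies the same pattern to a directory listing. The file names are hypothetical, including the stray sixth entry; the post does not say what the extra file actually was. The sketch also assumes the filter has to match the whole file name.

```python
import re

# Same pattern as the filter entered in the Loop Files operator.
FILE_FILTER = re.compile(r".*\.txt$")


def select_files(names):
    """Return only the names that the filter matches in full."""
    return [name for name in names if FILE_FILTER.fullmatch(name)]


# Hypothetical directory listing: five corpus files plus one stray file.
listing = [
    "german.txt",
    "english.txt",
    "french.txt",
    "portuguese.txt",
    "spanish.txt",
    "notes.bak",  # assumed stand-in for the unexpected sixth file
]

selected = select_files(listing)
print(selected)
```

Only the five names ending in .txt survive, so the loop body never sees the sixth entry.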

     

    There was, however, another error that cropped up after this was fixed: a duplicate attribute error for the 'text' attribute. This was due to the 'select attributes and weights' parameter in the Data to Documents operator being checked without any value being provided for it. It seems this was already the case in the process as downloaded and was not introduced through human error (or so I'm telling myself :P )