Read in a tab-delimited file in Java

siawling New Altair Community Member
edited November 5 in Community Q&A
I am new to RapidMiner and have tried using the GUI to create a simple process that checks the binary occurrence of terms in comments. My input file is tab-delimited, in the format <id><TAB><comments>.
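"Binary occurrence" means each term is recorded as present (1) or absent (0) in a comment, regardless of how often it appears. The sketch below illustrates the idea in plain Java; it is not how RapidMiner's text processing implements it, and the tokenization rule (splitting on anything that is not a letter or digit), the vocabulary, and the class name are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating binary term occurrence (not RapidMiner code).
public class BinaryOccurrence {

    // Splits a comment on runs of characters that are not letters or digits,
    // lowercases each token, and returns the distinct terms.
    static Set<String> terms(String comment) {
        Set<String> result = new TreeSet<>();
        for (String token : comment.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {            // a leading separator yields ""
                result.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    // 1 if the vocabulary term occurs at least once in the comment, else 0.
    static int[] binaryVector(String comment, String[] vocabulary) {
        Set<String> present = terms(comment);
        int[] vector = new int[vocabulary.length];
        for (int i = 0; i < vocabulary.length; i++) {
            vector[i] = present.contains(vocabulary[i]) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] vocabulary = {"good", "bad", "service"};   // assumed vocabulary
        int[] v = binaryVector("Good service, good food!", vocabulary);
        System.out.println(Arrays.toString(v));             // prints [1, 0, 1]
    }
}
```

"good" appears twice in the sample comment but still maps to 1, which is what distinguishes binary occurrence from a term-frequency count.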

I used the 'Start Data Loading Wizard' from ExampleSource to load the data. However, I need to integrate the process into a Java environment, and I read in the tutorial.pdf that IOContainer may be able to help with this. I am not sure how to go about doing it.

I tried using ExampleSource directly, but it relies on the attribute description file, which in my case would change on every run because I use a different source file each time. I can't possibly use the GUI to generate the .aml file before every run of the Java program, so I need the program to read the source file (the tab-delimited file) directly. Is there a way to achieve this with ExampleSource?
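If the immediate goal is just to get the <id><TAB><comments> rows into a Java program without generating an .aml file per run, plain Java can parse them. A minimal sketch, assuming UTF-8 text, one record per line, and that the first tab separates the id from the comment; the class and method names are hypothetical, and turning the rows into a RapidMiner ExampleSet is not shown because that depends on the RapidMiner API version in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java reader for <id>TAB<comments> files (not a RapidMiner API).
public class TabDelimitedReader {

    // One input row: the <id> column and the <comments> column.
    static final class Row {
        final String id;
        final String comment;
        Row(String id, String comment) { this.id = id; this.comment = comment; }
    }

    // Reads one record per line. Only the first tab is treated as the
    // separator, so any later tabs stay inside the comment text.
    static List<Row> read(Reader source) {
        List<Row> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) continue;              // skip blank lines
                String[] parts = line.split("\t", 2);     // at most 2 fields
                String comment = parts.length > 1 ? parts[1] : "";
                rows.add(new Row(parts[0], comment));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the tab-delimited file to load.
        List<Row> rows = read(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
        for (Row r : rows) {
            System.out.println(r.id + " -> " + r.comment);
        }
    }
}
```

Usage would be along the lines of java TabDelimitedReader comments.tsv, where comments.tsv stands in for whichever source file is used on a given run.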

I would appreciate any advice or suggestions.

By the way, is there any way to convert all letters to lowercase? I found that Preprocessing.Attributes.Filter.Values has a parameter, convert_to_lowercase, but I could not get it to convert all the comments in the input file to lowercase.
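If the comments are read in Java before being handed to RapidMiner, they can also be lowercased there with the standard String.toLowerCase. A minimal sketch, independent of any RapidMiner operator; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lowercases comments in plain Java before further processing.
public class LowercaseComments {

    // Locale.ROOT avoids locale-specific results (e.g. the Turkish dotless i)
    // that the no-argument toLowerCase() can produce on some systems.
    static List<String> toLowerCase(List<String> comments) {
        List<String> result = new ArrayList<>(comments.size());
        for (String c : comments) {
            result.add(c.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> comments = Arrays.asList("Great PRODUCT", "Not Bad");
        System.out.println(toLowerCase(comments));   // prints [great product, not bad]
    }
}
```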

Thanks for all your advice.

Answers

  • TobiasMalbrecht New Altair Community Member
    Hi,

    did you try reading the file as CSV? This should also work. Additionally, you can convert all characters to lowercase using the ToLowerCaseConverter during the text preprocessing stage.

    Kind regards,
    Tobias
  • siawling New Altair Community Member
    Thanks, Tobias :). I converted the file to CSV. I will also try using the tab-delimited file directly. ToLowerCaseConverter works!
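Comments often contain commas, so simply replacing tabs with commas can shift columns when converting the file to CSV. A minimal plain-Java sketch of the conversion, assuming the <id><TAB><comments> layout from the question and RFC 4180-style quoting; whether a particular RapidMiner CSV reader honors quoted fields is an assumption worth checking, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter from <id>TAB<comments> lines to CSV lines.
public class TabToCsv {

    // Quotes a field if it contains a comma, a double quote, or a line break,
    // doubling any embedded quotes (the RFC 4180 convention).
    static String csvField(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) return field;
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // The first tab separates the id from the comment; later tabs stay in the comment.
    static String toCsvLine(String tabLine) {
        String[] parts = tabLine.split("\t", 2);
        List<String> out = new ArrayList<>();
        for (String p : parts) out.add(csvField(p));
        return String.join(",", out);
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine("7\tNice, but \"slow\""));
        // prints: 7,"Nice, but ""slow"""
    }
}
```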