Read in a tab-delimited file in Java

siawling
siawling New Altair Community Member
edited November 2024 in Community Q&A
I am new to RapidMiner and have tried using the GUI to create a simple process that checks the binary occurrence of terms in comments. My input file is tab-delimited, in the format <id>tab<comments>.

I used the 'Start Data Loading Wizard' from ExampleSource to load the data, but I need to integrate the process into a Java environment. I read in tutorial.pdf that IOContainer may be able to help, but I am not sure how to go about it.

I tried using ExampleSource directly, but it relies on the attributes file, which in my case will change on every run because I use a different source file each time. I can't possibly use the GUI to generate the aml file before each run of the Java program, so I need the program to read the source file (the tab-delimited file) directly. Is there a way for ExampleSource to do this?

I would appreciate any advice or suggestions.

By the way, is there any way to convert all letters to lower case? I found that Preprocessing.Attributes.Filter.Values has a convert_to_lowercase parameter, but I could not get it to convert all the comments in the input file to lower case.

Thanks for all your advice.

Answers

  • TobiasMalbrecht
    TobiasMalbrecht New Altair Community Member
    Hi,

    Did you try reading the file as CSV? That should also work. Additionally, you can convert all characters to lowercase using the ToLowerCaseConverter during the text preprocessing stage.

    Kind regards,
    Tobias
  • siawling
    siawling New Altair Community Member
    Thanks, Tobias :). I converted the file to CSV. I will try using the tab-delimited file directly. ToLowerCaseConverter works!
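
For readers who would rather pre-process the input in plain Java before handing it to RapidMiner, here is a minimal sketch of reading the <id>tab<comments> format and lowercasing the comments. It deliberately does not use any RapidMiner API. The class name TabDelimitedReader and the methods parseLine and readAll are hypothetical names chosen for illustration, and the sketch assumes one record per line with the first tab separating the id from the comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of RapidMiner.
public class TabDelimitedReader {

    /** Splits one "<id>\t<comment>" line and lowercases the comment. */
    public static String[] parseLine(String line) {
        // Limit 2: only the first tab separates the id from the comment,
        // so any further tabs stay inside the comment text.
        String[] parts = line.split("\t", 2);
        String id = parts[0].trim();
        String comment = parts.length > 1 ? parts[1] : "";
        return new String[] { id, comment.toLowerCase(Locale.ROOT) };
    }

    /** Reads every non-blank line from the source into {id, comment} pairs. */
    public static List<String[]> readAll(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                rows.add(parseLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample; a real run would pass a file reader instead.
        String sample = "1\tGreat Product\n2\tNOT what I expected\n";
        for (String[] row : readAll(new StringReader(sample))) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

To read an actual file, the StringReader can be swapped for something like Files.newBufferedReader(Paths.get("comments.tsv")), where comments.tsv is a placeholder file name. The resulting rows could then be written out as CSV, which is the route that worked in this thread.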
