Read in a tab-delimited file in Java
siawling
New Altair Community Member
I am new to RapidMiner and have used the GUI to create a simple process that checks the binary occurrence of terms in comments. My input file is tab-delimited, in the format <id>tab<comments>.
I used the 'Start Data Loading Wizard' from ExampleSource to load the data, but I need to integrate the process into a Java environment. I read in tutorial.pdf that IOContainer may be able to help, but I am not sure how to go about it.
I tried using ExampleSource directly, but it relies on the attributes file, which in my case will change on every run because I use a different source file each time. I can't possibly use the GUI to generate the aml file before every run of the Java program, so I need the program to read the source file (the tab-delimited file) directly. Is there a way for ExampleSource to do this?
I'd appreciate any advice or suggestions.
By the way, is there a way to convert all letters to lower case? I found that Preprocessing.Attributes.Filter.Values has a parameter, convert_to_lowercase, but I could not get it to convert all the comments in the input file to lower case.
Thanks for all your advice.
Answers
Hi,
did you try reading the file as CSV? That should work as well. Additionally, you can convert all characters to lowercase using the [tt]ToLowerCaseConverter[/tt] during the text preprocessing stage.
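If the goal is simply to get the <id>tab<comments> rows into Java without an aml file, a minimal sketch in plain Java could look like the one below. It deliberately uses only the standard library and no RapidMiner classes, so it illustrates the general approach rather than how ExampleSource or ToLowerCaseConverter work internally. The class name TabDelimitedComments, the Row type, and the assumption of UTF-8 input with exactly one id column followed by the comment text are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of RapidMiner.
class TabDelimitedComments {

    /** One input row: the id and its lower-cased comment text. */
    static final class Row {
        final String id;
        final String comment;

        Row(String id, String comment) {
            this.id = id;
            this.comment = comment;
        }
    }

    /**
     * Parses one "<id>\t<comments>" line. Everything after the first tab
     * is treated as the comment. Returns null if the line contains no tab.
     */
    static Row parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return null;
        }
        String id = line.substring(0, tab).trim();
        // Locale.ROOT keeps lower-casing independent of the machine's locale.
        String comment = line.substring(tab + 1).trim().toLowerCase(Locale.ROOT);
        return new Row(id, comment);
    }

    /** Reads all well-formed rows from a file, assuming UTF-8 encoding. */
    static List<Row> readFile(Path path) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Row row = parseLine(line);
                if (row != null) {
                    rows.add(row); // lines without a tab (e.g. blank lines) are skipped
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TabDelimitedComments <file>");
            return;
        }
        for (Row row : readFile(Paths.get(args[0]))) {
            System.out.println(row.id + "\t" + row.comment);
        }
    }
}
```

The resulting rows could then be written out as CSV or handed to whatever loads the data into the process. How to pass them on through IOContainer depends on the RapidMiner version and is not shown here.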
Kind regards,
Tobias
Thanks, Tobias. I converted the file to CSV, and I will also try using the tab-delimited file directly. ToLowerCaseConverter works!