"text input from a single text file using text plugin"
Hi,
I am new to the text plugin and am trying to do some text clustering using RapidMiner with it. I have all the text in one file, in which each line needs to be treated as a separate document. I tried using SplitSegmenter, but since a new file is created for every line, the disk space used is blowing up, which will hamper scalability.
Can someone suggest a way I can cluster the different lines in the same file so I don't have to create separate files?
Appreciate your response
Regards
Angshu
Hi Angshu,
Just to add to what Sebastian was saying: in the GUI, you can use the following operator flow.
1. ExampleSource - configure your input: the delimiter (tab or comma), the format of the input fields (nominal, string, etc.), and the type of each variable (label for the dependent variable, attribute for independent variables, id for keys); then save it in the attribute file.
2. StringTextInput - for generating word vectors. For further info, visit http://kmandcomputing.blogspot.com/2008/06/opinion-mining-with-rapidminer-quick.html
I had faced the same problem and the flow mentioned above helped.
Thanks,
Ram
This is possible. You have to do a little trick: load the file using the CSVExampleSource operator and configure it so that only one column is created from the file. To do so, specify a text that never occurs in the file as the column separation regular expression. Then insert a Nominal2String operator to change the value type to string. After this, using StringTextInput, you can transform the texts into word vectors for clustering. To simplify your life, I append a sample process:
Greetings,
Sebastian
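
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same two steps: read one file so that every line becomes a single-column row (by choosing a separator that never occurs in the text), then turn each line into a simple word-count vector. This is not the sample process mentioned above and does not use any RapidMiner API; the function names, the choice of the `\x1f` character as separator, and the whitespace tokenization are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def load_lines_as_documents(stream, separator="\x1f"):
    """Read each line as one single-column row.

    Assumption: the separator character never occurs in the text,
    so the CSV reader never splits a line into several columns.
    """
    reader = csv.reader(stream, delimiter=separator, quoting=csv.QUOTE_NONE)
    # Skip empty lines; each remaining row has exactly one field.
    return [row[0] for row in reader if row and row[0].strip()]


def word_vectors(documents):
    """Build plain term-count vectors over a shared, sorted vocabulary."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts)) if counts else []
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


# Hypothetical input: one "file" whose lines are the documents.
sample = io.StringIO("the cat sat\nthe dog barked, loudly\n\nthe cat purred\n")
docs = load_lines_as_documents(sample)
vocab, vectors = word_vectors(docs)
```

The resulting vectors can then be handed to any clustering routine; no per-line files are created at any point, which is what avoids the disk space problem described in the question.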