How to do text classification using only Java code
lucky_q
New Altair Community Member
Hi all,
I am new to the RapidMiner community. I'm planning to use RapidMiner for text classification, and I want to develop a small demo system purely in Java (that is, without writing an XML process file) in order to get familiar with the RapidMiner source code. I tried RapidMiner 5.0 first, but as there isn't enough documentation or sample code for it, I decided to use 4.6 instead. Unfortunately, I still do not know how to do this using only Java code.
I have two problems:
1: Which operator can transform all the original messages stored in a particular folder into a single file containing the word vectors (feature vectors)? I know about the Text Processing plugin, but I'm not sure how to use it to read the original files using only Java code. Could anybody show me how to do that?
2: For training on the feature vectors, what is the easiest way if I want to use only Java code? Is there any sample code that shows how to read a feature vector file and generate a model file (as one would with Weka)?
I know these are all stupid questions; it's just that I have no idea how to do this. I would very much appreciate it if somebody could give me some sample code (for RapidMiner 4.6) showing how the whole process works. Thanks.
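To make the two steps in the question concrete, here is a minimal sketch in plain Java. It does not use the RapidMiner or Weka APIs at all, so every name in it (`BagOfWordsNaiveBayes`, `wordVector`, `readFolder`, `train`, `classify`) is hypothetical. It assumes one message per file, Java 11 or later (for `Files.readString`), and it swaps in a simple multinomial Naive Bayes with Laplace smoothing as a stand-in for whatever learner you would actually pick in RapidMiner.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: plain Java, no RapidMiner or Weka classes are used.
class BagOfWordsNaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>(); // label -> number of tokens seen
    private final Map<String, Integer> docCounts = new HashMap<>();  // label -> number of documents seen
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Step 1: builds the word vector (term -> frequency) of one document. */
    static Map<String, Integer> wordVector(String text) {
        Map<String, Integer> vector = new HashMap<>();
        for (String token : tokenize(text)) vector.merge(token, 1, Integer::sum);
        return vector;
    }

    /** Reads every regular file in a folder as one document (assumed layout: one message per file). */
    static List<String> readFolder(Path folder) throws IOException {
        List<Path> paths;
        try (Stream<Path> files = Files.list(folder)) {
            paths = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> docs = new ArrayList<>();
        for (Path p : paths) docs.add(Files.readString(p));
        return docs;
    }

    /** Step 2: adds one labelled document to the model. */
    void train(String label, String text) {
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (Map.Entry<String, Integer> e : wordVector(text).entrySet()) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
            totalWords.merge(label, e.getValue(), Integer::sum);
            vocabulary.add(e.getKey());
        }
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    /** Returns the label with the highest log posterior; words never seen in training are ignored. */
    String classify(String text) {
        if (totalDocs == 0) throw new IllegalStateException("model has not been trained");
        Map<String, Integer> vector = wordVector(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs); // log prior
            int labelTotal = totalWords.getOrDefault(label, 0);
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                if (!vocabulary.contains(e.getKey())) continue;
                int count = wordCounts.get(label).getOrDefault(e.getKey(), 0);
                // Laplace (add-one) smoothing over the training vocabulary.
                score += e.getValue() * Math.log((count + 1.0) / (labelTotal + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BagOfWordsNaiveBayes model = new BagOfWordsNaiveBayes();
        model.train("spam", "Win a free prize now, click here");
        model.train("spam", "Free money, claim your prize");
        model.train("ham", "Meeting moved to Monday morning");
        model.train("ham", "Please review the attached report before the meeting");
        System.out.println(model.classify("claim your free prize"));
        System.out.println(model.classify("report for the Monday meeting"));
    }
}
```

With real data you would call `readFolder` once per class folder (for example, a hypothetical `data/spam` and `data/ham`) and pass each document to `train` with that folder's label. Inside RapidMiner the same two steps are handled by the text processing operators and a learner operator, so this sketch only shows what those steps compute, not how RapidMiner exposes them.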
Answers
Hi,
If there was not enough for you in RM 5.0 on text mining, then good luck with 4.6, which is no longer supported by RM staff on this forum. If you want to write your own extensions, Seb has written a guide you can pay for, but that is only for version 5. Weka is actually supported by Pentaho, who run a forum for it.

Thank you so much. I'm sorry for posting this subject in two different places; I wasn't sure which one was more appropriate.
Is there any sample Java code for that? Can RapidMiner 4.6 use the Weka 5.0 plugin? I have no idea how to implement this.
Thanks again.

Hi,
I suggest you first familiarize yourself with RapidMiner and the Text Processing Extension before thinking about integrating it. Your questions are not a matter of coding but of using RapidMiner itself. For everything after that, there is the white paper on writing extensions, which will give you a good understanding of how RapidMiner works under the hood, and the API documentation, which gives you the details.
You can also book webinars that show you how to use text mining.
If you decide to use RapidMiner 4.x: good luck. I know the code of the former Text Plugin, and you will need it. For most of its parts, it was easier to rewrite it from scratch than to revise it.
Greetings,
Sebastian

Thank you so much for your advice. I think you are right: I should first learn the basics of RapidMiner rather than look for sample Java code. I guess it won't take me too much time. Thanks again.