Which operator?
johnny
Hello,
I am considering using RapidMiner for a piece of PhD research on web forums, and I'm feeling my way around the program.
What I want to do is use RapidMiner to test a large data set drawn from web forum databases to see three things:
a) how often certain phrases that I am interested in appear;
b) whether this reduces over time (depending on the date of posting in the forum);
c) and whether references to these phrases are favourable.
My dataset is several CSV files that contain 7 columns and thousands of rows. Each row contains the details of a forum posting and the complete text of that posting, meaning that the "Message" field can be hundreds of words long. The columns are: "MessageID", "ThreadID", "ThreadName", "MemberID", "MemberName", "P_Date", "Message".
My question is, which operator should I use to load this kind of CSV that would allow me to use all seven columns?
I am using both RapidMiner 4.6 and 5 to see which is easier to learn, and would appreciate any guidance members have on this.
land
Hi,
I would recommend RapidMiner 5.0. It not only lowers the learning curve a lot, but also has more advanced text processing capabilities.
You can load your data with the Read CSV operator or simply import it using the wizards (File / Import Data). After this you will be able to use the Process Documents from Data operator of the Text Processing Extension to analyse each individual text. By default this operator generates texts from all attributes of type text, so you might want to change the type of the attribute that stores the message text with the Nominal to Text operator.
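If it helps to see the load-then-count idea outside RapidMiner, here is a rough Python sketch using only the standard library. It is not a RapidMiner process and not the Text Processing Extension. The seven column names come from the question, while the date format, the sample rows, and the helper name count_phrase_by_month are assumptions made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Column names as listed in the question.
COLUMNS = ["MessageID", "ThreadID", "ThreadName", "MemberID",
           "MemberName", "P_Date", "Message"]
DATE_FORMAT = "%Y-%m-%d"  # assumption: adjust to match the real export


def count_phrase_by_month(csv_file, phrase):
    """Count case-insensitive occurrences of `phrase` per posting month.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(csv_file)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    counts = Counter()
    needle = phrase.lower()
    for row in reader:
        posted = datetime.strptime(row["P_Date"], DATE_FORMAT)
        hits = row["Message"].lower().count(needle)
        if hits:
            counts[posted.strftime("%Y-%m")] += hits
    return dict(counts)


# Made-up sample rows standing in for one of the CSV files.
SAMPLE = io.StringIO(
    'MessageID,ThreadID,ThreadName,MemberID,MemberName,P_Date,Message\n'
    '1,10,Intro,100,alice,2009-01-05,"I like open source, open source rocks"\n'
    '2,10,Intro,101,bob,2009-01-20,"Open source is fine"\n'
    '3,11,Other,100,alice,2009-02-02,"Nothing relevant here"\n'
    '4,11,Other,102,carol,2009-03-15,"Still using open source"\n'
)

monthly_counts = count_phrase_by_month(SAMPLE, "open source")
print(monthly_counts)
```

For a real file you would pass open(path, newline="", encoding="utf-8") instead of the in-memory sample. The sketch only covers how often a phrase appears and how that changes by month; it does not attempt to judge whether a mention is favourable.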
Greetings,
Sebastian