evaluating text

User: "MarkusW"
New Altair Community Member
Updated by Jocelyn
Hi,
I'm trying to test how well a simple machine learning model does at predicting a property of a text (specifically sarcasm).
I have my data in a massive table, where one column is the source, one is the label that should be predicted, and the last column is the text that the algorithm(s) should analyze.
The problem is that, without some tool to extract meaning or sentiment, the results are (not surprisingly) abysmal.
Both the promotional texts on the RapidMiner main page and the professor who suggested I use RapidMiner imply that such tools are already part of RapidMiner; however, I have not yet found anything in the documentation/manual.

What are these tools called, and how are they used?
    User: "BalazsBaranyRM"
    New Altair Community Member
    Accepted Answer
    Hi @MarkusW,

    RapidMiner has a Marketplace that you find in the menu ("Extensions"). There you will find the Text Processing and Web Mining extensions. 

    There's a full Text Mining course in the Academy:
    https://academy.rapidminer.com/courses/text-and-web-mining-with-rapidminer

    Regards,
    Balázs
    User: "BalazsBaranyRM"
    New Altair Community Member
    Accepted Answer
    Hi!

    Yes, sarcasm detection is a big challenge and simple models don't cut it.

    Have you seen "Automatic Classification of Documents" in the Academy course? 

    It explains the Process Documents operator. The only addition you would need here is "Generate n-Grams (Terms)". This creates new attributes from term combinations like "not very good" and "i really liked it". Because every sequence of consecutive words up to the configured maximum length is generated, you end up with a massive number of new attributes. This may or may not help you with the sarcasm.
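
To make the attribute explosion concrete, here is a minimal sketch in plain Python of what term n-gram generation does. It is not RapidMiner's implementation: the function name `generate_ngrams`, the `max_length` parameter name, and joining terms with an underscore are all assumptions made for illustration.

```python
def generate_ngrams(tokens, max_length=2):
    """Return every run of 1..max_length consecutive tokens as one term.

    Hypothetical helper for illustration; terms are joined with "_".
    """
    ngrams = []
    for n in range(1, max_length + 1):
        # Slide a window of size n across the token list.
        for start in range(len(tokens) - n + 1):
            ngrams.append("_".join(tokens[start:start + n]))
    return ngrams


tokens = "this was not very good".split()
terms = generate_ngrams(tokens, max_length=3)
print(len(terms), terms)
```

Five words already give twelve terms with a maximum length of 3, and each distinct term becomes its own attribute, which is why the table grows so quickly on a large text collection.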

    Naive Bayes and SVM are the modeling algorithms best suited to this situation. Other algorithms will take ages and won't perform well on this kind of data, with the possible exception of Deep Learning, but you'll need massive resources to run that.
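
As a rough illustration of why Naive Bayes copes well with very many term attributes, here is a small multinomial Naive Bayes sketch in plain Python with add-one smoothing. It is not what RapidMiner runs internally; the function names and the four toy documents are invented for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(documents, labels):
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, term_counts, vocabulary


def predict(model, tokens):
    """Return the class with the highest log-probability for the tokens."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # class prior
        denom = sum(term_counts[label].values()) + len(vocabulary)
        for term in tokens:
            if term in vocabulary:  # terms never seen in training are skipped
                score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, only to show the mechanics.
train_docs = [
    ["oh", "great", "another", "monday"],
    ["yeah", "right", "great", "idea"],
    ["great", "movie", "loved", "it"],
    ["really", "good", "idea"],
]
train_labels = ["sarcastic", "sarcastic", "sincere", "sincere"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["another", "monday"]))
print(predict(model, ["loved", "it"]))
```

Training is a single counting pass and prediction is a sum of logarithms, so the cost grows only linearly with the number of terms. The token lists could just as well contain n-gram terms instead of single words.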

    Regards,
    Balázs