Basic Text Mining From an Excel File
monamahfouz
Hi everyone,
I would really appreciate some help or direction on a basic text mining task. I have a spreadsheet with one column that I am interested in, titled "Hashtags". Using RapidMiner, I would like to count the occurrences of each unique hashtag and output those counts.
A single row might have several hashtags in one cell. For example, row #1's value is "12YearsASlave Oscars2014 AmericanHustle AcademyAwards2014", which contains FOUR hashtags, and each one should add to the count of its own unique hashtag. Hence, I will need to tokenize every row's value.
If the tokenization is complex, I can skip it and treat each row as one hashtag for now. My dataset is very large, so I can afford to ignore the rows that have multiple hashtags in one cell just to get it working.
I tried using SelectAttributes, Tokenize and DataToDocument, but I am hitting a wall.
Any help / direction is appreciated, and hope this isn't too basic. Thanks for your help!
Mona
MariusHelf
Hi Mona,
You don't need any Text Processing operators (in the RapidMiner sense) at all. First, let's ignore the multi-tag rows:
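Load your data and add a Filter Examples operator with the attribute_value_filter "Hashtag != .* .*" (without the quotes). The regular expression matches any value that contains a space, so the filter keeps only the rows with a single hashtag.
Then add an Aggregate operator. Group by Hashtag and add the aggregation function count for Hashtag. That's it.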
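For readers who want to check the logic outside RapidMiner, here is a minimal Python sketch of the same two steps: drop rows that match the regular expression `.* .*`, then group and count. It is not a RapidMiner process. The sample rows and the function name `count_single_tag_rows` are made up for illustration, and it assumes the filter compares the expression against the whole cell value.

```python
import re
from collections import Counter

# Hypothetical sample standing in for the hashtag column of the spreadsheet.
rows = [
    "Oscars2014",
    "12YearsASlave Oscars2014 AmericanHustle AcademyAwards2014",
    "AmericanHustle",
    "Oscars2014",
]

# Same expression as in the filter: any value containing a space.
MULTI_TAG = re.compile(r".* .*")


def count_single_tag_rows(values):
    """Keep rows that do not match the multi-tag pattern, then count them."""
    single = [v for v in values if MULTI_TAG.fullmatch(v) is None]
    # Counter plays the role of "group by Hashtag, count(Hashtag)".
    return Counter(single)


counts = count_single_tag_rows(rows)
for tag, n in counts.most_common():
    print(tag, n)
```

The multi-tag row is discarded entirely here, so its four hashtags do not contribute to any count.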
Best regards,
Marius
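The multi-hashtag case from the question (several space-separated hashtags in one cell) is not covered by the answer above. As a rough illustration of what the tokenize-then-count step amounts to, here is a small Python sketch. It is not a RapidMiner process, the sample rows and the name `count_all_hashtags` are hypothetical, and it assumes hashtags are separated by whitespace only.

```python
from collections import Counter

# Hypothetical sample standing in for the hashtag column of the spreadsheet.
rows = [
    "Oscars2014",
    "12YearsASlave Oscars2014 AmericanHustle AcademyAwards2014",
    "AmericanHustle",
    "Oscars2014",
]


def count_all_hashtags(values):
    """Split every cell on whitespace and count each hashtag separately."""
    counts = Counter()
    for value in values:
        # str.split() with no argument splits on any run of whitespace
        # and ignores leading or trailing blanks.
        counts.update(value.split())
    return counts


all_counts = count_all_hashtags(rows)
for tag, n in all_counts.most_common():
    print(tag, n)
```

With this approach every hashtag in a multi-tag cell adds one to its own count, so no rows need to be thrown away.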