Mining Twitter - Data loops
timeitself
Hi all.
While working on my PhD dissertation, I downloaded ~5K tweets in JSON format, placed them in a MongoDB database, extracted the retweet graph data to be analyzed with Gephi/NodeXL, and extracted the text for a semantic analysis with RapidMiner.
The tweet texts are in a CSV file (I could export them in other formats as well), one tweet text per row, for a total of ~5K rows.
I need to analyze every tweet to get something close to a semantic value. For a first round, that could simply be a list of words for each tweet, after tokenization, n-gram generation and stopword filtering. After that, I will derive a semantic value from the words (using a word-based semantic distance).
I'm far from proficient in RapidMiner (my apologies!), and what I get when reading the CSV file is a single list of words for all the tweets together, not one list per tweet.
I would probably need a loop that starts at the first row, processes it, and iterates until the last row.
I couldn't find a way to use the loop operators properly ...
Your help would be highly appreciated!
Thanks
Carlo
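For readers who want to see the per-tweet processing described above outside RapidMiner, here is a minimal Python sketch. It is only an illustration, not the RapidMiner process: the stopword list, the function names (`tokenize`, `ngrams`, `process_tweet`, `process_csv`) and the sample rows are all made up, and it assumes stopwords are filtered before the n-grams are built.

```python
import csv
import io
import re

# Hypothetical, deliberately tiny stopword list; a real analysis needs a fuller one.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "for", "on", "rt"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def process_tweet(text, max_n=2):
    """Tokenize one tweet, drop stopwords, then append n-grams up to max_n."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    terms = list(tokens)
    for n in range(2, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return terms

def process_csv(handle, text_column=0):
    """Return one list of terms per CSV row (one tweet per row)."""
    return [process_tweet(row[text_column]) for row in csv.reader(handle) if row]

# Made-up sample standing in for the real CSV file.
sample = io.StringIO('"RT the data is in the graph"\n"Mining tweets for a PhD"\n')
per_tweet_terms = process_csv(sample)
for terms in per_tweet_terms:
    print(terms)
```

The point is that each row yields its own list of terms, so there is no global word list to untangle afterwards.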
MariusHelf
Hi Carlo,
I suppose you are using the Process Documents from Data operator. Like any other Process Documents operator, it provides two outputs: the word list, which indeed delivers global statistics, and an example set, which contains the word counts for every single document. If you switch vector_creation to Term Occurrences, you get absolute numbers. For classification/regression tasks etc., however, you will usually use TF-IDF weighting.
Best regards,
Marius
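To make the difference between the two vector types concrete, here is a small Python sketch of a per-document term matrix. It is an assumption-laden illustration, not RapidMiner's implementation: the function names and sample documents are hypothetical, and the TF-IDF formula used (relative term frequency times log(N/df), then length-normalized per document) is one common textbook variant that may differ from what RapidMiner computes.

```python
import math
from collections import Counter

def term_occurrences(docs):
    """Return (vocabulary, rows), where each row holds absolute counts for one document."""
    vocab = sorted({term for doc in docs for term in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def tf_idf(docs):
    """Return (vocabulary, rows) with length-normalized TF-IDF weights (textbook variant)."""
    vocab, counts = term_occurrences(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weighted = []
    for row in counts:
        total = sum(row) or 1
        w = [(c / total) * math.log(n_docs / df[j]) for j, c in enumerate(row)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        weighted.append([x / norm for x in w])
    return vocab, weighted

# Made-up token lists standing in for two processed tweets.
docs = [["data", "graph", "data"], ["graph", "tweets"]]
vocab, counts = term_occurrences(docs)
_, weights = tf_idf(docs)
print(vocab)
print(counts)
print(weights)
```

In this toy example the term "graph" occurs in both documents, so its TF-IDF weight is zero, while its Term Occurrences count is still 1 in each row.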