Counting Emojis

Survuel
New Altair Community Member
Hi guys, I created a community account just for this problem:
I have an Excel file full of comments extracted from a Facebook group, and I need to mine all the emojis out of it and count them. Could you please tell me how to do it? I've seen one post where it was described, but it uses the Encode/Decode operators, which I don't have, and I don't really understand how to do these kinds of things (I'm also a newb; I downloaded the trial version just for this one-time use). Any help is greatly appreciated.
Answers
-
Hi @Survuel,
I would recommend starting with the introduction videos at the RapidMiner Academy: https://academy.rapidminer.com/
For an overview about text mining, check these tutorials:
The reason you didn't find the operator is that you need to install the Text Mining extension via the RapidMiner Marketplace.
In the top menu, go to Extensions -> Marketplace and search for "text processing".
I hope that helps for a first start.
-
The Encode and Decode operators are part of the Web extension, but I'm not sure you really need them.
In your text, the emojis might already be represented in their Unicode format; if not, the decode step may be useful.
Then the challenge will be to find the valid Unicode ranges and to transform the code points into meaningful names for grouping purposes.
You can find the whole Unicode emoji list here: https://unicode.org/emoji/charts/full-emoji-list.html
So a possible workflow could be as follows:
-> use the text operators to tokenize all your content, by splitting on spaces or similar
-> keep only the ones within the emoji Unicode ranges (roughly 1F300 to 1FAFF, plus some older symbols in 2600 to 27BF)
-> count these and, optionally, map them to something meaningful (like 1F4A9 = pile of poo). You could also use the above link to generate this mapping table.
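Outside RapidMiner, the same filter / count / map idea can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the comments are assumed to be already loaded as a list of strings, the helper names (`is_emoji`, `count_emojis`, `describe`) are made up for this example, and the ranges in `EMOJI_RANGES` are an approximation that also covers a few non-emoji symbols. It counts single code points, so multi-code-point emojis (flags, skin-tone variants, joined sequences) would be split into their parts.

```python
from collections import Counter
import unicodedata

# Approximate code point ranges (an assumption, not an exhaustive emoji list):
# 1F300-1FAFF covers most pictographic emojis, 2600-27BF older symbols/dingbats.
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),
    (0x2600, 0x27BF),
]


def is_emoji(ch):
    """Return True if the character falls inside one of the assumed ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in EMOJI_RANGES)


def count_emojis(comments):
    """Count every in-range character across an iterable of comment strings."""
    counts = Counter()
    for text in comments:
        counts.update(ch for ch in text if is_emoji(ch))
    return counts


def describe(counts):
    """Map each counted character to its Unicode name, most frequent first."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the Excel comment column.
    sample = ["Great \U0001F600\U0001F600", "Oh no \U0001F4A9", "plain text"]
    for ch, name, n in describe(count_emojis(sample)):
        print(f"U+{ord(ch):04X} {name}: {n}")
```

The names come from Python's built-in Unicode database (1F4A9 maps to PILE OF POO there), so no separate mapping table is needed for single code points.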
-
I really think there is a problem with our search engine.
I wrote this KB article a while ago about exactly this use case:
https://community.rapidminer.com/discussion/52570/counting-emojis-in-text-mining
Scott
-
Once again I'm impressed with the amount of knowledge present in the community. Thanks Scott.
-
Yes, I know. I'm trying to follow the post but I'm completely lost. I tried to download and run the process you provided at the end, but I just don't know how to run it (please consider that I downloaded RapidMiner yesterday, so my knowledge and skills are really limited). I have a few screenshots, and it would help if you could tell me what to put there.
So, for example, in "Encode URL", what do I put into the url attribute bar? (Obviously not a cell range, lol, hence it doesn't get me anywhere.) And is the right encoding selected (UTF-8)?
Next, Replace (Dictionary): I have no idea whatsoever what to do with it. Which attribute filter do I need? What do I need to write after "from attribute" and "to attribute"?
And the same goes for "Decode URL": what am I supposed to put in url attribute and encoding?
I would provide you with screenshots but I'm not long enough a member to post them.
I mean, don't get me wrong, this programme looks amazing. I just can't seem to learn these things in one day (I was up till 4 AM last night trying to figure things out).
Thanks
-
Well, I got your process working, but now I'm even more lost than before, so I'm probably just going to use Ctrl+F to find them in Excel.
-
Hi @Survuel - well, you are doing pretty well for someone who downloaded RM yesterday.
I would strongly recommend taking a little time to go through the basic training before tackling tricky stuff like this. It will be well worth your time:
https://academy.rapidminer.com/
Scott