Web scraping & sentiment analysis in a non-English language
Hi,
I'm new to RapidMiner, and I'm hoping to use RapidMiner and Aylien to scrape many different news pages and perform sentiment analysis on them. The problem is that I want to gather the information from articles written in Swedish. Does anyone know if this is possible and, if so, where I can find more information? I've already checked out these tutorials:
https://docs.aylien.com/textapi/rapidminer-extension/#step-3-categorizing-tweets
I've also looked at Aylien's news API, but don't know if that could help.
Would really appreciate some guidance on this!
Answers
hi @linn_ansved_636, welcome to the community. Web scraping websites in Swedish is no problem at all. Just use the various operators in the Web Mining extension as you would for English.
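To illustrate why scraping is language agnostic, here is a minimal Python sketch using only the standard library's `html.parser`. It is not the Web Mining extension and not part of RapidMiner. The `extract_paragraphs` helper and the sample Swedish HTML are hypothetical, and it assumes the article text sits inside `<p>` elements. Swedish characters such as å, ä and ö pass through unchanged.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements; the language of the text does not matter."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is still collected.
        if self.in_p:
            self._buf.append(data)


def extract_paragraphs(html):
    """Hypothetical helper: return the paragraph texts of an HTML page."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical sample page in Swedish.
sample = (
    "<html><body><h1>Nyheter</h1>"
    "<p>Regeringen presenterade budgeten.</p>"
    "<p>Kritiken var <b>hård</b>.</p>"
    "</body></html>"
)
print(extract_paragraphs(sample))
```

The headline is skipped because only `<p>` content is kept. Real news sites vary in markup, so the tag to extract would need adjusting per site.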
Sentiment analysis is the more interesting question. Aylien does not appear to support native sentiment analysis in Swedish (see https://docs.aylien.com/textapi/#language-support), and IBM Watson Tone Analyzer does not seem to support Swedish either. So if you want to use one of these tools, I'd recommend first running the text through a translation engine (although some of the "tone" will likely be lost in translation).
Scott
Hi @linn_ansved_636,
Most of the steps in text processing are language agnostic. The only language-specific steps are stop word filtering and stemming. For both, you can use the Filter Stopwords (Dictionary) and Stemming (Dictionary) operators with external dictionaries.
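The idea behind dictionary-based stop word filtering and stemming can be sketched in a few lines of Python. This is not how the RapidMiner operators are implemented, and it does not reproduce their dictionary file format. The tiny `SWEDISH_STOPWORDS` set and `SWEDISH_STEM_DICT` mapping are hypothetical stand-ins for the external dictionaries you would load in practice.

```python
import re

# Hypothetical, tiny Swedish dictionaries; real ones would be loaded from external files.
SWEDISH_STOPWORDS = {"och", "att", "det", "som", "en", "på", "är", "av", "för", "med", "i"}
SWEDISH_STEM_DICT = {
    "budgeten": "budget",
    "budgetar": "budget",
    "regeringen": "regering",
    "kritiken": "kritik",
}


def tokenize(text):
    # In Python 3, \w matches Unicode letters, so å, ä and ö stay inside tokens.
    return re.findall(r"\w+", text.lower())


def preprocess(text, stopwords, stem_dict):
    """Drop stop words, then replace each remaining token by its dictionary stem (if any)."""
    kept = [t for t in tokenize(text) if t not in stopwords]
    return [stem_dict.get(t, t) for t in kept]


print(preprocess("Regeringen och kritiken av budgeten", SWEDISH_STOPWORDS, SWEDISH_STEM_DICT))
# prints ['regering', 'kritik', 'budget']
```

Only the two dictionaries are Swedish-specific; tokenizing and lowercasing work the same way for any language written with Unicode letters.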
I hope that helps!
Great, thanks for the reply! Do you know if RapidMiner has a built-in translate function? If so, I could scrape websites written in Swedish, translate them into English, and then perform the sentiment analysis. Ideally, I'd like to do all of this in RapidMiner. Any thoughts?
Thanks,
Linn
Hi,
RM itself does not have this built in yet. Maybe a nice feature to add?
Maybe @koen can help?
Best,
Martin
There is no built-in feature at the moment, but hopefully our Google Cloud custom operators will improve over time so that we can include Google Translate. Meanwhile, I wrote this KB article a while back that will do the trick (albeit without an out-of-the-box custom operator).
Scott
Great, thanks. Finally, do you know if there are any tutorials on how to scrape the web and perform sentiment analysis in English using RapidMiner?
/Linn
Hi again,
I'm also interested in getting a graph of how the sentiment changes over time, e.g. in May the number of positive articles is X, in June it is Y, etc.
Any guidance?
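The data behind such a graph is just a count of sentiment labels per month. Here is a minimal Python sketch of that aggregation, assuming each scraped article has already been given a publication date and a sentiment label. The `articles` list and the `monthly_counts` helper are hypothetical, and this is not a RapidMiner process. The resulting table could then be plotted as one line per label.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical scored articles: (publication date, sentiment label).
articles = [
    (date(2019, 5, 3), "positive"),
    (date(2019, 5, 17), "negative"),
    (date(2019, 5, 28), "positive"),
    (date(2019, 6, 2), "neutral"),
    (date(2019, 6, 20), "positive"),
]


def monthly_counts(rows):
    """Count sentiment labels per calendar month, keyed by 'YYYY-MM' and sorted by month."""
    counts = defaultdict(Counter)
    for published, label in rows:
        counts[published.strftime("%Y-%m")][label] += 1
    return dict(sorted(counts.items()))


for month, c in monthly_counts(articles).items():
    # A Counter returns 0 for labels that never occurred in that month.
    print(month, c["positive"], c["negative"], c["neutral"])
# prints:
# 2019-05 2 1 0
# 2019-06 1 0 1
```

Using a `"YYYY-MM"` string as the key keeps months from different years apart and makes chronological sorting trivial.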
Thanks,
/Linn
hi @linn_ansved_636, sure, there are lots of resources on that. Have you checked out our YouTube channel first?
https://www.youtube.com/channel/UCxneJBWWNLs-A6ckls1Rrug?view_as=subscriber
Scott