Web scraping & sentiment analysis in a non-English language

linn_ansved_636
linn_ansved_636 New Altair Community Member
edited November 5 in Community Q&A

Hi,

 

I'm new to RapidMiner and I'm hoping to use RapidMiner and Aylien to scrape many different news pages and run sentiment analysis on them. The problem is that the articles I want to gather are written in Swedish. Does anyone know if this is possible and, if so, where I can find more information? I've already checked out this tutorial:

https://docs.aylien.com/textapi/rapidminer-extension/#step-3-categorizing-tweets

 

I've also looked at Aylien's News API, but I don't know whether it could help.

https://aylien.com/news-api/ 

 

Would really appreciate some guidance on this!

Answers

  • sgenzer
    sgenzer Altair Employee

    Hi @linn_ansved_636, welcome to the community. Web scraping websites in Swedish is no problem at all. Just use the operators in the Web Mining extension as you would for English pages.

     

    The sentiment analysis is the more interesting question. Aylien does not appear to support native sentiment analysis in Swedish (see https://docs.aylien.com/textapi/#language-support), and IBM Watson Tone Analyzer does not seem to support Swedish either. So if you want to use one of these tools, I'd recommend pre-processing the text through a translation engine first (although some of the "tone" will likely be lost or distorted in translation).


    Scott

     

  • SGolbert
    SGolbert New Altair Community Member

    Hi @linn_ansved_636,

     

    Most of the steps in text processing are language agnostic. The only steps that are specific to a language are stop word filtering and stemming. For both, you can use the Filter Stopwords (Dictionary) and Stemming (Dictionary) operators with external dictionaries.

     

    I hope that helps!
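
To illustrate the dictionary-based approach described in the post above, here is a minimal Python sketch of what stop word filtering and dictionary stemming do to a piece of text. It is not how the RapidMiner operators are implemented; the tiny Swedish word lists and the names `STOPWORDS`, `STEM_DICT`, and `preprocess` are made up for illustration, and a real project would load much larger external dictionaries.

```python
import re

# Hypothetical, tiny Swedish dictionaries for illustration only.
# In practice these would be loaded from external dictionary files.
STOPWORDS = {"och", "det", "att", "i", "en", "är", "som", "på"}
STEM_DICT = {
    "bilar": "bil",
    "bilen": "bil",
    "nyheter": "nyhet",
    "nyheterna": "nyhet",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def preprocess(text, stopwords=STOPWORDS, stem_dict=STEM_DICT):
    """Drop stop words, then map each remaining token to its dictionary stem."""
    kept = [token for token in tokenize(text) if token not in stopwords]
    # Tokens missing from the stem dictionary are passed through unchanged.
    return [stem_dict.get(token, token) for token in kept]


print(preprocess("Bilen och bilar är i nyheterna"))
```

Everything outside the two dictionaries is language independent, which is the point made above: switching language means switching the word lists, not the process.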

  • linn_ansved_636
    linn_ansved_636 New Altair Community Member

    Great, thanks for the reply! Do you know if RapidMiner has a built-in translation function? If so, I could scrape websites written in Swedish, translate the text into English, and then perform the sentiment analysis. My hope is that all of this could be done within RapidMiner. Any thoughts?

     

    Thanks,

    Linn

  • MartinLiebig
    MartinLiebig Altair Employee

    Hi,

     

    RapidMiner itself does not have this built in yet. Maybe a nice feature to add?

     

    Maybe @koen can help?

     

    Best,

    Martin

  • sgenzer
    sgenzer Altair Employee

    There is currently no built-in feature, but hopefully our Google Cloud custom operators will improve over time so that we can include Google Translate. In the meantime, I wrote this KB article a while back that will do the trick (albeit without an "out-of-the-box" custom operator).

     

    https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-to-interact-with-Google-Cloud-APIs-with-the-Web-Mining/ta-p/35280


    Scott
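
For readers who want to see what such a translation call looks like outside RapidMiner, here is a rough Python sketch. It is not taken from the KB article. It assumes the Google Cloud Translation Basic (v2) REST endpoint, an API key passed as the `key` query parameter, and a response of the form `{"data": {"translations": [{"translatedText": ...}]}}`; check the current Google Cloud documentation before relying on any of these details. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Cloud Translation Basic (v2) REST API.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_request(texts, api_key, source="sv", target="en"):
    """Build a POST request asking to translate a list of strings."""
    query = urllib.parse.urlencode({"key": api_key})
    body = json.dumps(
        {"q": texts, "source": source, "target": target, "format": "text"}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{TRANSLATE_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_response(payload):
    """Extract the translated strings from the (assumed) JSON response shape."""
    translations = json.loads(payload)["data"]["translations"]
    return [item["translatedText"] for item in translations]


def translate(texts, api_key):
    """Send the request over the network; needs a valid API key."""
    with urllib.request.urlopen(build_request(texts, api_key)) as response:
        return parse_response(response.read().decode("utf-8"))
```

The translated English text returned by `translate` is what would then be passed to the sentiment analysis step.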

     

  • linn_ansved_636
    linn_ansved_636 New Altair Community Member

    Great, thanks. Finally, do you know if there are any tutorials on how to web scrape and perform sentiment analysis in English using RapidMiner?

     

     

    /Linn

  • linn_ansved_636
    linn_ansved_636 New Altair Community Member

    Hi again,

     

    I'm also interested in getting a graph of how the sentiment changes over time, e.g. in May the number of positive articles is X, in June it is Y, and so on.

     

    Any guidance?

     

    Thanks,

    /Linn
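
One way to think about the sentiment-over-time question above: once every article has a publication date and a sentiment label, the graph is just a count of labels per month. The Python sketch below shows that aggregation step under stated assumptions. The sample articles, their dates, and the function name `sentiment_by_month` are invented for illustration; in RapidMiner the same grouping would typically be done with an aggregation step before charting.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical scored articles: (publication date, sentiment label).
articles = [
    (date(2018, 5, 3), "positive"),
    (date(2018, 5, 17), "negative"),
    (date(2018, 5, 21), "positive"),
    (date(2018, 6, 2), "positive"),
    (date(2018, 6, 9), "neutral"),
]


def sentiment_by_month(rows):
    """Count sentiment labels per calendar month, keyed as 'YYYY-MM'."""
    counts = defaultdict(Counter)
    for published, label in rows:
        counts[published.strftime("%Y-%m")][label] += 1
    # Sort by month key so the result is in chronological order.
    return dict(sorted(counts.items()))


monthly = sentiment_by_month(articles)
for month, labels in monthly.items():
    print(month, dict(labels))
```

Plotting the per-month counts of each label as a line or bar chart gives the trend described in the question.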

  • sgenzer
    sgenzer Altair Employee

    Hi @linn_ansved_636, sure, there are lots of resources on that. Have you checked out our YouTube channel yet?

     

    https://www.youtube.com/channel/UCxneJBWWNLs-A6ckls1Rrug?view_as=subscriber

     

    Scott