Stop Word and Stemming List / Dictionary

arizah78 New Altair Community Member
edited November 5 in Community Q&A

Dear All...

I've been using RapidMiner for quite some time, mainly for text mining. I am having difficulty retrieving the English stop word list and the English stemming (Snowball) list. Having these lists would help me update their content and improve the precision of my text mining process. I would be grateful if anybody could share these lists (stop word and stemming) with me, or at least let me know where or how I can find them. Your kind assistance is highly appreciated.

Thanks.

Answers

  • Telcontar120
    Telcontar120 New Altair Community Member

    Much more detailed information about the Snowball project is available here: https://snowballstem.org/algorithms/
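
For anyone who wants to experiment with a customised stop word list outside RapidMiner, here is a minimal Python sketch. It assumes the list is plain text with one word per line and that `|` starts a comment; that layout is an assumption, so check the actual file you download. The names `parse_stop_words`, `remove_stop_words`, and `SAMPLE_LIST` are made up for illustration, and the sample words are not the real Snowball data.

```python
def parse_stop_words(text):
    """Parse a stop word list; '|' is assumed to start a comment."""
    words = set()
    for line in text.splitlines():
        # Keep only the part before the comment marker, then split on whitespace.
        entry = line.split("|", 1)[0]
        words.update(w.lower() for w in entry.split())
    return words


def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in the stop word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


# Hypothetical sample standing in for a downloaded list.
SAMPLE_LIST = """
 | a tiny illustrative list, not the real Snowball data
the
is      | forms of "to be"
of
"""

stop_words = parse_stop_words(SAMPLE_LIST)
tokens = "The precision of text mining is important".split()
filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Adding or removing a word is then just a matter of editing the text file and re-running the filter.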

     

  • sgenzer
    sgenzer
    Altair Employee

    hello @arizah78 -

     

    Just to add a bit: there was a similar request in another thread for the Arabic stop word list, and I'm looking into it. The lists are easy to access; we just want to make sure that we're allowed to share them (the extension is not open source, so the author of the list holds the copyright by default). I will let folks here in the community know when I get an answer.

     

    Thanks for understanding.


    Scott

  • arizah78
    arizah78 New Altair Community Member

    Hi Scott,

    Thanks for your update.

    I really hope to get positive feedback soon.

  • arizah78
    arizah78 New Altair Community Member

    Thanks. I appreciate you sharing the link.

  • sgenzer
    sgenzer
    Altair Employee

    Hello @arizah78 - I have the code for the extension (which contains the word lists), and it is indeed open source. I will send you the file via PM.

     

    Scott