I need an operator that is the inverse of the Filter Stopwords (Dictionary) operator

Fatmah
Fatmah New Altair Community Member
edited November 5 in Community Q&A

Hi

Thanks for reading my post

I am working on my master's thesis, and I found the same problem I have described in this thread:

http://community.rapidminer.com/t5/RapidMiner-Studio/SOLVED-Filter-text-from-a-list-of-word/td-p/21459

The poster there solved the problem by changing the code of the Filter Stopwords (Dictionary) operator.

I have read the document "How to Extend RapidMiner".

I set up the environment by downloading Java and Eclipse Java Neon.

I now know how to create my own operator, but I don't know how to copy an existing operator's code and modify it.

Thank you again

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Have you checked out the Filter Stopwords by Dictionary operator? There you can provide a custom txt file for stopwords.

  • Fatmah
    Fatmah New Altair Community Member
    Hi Thomas
    Thanks for the reply.
    Yes, I have checked it many times.
    That operator removes from the document the words listed in the text file.
    I want a filter that removes all words from the document except the words in the text file,
    so I need the opposite.
  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Would Filter by Content help? It has an inverse condition. What about Filter by POS tags? That also has an inverse condition.

  • Fatmah
    Fatmah New Altair Community Member

    I tried Filter by Content, but it is insufficient for my situation because the text file has tens of words.
    Filter by Content works well only for a very small number of expressions.
    With Filter by POS tags, I can't specify the words I want.

    Thank you again for your help. If you have any suggestions, please tell me, or do you know how I can get to the operator's code?

  • hmhsing
    hmhsing Altair Community Member
    I converted the dictionary txt file into Excel and then used Filter Tokens Using ExampleSet (you need to check invert filter), and it works. See the attached file.
  • kayman
    kayman New Altair Community Member
    In theory you could use the Process Documents from Data operator and supply your reversed stoplist (or whitelist) as a wordlist; this would accept only the words in your list. There is no real out-of-the-box operator to create your own wordlist, but this thread goes into more detail:

    https://community.rapidminer.com/discussion/51131/creating-a-comparing-white-list-of-words-to-a-wordlist-from-a-data-mined-webpage
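
For anyone who ends up writing their own operator, the core logic of an "inverse stopword" filter is small. The following is only a rough sketch in plain Java and does not use the RapidMiner API: the class name `KeepListFilter` and its methods are hypothetical, and the case-insensitive matching is an assumption rather than something stated in this thread. It keeps only the tokens that appear in the dictionary and drops everything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of RapidMiner: keeps only tokens found in a keep-list.
public class KeepListFilter {

    // Words that are allowed to stay in the document, stored in lower case.
    private final Set<String> keepWords = new HashSet<>();

    // dictionaryLines: one word per line, e.g. the lines of the txt dictionary file.
    public KeepListFilter(Iterable<String> dictionaryLines) {
        for (String line : dictionaryLines) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                keepWords.add(word);
            }
        }
    }

    // Returns the tokens that are in the keep-list, in their original order.
    // Matching ignores case (an assumption made for this sketch).
    public List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (keepWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        KeepListFilter filter = new KeepListFilter(Arrays.asList("data", "mining", ""));
        List<String> tokens = Arrays.asList("Text", "mining", "is", "Data", "analysis");
        System.out.println(filter.filter(tokens)); // prints [mining, Data]
    }
}
```

A stopword filter is the same loop with the condition negated, so a custom operator would only need to flip that test. The approaches suggested above (Filter Tokens Using ExampleSet with invert filter checked, or a whitelist used as a wordlist) give the same effect without writing any code.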