How to filter text records out based on a wordlist

ArnoG
ArnoG New Altair Community Member
edited November 5 in Community Q&A

I am using the "Filter Stopwords" operator to filter out words that match a list. Is there also a function available in RapidMiner that works the other way around, for example one that selects only those records whose text contains a word from a list?

I am only interested in text containing certain words, and I am looking for a way to pick out these records.

Does anybody know how to do this?

 

Arno

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Try a Filter Documents or Filter Content operator. Those two operators have an "Invert Condition" parameter that lets you select the filter words. Or you can use a WordList to Data operator and then do a generic Filter Examples on it. There are a few ways to go about it, I believe.
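
    To make the "keep only matching records" idea concrete, here is a minimal sketch in plain Python. It is not RapidMiner code and does not use any of the operators named above. The function name `select_matching` and the sample records are hypothetical and only illustrate the logic: tokenize each record and keep it if it shares at least one token with the wordlist.

```python
import re

def select_matching(records, wordlist):
    """Keep only records that contain at least one word from the wordlist."""
    # Lowercase the wordlist once so matching is case-insensitive
    keywords = {w.lower() for w in wordlist}
    kept = []
    for text in records:
        # Simple tokenizer (an assumption): runs of letters, digits and apostrophes
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        # A non-empty intersection means at least one keyword occurs in the record
        if tokens & keywords:
            kept.append(text)
    return kept

# Hypothetical sample data
records = [
    "The battery life is great",
    "Delivery was slow",
    "Nice screen and good battery",
]
wordlist = ["battery", "screen"]
matches = select_matching(records, wordlist)
print(matches)
```

    Because the wordlist is held in a set, the check per record stays cheap even when the list contains a few thousand words.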

  • ArnoG
    ArnoG New Altair Community Member

    Hi Thomas,

    Sorry for my late response. I looked at your suggestions and they will probably work. At the moment I use the "Cut Document" operator to cut reviews into sentences. I can use the "Filter Examples" operator to select the sentences containing certain keywords. The problem is that I have a huge list of keywords, a couple of thousand.

    I could enter the keywords manually in the "Filter Examples" operator using the custom filter, but I hope there is an easier way, for example using the keyword list itself to filter out the sentences containing these keywords.

     

    Regards,

     

    Arno

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    You could use macros and a loop to iterate over the wordlist and automatically drop each word into the custom filter parameter of Filter Examples. I've done that before and it works well.
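
    The idea of generating the filter condition from the wordlist, instead of typing it by hand, can be sketched outside RapidMiner as well. The Python below is only an illustration under stated assumptions: `build_keyword_pattern` and `filter_sentences` are hypothetical helper names, the keywords and sentences are made-up sample data, and the regular expression follows Python's `re` syntax, which may differ from what a RapidMiner filter parameter expects.

```python
import re

def build_keyword_pattern(keywords):
    """Combine a keyword list into one case-insensitive whole-word pattern."""
    # Drop empty entries and escape regex metacharacters in each keyword
    cleaned = [re.escape(k.strip()) for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("keyword list is empty")
    # Longest keywords first, so multi-word phrases are tried before shorter ones
    cleaned.sort(key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(cleaned) + r")\b", re.IGNORECASE)

def filter_sentences(sentences, pattern):
    """Keep only the sentences in which the pattern matches somewhere."""
    return [s for s in sentences if pattern.search(s)]

# Hypothetical sample data; in practice the keywords would be read from a file
keywords = ["refund", "customer service", "warranty"]
sentences = [
    "I asked for a refund.",
    "The colour is nice.",
    "Customer service never replied.",
]
pattern = build_keyword_pattern(keywords)
kept = filter_sentences(sentences, pattern)
print(kept)
```

    The word boundaries (`\b`) mean that "refund" does not match inside "refunded"; drop them if partial matches are wanted.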