How to filter text records out based on a wordlist
I am using "filter stopwords" to filter out words matching a list. Is there also a function available in RapidMiner that works the other way around, i.e. one that selects only those records whose text contains a word from a list?
I am only interested in text containing certain words, and I am looking for a way to select just these records.
Does anybody know how to do this?
Arno
Answers
Try a Filter Documents or Filter Content operator. Those two operators have an "Invert Condition" parameter that lets you select the filter words. Or you can use a Wordlist to data operator and then do a generic Filter Examples on it. There are a few ways to go about it, I believe.
Hi Thomas,
Sorry for my late response. I looked at your suggestions and they will probably work. At the moment I use the "Cut document" operator to cut reviews into sentences. I can use the "Filter Examples" operator to select the sentences containing certain keywords. The problem is that I have a huge list of keywords, a couple of thousand.
I could manually enter the keywords in the "Filter Examples" operator using the custom filter, but I hope there is an easier way, for example using the keyword list itself to select the sentences containing these keywords.
Regards,
Arno
You could use macros and a loop to go over the wordlist and automatically drop it into the custom parameter for Filter Examples. I've done that before and it works well.
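For reference, the logic such a loop implements (keep a sentence if it contains at least one word from the list) can be sketched in plain Python. This is only an illustration of the idea, not how RapidMiner does it internally. The function names and the sample keywords and sentences are made up, and whole-word, case-insensitive matching is an assumption you may want to change.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word.

    Returns None when the keyword list is empty, so callers can treat
    "no keywords" as "nothing matches".
    """
    escaped = [re.escape(k.strip()) for k in keywords if k.strip()]
    if not escaped:
        return None
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_sentences(sentences, keywords):
    """Keep only the sentences that contain at least one keyword."""
    pattern = build_keyword_pattern(keywords)
    if pattern is None:
        return []
    return [s for s in sentences if pattern.search(s)]


# Hypothetical sample data; in practice the keywords would come from your list.
keywords = ["battery", "screen", "delivery"]
sentences = [
    "The battery lasts two days.",
    "I bought it last week.",
    "Screen quality is great.",
    "Batteries are not included.",
]

kept = filter_sentences(sentences, keywords)
for sentence in kept:
    print(sentence)
```

Combining all keywords into a single pattern means each sentence is scanned once, instead of once per keyword, which matters with a few thousand keywords. Note that whole-word matching does not catch inflected forms such as "Batteries" for "battery"; stemming the text first would be one way to handle that.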