[SOLVED] Filter text from a list of words
Hi everybody,
I built a process to search for and count a list of keywords in thousands of files.
After several operations, I built the keyword list from an Excel file as an example set with one keyword per example.
I would like to do something like the inverse of "Filter Stopwords (Dictionary)", using the attribute of my example set (or a word list, if someone can explain to me how to convert an example set attribute into a word list).
I found the following topics, but I don't know whether there is anything new since then:
- http://rapid-i.com/rapidforum/index.php/topic,2754.0.html
- http://rapid-i.com/rapidforum/index.php/topic,6330.0.html
- http://rapid-i.com/rapidforum/index.php/topic,3719.0.html
- http://rapid-i.com/rapidforum/index.php/topic,3493.0.html
Thanks in advance
Johan
Operators used to turn the keyword attribute into a word list:
- "Set Role" to remove the special ID role from the ID attribute
- "Select Attributes" with the "Single" option, to keep only the keyword attribute
- "Write CSV" with a space as the column separator, connecting its "file" output
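As a rough illustration outside RapidMiner, the three steps above amount to keeping a single column and writing one keyword per line. This Python sketch assumes the example set is available as a list of dicts and that the dictionary file read by "Filter Stopwords (Dictionary)" holds one word per line; `write_word_list` is a hypothetical helper, not a RapidMiner API, and the "Set Role" step has no counterpart here.

```python
import csv
import io

# Hypothetical rows shaped like the ID | Item | Keyword example set in this thread.
rows = [
    {"ID": "id_1", "Item": "item_1", "Keyword": "KW_1"},
    {"ID": "id_1", "Item": "item_1", "Keyword": "KW_2"},
    {"ID": "id_1", "Item": "item_2", "Keyword": "KW_3"},
]

def write_word_list(rows, attribute, out):
    """Write one value of `attribute` per line (hypothetical helper).

    Keeping a single column mirrors "Select Attributes" with "Single";
    the space delimiter mirrors the "Write CSV" setting. No header row is written.
    """
    writer = csv.writer(out, delimiter=" ", lineterminator="\n")
    for row in rows:
        writer.writerow([row[attribute]])

buffer = io.StringIO()
write_word_list(rows, "Keyword", buffer)
print(buffer.getvalue(), end="")  # KW_1, KW_2 and KW_3, one per line
```

With a space as the separator, this sketch wraps any multi-word keyword (for example `"machine learning"`) in quotes, so single-token keywords are the safe case.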

Do you mean that you saved the list of keywords as an example set in which each example (row) is a keyword? If so, you could look at the process below to see how to convert an example set into a word list.
Hi Venkatesh
Thank you for your reply.
To begin my work, I had a table like the following:

| Domain | Sub-domain | Item | Keywords |
|---|---|---|---|
| Domain 1 | Sub-domain 1 | Item 1 | KW_1, KW_2 |
| Domain 1 | Sub-domain 1 | Item 2 | KW_3, KW_4, KW_5, KW_6... |

Using RapidMiner, I transformed this table into the following:

| ID | Item | Keyword |
|---|---|---|
| id_1 | item_1 | KW_1 |
| id_1 | item_1 | KW_2 |
| id_1 | item_2 | KW_3 |

I have to filter all the documents stored in a folder using these keywords; that's why I needed an operator that does the inverse of the "Filter Stopwords (Dictionary)" operator.
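The thread does not say which RapidMiner operators performed the reshaping from the Domain/Keywords table to one row per keyword. As a hedged sketch of the same idea in Python, each comma-separated Keywords cell can be split into one row per keyword. The row format, the helper name `explode_keywords`, and the sample data are assumptions, and the ID column is left out because its derivation is not described.

```python
# Hypothetical sample rows shaped like the Domain | Sub-domain | Item | Keywords table.
source = [
    {"Domain": "Domain 1", "Sub-domain": "Sub-domain 1", "Item": "Item 1",
     "Keywords": "KW_1, KW_2,"},
    {"Domain": "Domain 1", "Sub-domain": "Sub-domain 1", "Item": "Item 2",
     "Keywords": "KW_3,KW_4, KW_5, KW_6"},
]

def explode_keywords(rows):
    """Return one {"Item", "Keyword"} row per comma-separated keyword (hypothetical helper)."""
    result = []
    for row in rows:
        for piece in row["Keywords"].split(","):
            keyword = piece.strip()  # tolerate inconsistent spacing after commas
            if keyword:  # skip the empty piece left by a trailing comma
                result.append({"Item": row["Item"], "Keyword": keyword})
    return result

for out_row in explode_keywords(source):
    print(out_row["Item"], "|", out_row["Keyword"])
```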
The problem is that the "Filter Stopwords (Dictionary)" operator uses a txt file as its dictionary.
Finally, to solve my problem, I created a new operator, "Filter Startword (Dictionary)", by removing the '!' at line 74 of the "StopwordOperator" class, so that words found in the dictionary are kept instead of removed.
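To illustrate why removing a single negation turns a stopword filter into a "keep only these words" filter, here is a minimal Python sketch. It does not reproduce RapidMiner's `StopwordOperator` source; the function names are hypothetical, tokens are assumed to be whitespace-separated, and matching is case-sensitive.

```python
from collections import Counter

def filter_stopwords(tokens, dictionary):
    """Stopword behaviour: drop every token found in the dictionary."""
    return [t for t in tokens if t not in dictionary]

def filter_startwords(tokens, dictionary):
    """Inverse behaviour (negation removed): keep only tokens found in the dictionary."""
    return [t for t in tokens if t in dictionary]

def count_keywords(documents, dictionary):
    """Count dictionary words over many documents (whitespace tokenization assumed)."""
    totals = Counter()
    for text in documents:
        totals.update(filter_startwords(text.split(), dictionary))
    return totals

keywords = {"KW_1", "KW_2", "KW_3"}
tokens = "KW_1 appears with KW_3 and again KW_1 here".split()
print(filter_stopwords(tokens, keywords))   # ['appears', 'with', 'and', 'again', 'here']
print(filter_startwords(tokens, keywords))  # ['KW_1', 'KW_3', 'KW_1']
print(count_keywords(["KW_1 x KW_2", "KW_2 KW_2"], keywords))
```

Summing per-document counters, as `count_keywords` does, is one way to get keyword totals over thousands of files once each file's text has been read.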
Regarding the list of words (and not a WordList), I used the "Set Role", "Select Attributes" and "Write CSV" operators listed above.
Greetings
Johan