Greetings community,
I am learning to use RapidMiner to extract and analyse occurrences of selected keywords in annual reports prepared by commercial entities. RapidMiner works well for all the keywords I study, except for one.
For some reason, the Filter Stopwords (English) operator filters out the word 'important' across the whole corpus of documents I study.
For example, a manual search of one document shows that it contains the following words of interest:
important - 11
importantly - 4
importance - 4
Using Process Documents from Files with the Filter Stopwords (English) operator enabled, I only see occurrences of 'importantly' and 'importance'. With the operator disabled, I also get the expected 11 occurrences of 'important'.
I tried changing the tokenization mode from 'non letters' to 'linguistic tokens', but it did not help.
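To make the pattern concrete, here is a minimal Python sketch (not RapidMiner) of exact-match stopword filtering. It only assumes that 'important' is on the stopword list, which I have not confirmed for RapidMiner's English list. The STOPWORDS set, the sample text and the function names are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword set for illustration only.
# This is NOT RapidMiner's actual English stopword list.
STOPWORDS = {"the", "is", "of", "and", "important"}


def tokenize(text):
    """Split on non-letter characters and lowercase each token."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def count_keywords(text, keywords, filter_stopwords):
    """Count keyword occurrences, optionally dropping exact stopword matches."""
    tokens = tokenize(text)
    if filter_stopwords:
        # Whole-token comparison: 'importantly' is not equal to 'important',
        # so it survives even though it shares the same prefix.
        tokens = [t for t in tokens if t not in STOPWORDS]
    counts = Counter(tokens)
    return {k: counts[k] for k in keywords}


sample = (
    "Important: the importance of this is important. "
    "Importantly, it is important."
)
keywords = ["important", "importantly", "importance"]

with_filter = count_keywords(sample, keywords, filter_stopwords=True)
without_filter = count_keywords(sample, keywords, filter_stopwords=False)

print("filter on: ", with_filter)
print("filter off:", without_filter)
```

In this sketch 'important' disappears only when the filter is on, while 'importantly' and 'importance' are counted either way, which matches what I see in my results.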
Question: Is this a (known) error?
(I don't see the </> icon to share my process.)
Kind regards,