Dear All,
I am new to RapidMiner and have a problem that I do not really know how to approach:
I have the following data:
- One file (pdf, txt or html) with a collection of 1000 different news articles.
- A list with about 30 keywords.
I want to extract all articles that match at least one of the keywords.
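To make the goal concrete, here is a minimal Python sketch of the logic I have in mind. It is not a RapidMiner process. It assumes the articles are separated by a line containing only `=====`, which is a made-up delimiter (my real file may mark article boundaries differently), and the function names are purely illustrative.

```python
import re

# Assumed delimiter between articles; the real file may use something else.
DELIMITER = "\n=====\n"

def split_articles(text, delimiter=DELIMITER):
    """Split one large text into individual articles, dropping empty parts."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]

def matches_any(article, keywords):
    """Return True if the article contains at least one keyword.

    Matching is case-insensitive and on whole words only.
    """
    if not keywords:
        return False
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, article, flags=re.IGNORECASE) is not None

def filter_articles(text, keywords, delimiter=DELIMITER):
    """Keep only the articles that match at least one keyword."""
    return [a for a in split_articles(text, delimiter) if matches_any(a, keywords)]

if __name__ == "__main__":
    sample = "Oil prices rise.\n=====\nLocal team wins.\n=====\nInflation slows."
    for article in filter_articles(sample, ["oil", "inflation"]):
        print(article)
```

So there are two steps: first cut the file into articles, then keep each article that contains any keyword from the list. I am looking for the RapidMiner operators that correspond to these two steps.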
My questions are:
1. What do I have to do so that RapidMiner can tell where an article starts and ends? When I import my news articles with the operator "Read Data", the whole file seems to be treated as a single article.
2. What kind of process do I need to set up to extract only those articles that contain at least one of the keywords? Specifically, which operator would work best? I tried "Filter Documents (by content)", but I don't understand where I should enter my keywords.
Thank you so much!
Best,
Carl