Tell k-NN (and possibly other models) to ignore training data dated past the Unlabeled record's time
The01Geek
New Altair Community Member
I have a large database of news records with their publication timestamps. I'm currently experimenting with using k-NN to classify a company's stock behavior by comparing each news item to similar cases that have occurred in the past. Naturally, I don't want the model to use any news that was published AFTER the news item in question, as that would not be a realistic approach.
I'm wondering if there's a way to implement this in RM? Currently, I filter the data into "News before 2021-05-03" and "News published on 2021-05-03" and feed the two streams to the training and unlabeled inputs respectively.
As you can imagine, this is not a very efficient solution as it only gives me the performance results for one day. To get the performance results of 7 days, I'd have to adjust both filters 7 times, run the process and manually record the accuracy outcome.
I feel like there has got to be a better way to do this?
Thanks
Answers
Hi,
your process is looking right. You are cleanly filtering training and validation data.
Familiarize yourself with loops and macros in RapidMiner. https://academy.rapidminer.com/catalog?query=loop
A loop on the 7 days you'd like to process will make your process do what it should.
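To make the idea of the loop concrete outside of RapidMiner, here is a minimal Python sketch. It is only an illustration, not the RapidMiner process itself: the toy records, the two-number feature vectors, and the function names knn_predict and walk_forward_accuracy are all made up for the example. For each test day it trains only on records published before that day and records that day's accuracy.

```python
from collections import Counter
from datetime import date, timedelta
import math

# Hypothetical toy records: (published date, feature vector, label).
RECORDS = [
    (date(2021, 4, 28), (0.9, 0.1), "up"),
    (date(2021, 4, 29), (0.8, 0.2), "up"),
    (date(2021, 4, 30), (0.1, 0.9), "down"),
    (date(2021, 5, 1), (0.2, 0.8), "down"),
    (date(2021, 5, 2), (0.85, 0.15), "up"),
    (date(2021, 5, 3), (0.15, 0.85), "down"),
    (date(2021, 5, 3), (0.7, 0.3), "up"),
    (date(2021, 5, 4), (0.3, 0.7), "up"),
]


def knn_predict(train, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[1], x))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


def walk_forward_accuracy(records, first_day, n_days, k=3):
    """For each test day, train only on records published before that day."""
    results = {}
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        train = [r for r in records if r[0] < day]   # strictly earlier news
        test = [r for r in records if r[0] == day]   # the day being scored
        if not train or not test:
            continue  # nothing to evaluate for this day
        hits = sum(knn_predict(train, x, k) == y for _, x, y in test)
        results[day] = hits / len(test)
    return results


daily = walk_forward_accuracy(RECORDS, date(2021, 5, 3), n_days=2)
for day, acc in daily.items():
    print(day, f"{acc:.2f}")
```

Changing n_days to 7 covers the whole week in one run, which is what the loop and the date macro would do for the two filters in the process.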
Regards,
Balázs
Thanks BalazsBarany.
I recently found the Sliding Window Validation operator.
Do you think this operator is going to address what I need, or should I create a custom loop?
Hi @The01Geek,
if you just want to validate your prediction process, Sliding Window Validation is the way to go.
If you need a reusable process for future predictions, you'll have to build it manually.
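As a rough picture of what sliding-window validation does in general, here is a small Python sketch. It is not the operator's actual implementation, and the parameter names train_width, test_width and step are chosen for the example rather than taken from RapidMiner. It assumes the rows are already sorted by timestamp and yields index ranges where every test row comes after every training row.

```python
def sliding_window_splits(n_rows, train_width, test_width, step):
    """Yield (train_indices, test_indices) over rows sorted by timestamp."""
    start = 0
    while start + train_width + test_width <= n_rows:
        train_end = start + train_width
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_width))
        yield train, test
        start += step  # move the whole window forward in time


splits = list(sliding_window_splits(n_rows=10, train_width=4, test_width=2, step=2))
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```

Note that a fixed-width training window drops the oldest rows as it moves forward. If every test day should be compared against all earlier news, the training window has to be wide enough to cover the full history, or a custom loop is the better fit.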
Regards,
Balázs