Tell k-NN (and possibly other models) to ignore training data dated past the Unlabeled record's time

The01Geek · May 2021

I have a large database of news records and their publication timestamps. I'm currently experimenting with k-NN to classify a company's stock behavior by comparing each news item to similar cases that occurred in the past. Naturally, I don't want the model to use any news that was published AFTER the news item in question, as that would not be a realistic approach.

Is there a way to implement this in RapidMiner (RM)? Currently, I split the data with two filters, "News before 2021-05-03" and "News published on 2021-05-03", and feed the results to the training and unlabeled inputs respectively.

Image: https://us.v-cdn.net/6030995/uploads/editor/ck/wo8pu1orxrvj.png

As you can imagine, this is not a very efficient solution, as it only gives me the performance results for one day. To get the performance results for 7 days, I'd have to adjust both filters 7 times, run the process each time, and manually record the accuracy.

I feel like there has got to be a better way to do this.

Thanks

BalazsBarany · May 2021

Hi,

Your process looks right: you are cleanly separating training and validation data.

Familiarize yourself with loops and macros in RapidMiner. https://academy.rapidminer.com/catalog?query=loop

A loop over the 7 days you'd like to evaluate will make your process do what it should, without changing the filters by hand.
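
For readers who want to see the logic of such a loop outside RapidMiner, here is a minimal Python sketch. It is not a RapidMiner process and not necessarily how the loop would be built there. It assumes each record is a dict with hypothetical keys "published" (a date), "features" (a numeric tuple) and "label", and it uses a plain Euclidean k-NN for illustration. For each day, only records published strictly before that day are used for training.

```python
from collections import Counter
from datetime import date, timedelta
import math


def knn_predict(train, features, k=3):
    """Majority vote among the k nearest training records (Euclidean distance)."""
    nearest = sorted(train, key=lambda r: math.dist(r["features"], features))[:k]
    return Counter(r["label"] for r in nearest).most_common(1)[0][0]


def accuracy_per_day(records, first_day, n_days, k=3):
    """Return {day: accuracy}, training only on records published before each day.

    The record keys ("published", "features", "label") are assumptions made
    for this sketch, not names taken from the original process.
    """
    results = {}
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        # Training data: everything strictly before the day being evaluated.
        train = [r for r in records if r["published"] < day]
        # Evaluation data: only the records published on that day.
        test = [r for r in records if r["published"] == day]
        if not train or not test:
            continue  # nothing to learn from, or nothing to score
        correct = sum(
            knn_predict(train, r["features"], k) == r["label"] for r in test
        )
        results[day] = correct / len(test)
    return results
```

Calling accuracy_per_day(records, date(2021, 5, 3), 7) would give one accuracy value per day for the week starting 2021-05-03, which replaces the seven manual runs.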

Regards,
Balázs

The01Geek · May 2021

Thanks BalazsBarany.
I recently found the Sliding Window Validation operator.

Do you think this operator is going to address what I need, or should I create a custom loop?

BalazsBarany · May 2021

Hi @The01Geek,

If you just want to validate your prediction process, Sliding Window Validation is the way to go.

If you need a reusable process for future predictions, you'll have to build it manually.
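
To illustrate the general idea behind sliding-window validation, here is a small Python sketch. It shows the concept only and does not reproduce the RapidMiner operator or its parameters. The names train_width, test_width and step are hypothetical, and the records are assumed to be sorted by publication time already. Each split trains on a window of consecutive records and tests on the records that immediately follow it, so the test data is always later than the training data.

```python
def sliding_window_splits(n_records, train_width, test_width, step):
    """Yield (train_indices, test_indices) over records sorted by time.

    All parameter names are assumptions made for this sketch.
    """
    if min(train_width, test_width, step) <= 0:
        raise ValueError("train_width, test_width and step must be positive")
    start = 0
    # Stop once the test window would run past the last record.
    while start + train_width + test_width <= n_records:
        train_end = start + train_width
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_width))
        yield train_idx, test_idx
        start += step  # slide both windows forward in time
```

With 6 records, train_width=3, test_width=1 and step=1, this yields three splits: train on records 0 to 2 and test on record 3, then train on 1 to 3 and test on 4, then train on 2 to 4 and test on 5.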

Regards,
Balázs
