Hi,
I'm trying to use the Windowing and Sliding Window Validation operators to predict future values. I've watched Thomas Ott's YouTube video and looked at other posts in the forum, but I am still not confident using these operators, so I'd like to ask some questions. I want to look at the settings at a very basic level to understand how to use them.
Let's say I have 1000 examples in my training set that cover 1000 days of a stock price. Is my understanding below correct?
First, the Windowing operator:
Window size: This is the number of days RapidMiner (RM) will use to predict the future value. If I set it to 10, RM will use 10 days of data to predict the future value. For example (let's not think about holidays and weekends), it will use Jan 1 -> Jan 10 to predict Jan 11.
Step size: Decides which values to skip, or step over. If the step size is 7, RM will only use the values of Jan 1, 8, 15 etc. The skipped values will be left out and not used for predictions. It would be the same as creating a new dataset containing only the first day of every week and setting step size to 1.
Create label: Here I choose the attribute I want to predict. I set it to "Yes" and choose the closing price attribute.
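To make my reading of step size concrete, here is a minimal Python sketch. This is not RapidMiner code: the list `days` and the helper `keep_every` are names I made up for illustration, and the sketch only encodes my assumption that a step size of 7 keeps every seventh value.

```python
def keep_every(values, step_size):
    """Keep the first value and then every step_size-th value after it.

    This encodes my assumed reading of step size, not RapidMiner's documented behaviour.
    """
    return values[::step_size]


days = list(range(1, 31))  # day numbers 1..30 standing in for Jan 1..Jan 30
kept = keep_every(days, 7)
print(kept)  # [1, 8, 15, 22, 29]
```

If this reading is wrong, the question is whether step size instead controls how far the window moves between consecutive examples.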
Here, we also have to set the horizon. Let's say my Window size is 10, step size is 1. If horizon is set to 1, RM will use the values of Jan 1 - Jan 10 to predict the value of Jan 11. If horizon is set to 5, RM will use the values of Jan 1 to Jan 10, to predict the value of Jan 15. Is that right?
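Here is the same example as a small Python sketch so the indices are unambiguous. Again, this is not RapidMiner code: `make_windows` and its parameters are hypothetical names, step size is fixed at 1 as in the example, and the sketch only encodes my assumption that the label is the value `horizon` days after the last day in the window.

```python
def make_windows(values, window_size, horizon):
    """Return (window, label) pairs under my assumed reading of the parameters.

    The label is the value `horizon` positions after the last value in the window.
    The window slides forward one position at a time (step size 1).
    """
    pairs = []
    start = 0
    # Stop once the label index would fall outside the series.
    while start + window_size - 1 + horizon < len(values):
        window = values[start:start + window_size]
        label = values[start + window_size - 1 + horizon]
        pairs.append((window, label))
        start += 1
    return pairs


days = list(range(1, 1001))  # day numbers standing in for 1000 daily prices

window_h1, label_h1 = make_windows(days, window_size=10, horizon=1)[0]
print(window_h1[0], window_h1[-1], label_h1)  # 1 10 11

window_h5, label_h5 = make_windows(days, window_size=10, horizon=5)[0]
print(window_h5[0], window_h5[-1], label_h5)  # 1 10 15
```

Under this reading, horizon 1 means "the next day" and horizon 5 means "five days after the end of the window", which is what I am asking you to confirm.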
Now on to the Sliding Window Validation operator.
As far as I understand, the validator does not improve the model itself. It is simply a tool to check whether the model I have created performs well. The results from the validator can be used to understand the model better and optimize it. Correct?
In the validator I find these settings:
Training Window Width
Training Window Step Size
Test Window Width
Horizon
Here, I am not quite sure what to do. Should these settings simply correspond to the settings in the Windowing operator? I believe this is not the right answer.
Following my previous examples, could we create similar examples for these settings to put them into context?