"Sliding Window Validation - What Model?"
B_Miner
New Altair Community Member
Hi All,
I will admit I am perplexed by the sliding window validation process (what it does and the parameters). In trying to understand it, the first question is what model is actually fit at the end? Is it the one using the most recent records (with the number of said records depending on the settings in the operator)?
Answers
Hi,
Do you mean: which data is used to fit the model that is delivered at the mod port?
With kind regards,
Sebastian Land
Hi Sebastian,
Yes, that is what I mean. What is that final model? Is it fit using the last k records, where k is the window set in the parameters?
Hi All,
I had the same question and couldn't find an answer.
What model is delivered at the mod port? If a model is returned, what is its value for future data?
My understanding is that a new model is created and tested for each window. What we are really validating is how well the process of learning a model works, right? Thus, no single model returned at the port will be of value.
I'm clearly confused. Please help.
Thank you
dcubed wrote:
> My understanding is that a new model is created and tested for each window. What we are really validating is how well the process of learning a model works, right?
Exactly.
dcubed wrote:
> Thus, no single model returned at the port will be of value.
That is wrong: if anything is connected to the model output of the validation, then after the validation process described above, a model is created on the complete data and returned at the model output port.
Best, Marius
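For readers who find code easier to follow, here is a minimal Python sketch of the behaviour described in the reply above: one throwaway model per window for the performance estimate, then one final model trained on the complete data. It is not RapidMiner's implementation. The names `fit_mean_model`, `sliding_window_validation`, `training_window`, `test_window` and `step` are hypothetical, and the "learner" is a toy that just predicts the mean of its training values.

```python
from statistics import mean

def fit_mean_model(values):
    """Toy stand-in for any learner: always predicts the mean of its training values."""
    m = mean(values)
    return lambda: m

def sliding_window_validation(series, training_window, test_window=1, step=1):
    """Score one model per window, then fit a final model on the complete series."""
    errors = []
    start = 0
    while start + training_window + test_window <= len(series):
        train_end = start + training_window
        train = series[start:train_end]
        test = series[train_end:train_end + test_window]
        model = fit_mean_model(train)          # discarded after scoring
        errors.extend(abs(model() - y) for y in test)
        start += step
    final_model = fit_mean_model(series)       # trained on ALL rows, not on a window
    return mean(errors), final_model

performance, final_model = sliding_window_validation([1.0, 2.0, 3.0, 4.0], training_window=2)
print(performance, final_model())
```

The point of the sketch is only that the returned model never appears inside the loop: the performance figure describes the windowed models, while the delivered model is a separate fit on everything.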
Marius wrote:
> That is wrong: if anything is connected to the model output of the validation, after the validation process as described above a model is created on the complete data and returned at the model output port.
So the model returned differs from all prior models in that it is trained on all the data in the data set, not just the data in one of the prior training windows?
Stated differently, if I have 1000 rows with a training window of 50 validated on the next row, I will have gone through 949 models each with 50 rows of data for training. The model returned, however, will be trained on 1000 rows?
If the reason I am training on 50 rows to predict the next is that the process generating the rows is not stationary, does it not follow that the final model trained on 1000 rows will be of little value in predicting the 1001st row?
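The window arithmetic in this example can be checked with a few lines of Python. This is a sketch under stated assumptions: the window advances one row at a time, the test row immediately follows the training window, and the helper name `count_windows` and its parameters are hypothetical rather than taken from the operator.

```python
def count_windows(n_rows, training_window, test_window=1, step=1):
    """Number of train/test windows that fit completely inside n_rows."""
    usable = n_rows - training_window - test_window
    if usable < 0:
        return 0
    return usable // step + 1

n_models = count_windows(1000, 50)
last_start = (n_models - 1) * 1          # 0-based index of the last training window
print(n_models)                          # 950
print(last_start + 1, last_start + 50)   # 950 999 (1-based rows of the last training window)
```

Under these assumptions the count is 950 rather than 949, with the last window training on rows 950 to 999 and testing on row 1000. The exact figure depends on the step and horizon settings actually used.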
Hi Dcubed,
I remember having exactly this exchange with Ingo a year or so ago right here; I was using SVMs to make short-term forecasts in foreign exchange markets, and optimised the look-back and prediction horizon sizes in a sliding window validation. The performance figures were fine, as you would expect, but I had to store the model at every iteration within the validation, just to get the last one. Yes, wasteful of course; yes, easily fixable. That's the wonder of open source!
What I, like you, never worked out was the correct scenario for using a model built on all the examples when there is concept drift.
Happy days!
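To make the concern about drift concrete, here is a toy Python comparison of a model kept from the last window against a model fit on all the data. Everything in it is an assumption made for illustration: the series is a made-up linear drift, the "model" is just the mean of its training values, and nothing here reflects SVMs or the actual operator.

```python
from statistics import mean

# Hypothetical drifting series: the level rises by 1 with every row.
series = [float(i) for i in range(1, 1001)]
next_value = 1001.0   # what row 1001 would be if the drift simply continued

window = 50
last_window_prediction = mean(series[-window:])   # model kept from the final window
all_data_prediction = mean(series)                # model fit on the complete data

last_window_error = abs(next_value - last_window_prediction)
all_data_error = abs(next_value - all_data_prediction)
print(last_window_error, all_data_error)          # 25.5 500.5
```

With this deliberately extreme drift and a learner that cannot represent a trend, the all-data model is far worse than the last-window model. A learner that can capture the trend, or data that is closer to stationary, could reverse that picture, so this shows only why the question matters, not which model is right in general.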