Hello everyone, as part of a university project I experimented with the data set I was given by feeding different aggregation levels of the data into an auto-modeling tool and comparing the resulting solutions.
At that point I was already somewhat confused that the aggregated data often produced better results than the disaggregated data.
Since the data spans 27 weeks and more regular attributes are added each week, I also tried building a model for every week to see when a model would theoretically be operational for a first deployment.
I expected accuracy and gain to increase slowly over the weeks, but instead I got an extreme peak in week 7, with very high accuracy and very good gain, which then declines drastically and is only surpassed by the best model in week 19. From week 19 on, performance decreases again but stays good until the predictions stop changing from weeks 23 to 27.
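For context, the week-by-week evaluation I described can be sketched roughly as below. This is a hypothetical reconstruction (I cannot share the real data, so it uses synthetic features, and scikit-learn's logistic regression as a stand-in for the auto-modeling tool): for each week, train on only the features available up to that week and record held-out accuracy.

```python
# Hypothetical sketch of the per-week evaluation: synthetic data stands in
# for the real (private) set; each week adds `feats_per_week` new columns.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_weeks, feats_per_week = 500, 27, 2

X = rng.normal(size=(n_samples, n_weeks * feats_per_week))
# Synthetic target that depends mostly on early-week features.
y = (X[:, :4].sum(axis=1) + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

accuracies = {}
for week in range(1, n_weeks + 1):
    X_week = X[:, : week * feats_per_week]  # features known up to this week
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_week, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accuracies[week] = model.score(X_te, y_te)

best_week = max(accuracies, key=accuracies.get)
print(f"best week: {best_week}, accuracy: {accuracies[best_week]:.3f}")
```

In my real runs, the accuracy curve from a loop like this is what peaks at week 7 and again at week 19, rather than rising monotonically.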
My questions now are: is such behavior normal, and why does it happen? Looking at the problem, I cannot really think of a reason why more information would be harmful to a prediction, but that clearly seems to be the case. Furthermore, if the prediction were actually deployed, should I stop at the model from week 19 or still use the model from week 27?
Sadly, I am not allowed to share the data.
Thanks in advance for any help.