Hi Community!
I am trying to work out how to build a model I have in mind in RapidMiner. I have listed all the details below and would appreciate it if some of you experienced RapidMiners out there could guide me in building it.
Background:
- I have 11 features in my dataset that will be used to forecast the next day's value of the target variable (temperature in this case). I have 15 years of daily data, where each row records that day's temperature and the 11 features. So I have about 5,475 observations in total (i.e., 15 years * 365 days).
- I know how I want the model to work, but I am not sure whether it can be built in RapidMiner.
The thought flow works as follows:
- For each day in the dataset, I want to find and collect the 50 days that are most similar to it across the 11 features. (I assume I would need a k-NN algorithm here, or maybe the Cross Distances or Data to Similarity operator?)
- For these 50 most similar days, I then want to look at the temperature observed on the NEXT day after each of them and compute the average of those 50 next-day temperatures. This average is the “expected temperature”, which can be used as a forecast of tomorrow's temperature.
- I want to start making forecasts after the first 2,000 observations (this is the initial minimum window size). The window then expands by one day at a time until it covers all 5,475 observations (i.e., up to today).
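To make the steps above concrete, here is a minimal sketch of the same logic in Python with NumPy rather than RapidMiner. The function name `analog_forecast` and its parameters are hypothetical, and two choices are my own assumptions rather than part of the requirements: similarity is measured with Euclidean distance, and the features are standardised using only the data available up to the forecast day.

```python
import numpy as np

def analog_forecast(features, temperature, k=50, min_window=2000):
    """Expanding-window nearest-neighbour forecast (illustrative sketch).

    forecasts[t] is the forecast for day t + 1, made with data up to day t.
    Entries before the minimum window is reached are NaN.
    """
    if min_window < 2:
        raise ValueError("min_window must be at least 2")
    features = np.asarray(features, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    n = len(features)
    forecasts = np.full(n, np.nan)

    for t in range(min_window - 1, n):
        # Assumption: standardise each feature using only history up to day t.
        history = features[: t + 1]
        std = history.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        scaled = (history - history.mean(axis=0)) / std

        # Candidates are days 0..t-1, so each one has a known next-day value.
        dist = np.linalg.norm(scaled[:t] - scaled[t], axis=1)  # Euclidean (assumed)
        kk = min(k, t)
        nearest = np.argpartition(dist, kk - 1)[:kk]  # indices of the kk closest days

        # Average the temperature observed on the day AFTER each similar day.
        forecasts[t] = temperature[nearest + 1].mean()

    return forecasts
```

With 5,475 rows, the first forecast is produced at the 2,000th observation, and the last entry of the returned array is the forecast for tomorrow.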
I hope this makes sense, and thanks in advance for your support!
Thanks & Regards,
Faycal