forecast SVR
sarah_mi88
New Altair Community Member
Hi everyone,
I want to apply support vector regression to sales data, using 2016-2017 for training and 2018 for testing (label: date). My aim is to forecast the values for the next 4 periods. However, the "Apply Forecast" operator doesn't work, and the "Performance (Regression)" operator doesn't evaluate labels of type date. For the parameters I chose, see the screenshot below. If any information is missing, please comment. What do I have to do?
Thx and cheers,
Sarah
Best Answer
-
"But how do I find "good" values for gamma, C, epsilon/nu and p? (nu-SVR or epsilon-SVR, I want to do regression). What is common practice? Doing CV? But how?"
We use the "Optimize Parameters (Grid)" operator to search for good hyperparameters for a model (an SVM in this case). Cross-validation on its own only estimates how well a given model performs; it does not select parameters for you.
In your process, I see that "Datum" (the date, I think) is set as a label and "Aufzugstechnik" is also set as a label. A prediction model can only take one label attribute. In your case, I guess it should be "Aufzugstechnik".
Is your data set time-dependent (a time series)? If so, regular cross-validation is not a good choice, because its random splits let the model train on data from after the test period, which makes the backtest unreliable. You should use the "Sliding Window Validation" method instead.
Here is a link that helps you understand the time series process in RapidMiner:
https://rapidminer.com/resource/time-series-analysis/
I attached a modified process. Since I don't have your datasets, I made some changes; you need to add windowing based on your dataset.
In it you can also see how to use "Optimize Parameters" for the SVM hyperparameters inside the Sliding Window Validation operator.
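Outside RapidMiner, the idea of a grid search scored by sliding-window validation can be sketched in plain Python. This is only an illustration under stated assumptions: the sales numbers are made up, the function names are hypothetical, and a simple moving-average forecaster stands in for the SVR (so the grid holds `window` rather than C, gamma and epsilon). The point is the structure: every training window lies strictly before its test window, and each parameter combination is scored over all windows.

```python
from itertools import product
from statistics import mean


def sliding_window_splits(n, train_size, test_size, step):
    """Yield (train_indices, test_indices); training always precedes testing in time."""
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step


def moving_average_forecast(history, horizon, window):
    """Stand-in model (NOT an SVR): repeat the mean of the last `window` values."""
    level = mean(history[-window:])
    return [level] * horizon


def grid_search(series, grid, train_size, test_size, step):
    """Return (best_params, best_rmse) over every parameter combination in `grid`."""
    best_params, best_rmse = None, float("inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        squared_errors = []
        for train, test in sliding_window_splits(len(series), train_size, test_size, step):
            history = [series[i] for i in train]
            forecast = moving_average_forecast(history, len(test), **params)
            squared_errors += [(series[i] - f) ** 2 for i, f in zip(test, forecast)]
        rmse = mean(squared_errors) ** 0.5
        if rmse < best_rmse:
            best_params, best_rmse = params, rmse
    return best_params, best_rmse


# Made-up quarterly sales figures, for illustration only.
series = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 21, 20]
best_params, best_rmse = grid_search(
    series, {"window": [1, 2, 4]}, train_size=6, test_size=2, step=2
)
print(best_params, round(best_rmse, 3))
```

To try this with a real SVR, the stand-in forecaster would be replaced by a fit-and-predict call on windowed features, and the grid would list candidate values for C, gamma and epsilon (or nu).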
Answers
-
Hello @sarah_mi88
It says that you set a column with the date data type as the label column. Did you set that? Can you provide the .rmp file (File --> Export Process) and the data so we can debug?
-
Hello @varunm1, thanks for your help! I specified the label and now get the prediction values. But how do I find "good" values for gamma, C, epsilon/nu and p? (nu-SVR or epsilon-SVR, I want to do regression). What is common practice? Doing CV? But how? See the .rmp file in the attachment. Currently the prediction doesn't capture trend or seasonality; the predicted value is the same for the whole test interval.
-
Thank you so much, I really appreciate it. Besides that, I get this error. Can you help me with that too? (xlsx attached)
-
Hello @sarah_mi88
This error occurs when the date column of your dataset is irregular. For time series, the date column must be strictly increasing (the same date and time must not appear more than once in your dataset).
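As a quick way to locate such rows before loading the data, here is a minimal Python sketch. The dates are invented for illustration and `find_date_problems` is a hypothetical helper, not a RapidMiner function; it simply reports every row whose date does not come strictly after the previous one.

```python
from datetime import date


def find_date_problems(dates):
    """Return (index, previous, current) for each date that does not strictly increase."""
    problems = []
    for i in range(1, len(dates)):
        if dates[i] <= dates[i - 1]:
            problems.append((i, dates[i - 1], dates[i]))
    return problems


# Invented example: row 2 repeats a date, row 4 goes backwards in time.
dates = [
    date(2016, 1, 1),
    date(2016, 4, 1),
    date(2016, 4, 1),
    date(2016, 10, 1),
    date(2016, 7, 1),
]
problems = find_date_problems(dates)
for index, previous, current in problems:
    print(f"row {index}: {current} does not come after {previous}")
```

Rows flagged this way would need to be deduplicated, aggregated or re-sorted before the time series operators can use the column.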
Please find the working process, based on the dataset you gave (which is very small).
-
Hello Varun, OK. Sorry for the stupid question, but why is the value always the same? (No trend, no seasonality, same prediction for Q1-4.)
-
I observed that the model is performing poorly. If you look at the squared correlation value in the performance output, it is zero, which means the model has essentially no predictive power. This may be because your dataset is so small (7 examples). Try simple models like GLM and see how they do; you can also look at time series models like ARIMA.
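To see why a flat forecast goes together with a squared correlation of zero, here is a small Python sketch with invented numbers. A constant prediction has no variance, so its correlation with the actual values is mathematically undefined; the sketch reports 0.0 in that case, which is an assumption about the convention and not a statement about how RapidMiner computes it.

```python
from statistics import mean


def squared_correlation(actual, predicted):
    """Squared Pearson correlation; returns 0.0 if either series is constant."""
    mean_a, mean_p = mean(actual), mean(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return 0.0  # assumed convention: undefined correlation is reported as 0
    return cov ** 2 / (var_a * var_p)


# Invented quarterly values, for illustration only.
actual = [120, 135, 150, 160]
flat = [141, 141, 141, 141]      # same prediction for Q1-4
tracking = [118, 137, 149, 163]  # a forecast that follows the trend

print(squared_correlation(actual, flat))
print(round(squared_correlation(actual, tracking), 3))
```

A forecast that follows the trend scores close to 1, while the flat forecast scores 0 no matter how near its level is to the average of the actual values.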