How can I remove heteroskedasticity from a multiple regression in the context of forecasting?
florianherrmann
New Altair Community Member
Hey guys, this is Florian writing,
I'm currently facing an issue with a multiple regression and I'm pretty much stuck. The modelling context is a multivariate forecast.
Long story short:
I ran a residual analysis for the multiple regression, because the squared correlation and the forecast results themselves indicated poor predictions. The residual analysis showed heteroskedasticity and residuals that are not randomly distributed.
After some research I found out that the systematic lack of fit and the heteroskedasticity can be addressed by transforming variables (e.g. with a Box-Cox transformation). Unfortunately, RapidMiner doesn't provide the Box-Cox transformation. As a result I'm stuck with my research and in need of some expert knowledge.
Is there any other way to address heteroskedasticity and systematic lack of fit within RapidMiner without completely restructuring my model?
Appreciate your help guys!
Answers
Hello @florianherrmann
Did you check whether this phenomenon is caused by outliers? If you have outliers, they have to be taken care of first, as they can make residuals look like this.
If you have no outliers, then I think one way to implement it is to use the Execute Python operator in RapidMiner and apply the power transformation from scikit-learn. I don't think RM has Box-Cox yet.
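To make the transformation idea concrete, here is a minimal sketch of Box-Cox in plain Python. It is only an illustration under assumptions: the function names, the coarse lambda grid and the `label` values are hypothetical, and inside RapidMiner you would more likely call scikit-learn's `PowerTransformer(method="box-cox")` from the Python script instead of hand-rolling it.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def inverse_box_cox(values, lam):
    """Map transformed values (e.g. predictions) back to the original scale."""
    if abs(lam) < 1e-12:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]

def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(values)
    transformed = box_cox(values, lam)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, candidates=None):
    """Pick the lambda with the highest log-likelihood from a coarse grid."""
    if candidates is None:
        candidates = [i / 10.0 for i in range(-20, 21)]  # -2.0 .. 2.0 (assumed grid)
    return max(candidates, key=lambda lam: box_cox_log_likelihood(values, lam))

# Hypothetical label values whose spread grows with their level.
label = [1.2, 1.9, 3.1, 4.8, 8.3, 12.9, 21.0, 33.5, 55.2, 89.7]
lam = best_lambda(label)
transformed_label = box_cox(label, lam)
```

The regression would then be trained on `transformed_label`, and its predictions passed through `inverse_box_cox` with the same `lam` to get forecasts on the original scale.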
Hi @florianherrmann,
Unfortunately the Box-Cox transformation has not (yet) been added to RapidMiner. We do have it on the roadmap, though.
For now, I just have two ideas:
- You can include the Box-Cox transformation from Python (or R) by using the Python (R) extension, which lets you integrate Python scripts into your workflow
- You can also try to smooth your data beforehand, which may help as well, for example by using either Exponential Smoothing or the Moving Average Filter (I would recommend the binom filter here)
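For readers wondering what a binomial filter does, here is a rough sketch in plain Python. It assumes a centered window with normalized binomial coefficients as weights; the function names are hypothetical and this is not a description of how the RapidMiner operator is implemented internally.

```python
import math

def binomial_weights(width):
    """Normalized binomial coefficients, e.g. width=5 gives [1, 4, 6, 4, 1] / 16."""
    coefficients = [math.comb(width - 1, k) for k in range(width)]
    total = sum(coefficients)
    return [c / total for c in coefficients]

def binomial_filter(series, width=5):
    """Centered weighted moving average; positions without a full window stay None."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the window can be centered")
    weights = binomial_weights(width)
    half = width // 2
    smoothed = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half : i + half + 1]
        smoothed[i] = sum(w * x for w, x in zip(weights, window))
    return smoothed

# Hypothetical series with a single sharp peak.
spiky = [0, 0, 16, 0, 0]
smoothed = binomial_filter(spiky, width=5)
```

Compared with a simple moving average, the binomial weights give the middle of the window the most influence, so peaks are damped rather than flattened. Because the window is centered, each smoothed value also depends on later observations, which matters if the smoothed series is used as a forecasting input.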
Hope this helps, and best wishes with your research
Fabian
Thanks @varunm1 and @tftemme for your help!
@tftemme what I have figured out so far is that my time series prediction always lags one step behind the label. As there are two pretty high peaks due to seasonal patterns, this might be the reason for the heteroskedasticity.
For the prediction I use some external attributes and a lagged value (-1) of the label attribute itself. Do you have an idea how to get rid of the lagging prediction?
Appreciate your help
Florian
Probably your model is just using the last value (the lagged value) as the prediction for the label, so it will always lag one step behind. Why not use more than one lagged value for the attributes? You can use the Windowing operator to achieve this.
Keep in mind that there may be no pattern in your data that predicts the future, in which case the last value may simply be the best guess and you may not get a better prediction.
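As a rough illustration of both points, the sketch below builds windows of several lagged values and computes the error of the naive "last value" forecast as a baseline. The helper names and the example series are hypothetical; the Windowing operator does the first part for you inside RapidMiner.

```python
def make_windows(series, n_lags):
    """Turn a series into (lagged_values, target) pairs, oldest lag first."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags : t], series[t]))
    return rows

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def naive_forecast_error(series):
    """MAE of predicting each value with the one right before it."""
    return mean_absolute_error(series[1:], series[:-1])

# Hypothetical series with a peak every fourth step.
series = [10, 12, 30, 13, 11, 12, 31, 12, 10, 13, 29, 12]
windows = make_windows(series, n_lags=4)
baseline_mae = naive_forecast_error(series)
```

If the regression's error on the same data is not clearly below `baseline_mae`, the model is probably doing little more than copying the last value, which would explain the one-step lag.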
If you have seasonality in your data, it may also be worthwhile to look into the univariate forecast methods Holt-Winters and the "Function and Seasonal Component Forecast". Both methods try to capture the seasonal aspects of the data. Keep in mind that you have to adapt your process for this (using the Forecast Validation operator, for example).
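To give an idea of how Holt-Winters handles seasonality, here is a minimal sketch of the additive variant in plain Python. The initialization is deliberately simple, the parameter values and the example series are hypothetical, and it is not meant to reproduce the RapidMiner operator's exact behaviour.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast; needs at least two full seasons of data."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons")

    # Simple initial estimates from the first two seasons (an assumption of this sketch).
    level = sum(series[:period]) / period
    trend = sum((series[period + i] - series[i]) / period for i in range(period)) / period
    seasonal = [series[i] - level for i in range(period)]

    for t, y in enumerate(series):
        s = seasonal[t % period]
        previous_level = level
        # Smooth the deseasonalized value into the level, then update trend and season.
        level = alpha * (y - s) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % period] for h in range(horizon)]

# Hypothetical series: three repetitions of a four-step season with one high peak.
season = [5, 20, 8, 6]
history = season * 3
forecast = holt_winters_additive(history, period=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
```

On this perfectly repeating example the forecast reproduces the seasonal pattern, including the peak at the right position, which is exactly what a model driven only by the last value cannot do.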
Thanks @tftemme!
That is actually what my research is about: trying to figure out whether a multivariate forecast based on linear regression is at least as good as a univariate time series prediction (in my case the Function and Seasonal Component Forecast).
I have now incorporated your advice into the model.
I highly appreciate your help :)
Kind regards
Florian