Multiple nonlinear regression in RapidMiner
binay
New Altair Community Member
I am new to RapidMiner, which I am using as a data mining tool for my graduation thesis.
I have a number of independent variables and one numerical dependent variable. I have tried linear regression and polynomial regression. However, I also want to try multiple non-linear regression on my data, to see whether it predicts more accurately than linear regression.
By multiple non-linear regression I mean that some independent variables enter linearly and some non-linearly (logarithmic, exponential or even polynomial), and the predicted value is the combination of all of those terms.
Y = C1·a + C2·e^b + C3·log(c) + ...
Here, a, b, c are independent variables and C1, C2, C3 are coefficients.
Could anybody explain how I can add such operators to achieve my goal?
Thanks in advance. I would be happy to clarify my problem if it is not clear.
Answers
-
My first try would be a neural net?
~Martin
-
Thanks Martin for the answer.
But a neural net is hidden from the user and also requires a large number of inputs. That is why I am considering regression, where the formula is visible and clear to the user, who can easily see how the dependent variable is a linear, exponential, logarithmic or polynomial function of the independent variables.
So, are there any operators in RapidMiner that produce this kind of regression formula? Or is there another way to deal with this problem?
I might use neural networks and other techniques as well to validate my predictions, though.
Thanks!
-
It sounds like you are describing an SVM?
Each variable gets a formula which transforms the space around it so it becomes linear.
Try one alongside the Create Formula operator.
-
Thanks Edward, for your valuable suggestion.
I implemented your approach and it produced a very complex formula of about 20-30 terms for 5 independent variables.
The worst part was that the performance was not very promising for my data.
I am developing a parametric cost model in which the cost depends on a number of independent variables. The final formula would therefore combine various Cost Estimating Relationship formulas to predict the cost. I know that this is a multiple non-linear regression problem, but I do not know how to implement it, either in RapidMiner or in other tools.
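For what it is worth, a model of the kind described in this thread, where each term is a coefficient times a fixed transformation of one input, is still linear in its coefficients. So one possible approach is to transform the inputs first and then run an ordinary linear regression on the transformed columns. Below is a minimal pure-Python sketch of that idea. The choice of transformations (linear, exp, log), the function names and the synthetic data are all assumptions made for illustration and are not taken from the actual cost data. In RapidMiner the same idea would presumably amount to creating the transformed columns first (for example with Generate Attributes) and then applying Linear Regression, but that mapping is an assumption and is not confirmed anywhere in this thread.

```python
import math

def transform(row):
    """Map raw inputs (x1, x2, x3) to the terms of the ASSUMED model:
    intercept, linear in x1, exponential in x2, logarithmic in x3."""
    x1, x2, x3 = row
    return [1.0, x1, math.exp(x2), math.log(x3)]

def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit(rows, y):
    """Ordinary least squares on the transformed features (normal equations)."""
    Z = [transform(r) for r in rows]
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)

def predict(w, row):
    """Evaluate the fitted formula for one raw input row."""
    return sum(wi * zi for wi, zi in zip(w, transform(row)))

# Synthetic, noise-free data generated from made-up coefficients (illustration only).
rows = [(x1, x2, x3)
        for x1 in (1.0, 2.0, 3.0)
        for x2 in (0.0, 0.5, 1.0)
        for x3 in (1.0, 2.0, 4.0)]
y = [5.0 + 2.0 * x1 + 0.5 * math.exp(x2) + 3.0 * math.log(x3) for x1, x2, x3 in rows]

coeffs = fit(rows, y)  # order: intercept, x1, exp(x2), log(x3)
print([round(c, 6) for c in coeffs])
```

The fitted formula stays readable (one coefficient per term), which is the property asked for here. The catch is that the transformation for each variable has to be chosen by the modeller up front. If a parameter sits inside the transformation (for example an unknown exponent), the model is no longer linear in its coefficients and this shortcut does not apply.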
Any further help in this direction would be appreciated.
-
@binay Hello, I wonder if you have figured out how to do nonlinear multiple regression? If so, I'd appreciate it if you could kindly share the process! Thanks!!
-
hi @joen841030 hmm this is an OLD thread; I'm not sure user binay is going to pick this up (although who knows?).
Anyway, all the regression operators, including linear (GLM), polynomial, etc., can be found by simply typing "regression" in the operator search window.
Is there a particular reason you want to use nonlinear regression models? What is your use case? Have you tried just using Auto Model and seeing what happens there first as a quick test?
Scott
-
@sgenzer
Thanks so much for the reply! I have 1 dependent variable (engagement rate) and 12 independent variables (color of the picture), all measured on a continuous scale. I first tried linear regression in SPSS, but it didn't really work because, judging by the graph, the relationship is non-linear. That's why I am now trying nonlinear regression. But I am actually not sure which function exactly I should use for my case... Thanks!
-
Hi @joen841030,
I highly recommend following Scott's advice and submitting your data to Auto Model.
Moreover, Auto Model can perform feature selection (and possibly feature generation) automatically for you.
Your dataset must contain at least 100 rows.
Regards,
Lionel
-
Thanks @lionelderkrikor
I have just tried it out! The generalized linear model appeared to perform the best. However, is there any reason why no p-values etc. are shown?
-
Hello @joen841030
To get the p-values, uncheck the "use regularization" option in the GLM parameters and check "compute p-values". I also suggest checking the "remove collinear columns" option. This way you will get the p-values.
Please let us know if you encounter any issues.
-
@varunm1 Thanks for the comment! However, the GLM now fails with an error: "Error while training the H2O model: Found collinear columns in the dataset. P-values can not be computed with collinear columns in the dataset. Set remove_collinear_columns flag to true to remove collinear columns automatically."
I wanted to check "remove collinear columns" as per your suggestion, but I couldn't find that option. Where is it? Thank you very much in advance!
-
Hello @joen841030
It looks like you didn't check "add intercept". Check "add intercept" first; the "remove collinear columns" option will then appear.
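If you want to see which columns are collinear before letting the GLM drop them, a rough check can be done outside RapidMiner. The sketch below is only an illustration under stated assumptions: the column names and values are hypothetical (colour shares that happen to sum to 1 in every row), the function name is made up, and the method (Gram-Schmidt orthogonalisation against an added constant column) is just one simple way to detect exact linear dependence. It is not how H2O itself performs the check.

```python
import math

def collinear_columns(columns, tol=1e-9):
    """Return names of columns that are linear combinations of earlier ones.

    `columns` maps a name to a list of values. A constant column is added
    first, so dependence that involves the intercept is detected as well.
    """
    n = len(next(iter(columns.values())))
    basis = []    # orthonormal vectors spanning the columns kept so far
    flagged = []
    candidates = [("intercept", [1.0] * n)] + list(columns.items())
    for name, values in candidates:
        residual = list(values)
        # Remove the part of this column already explained by kept columns.
        for q in basis:
            dot = sum(r * qi for r, qi in zip(residual, q))
            residual = [r - dot * qi for r, qi in zip(residual, q)]
        norm = math.sqrt(sum(r * r for r in residual))
        scale = math.sqrt(sum(v * v for v in values))
        if norm <= tol * max(scale, 1.0):
            flagged.append(name)   # nothing new left: column is redundant
        else:
            basis.append([r / norm for r in residual])
    return flagged

# Hypothetical colour shares; red + green + blue = 1 in every row.
data = {
    "red":   [0.2, 0.5, 0.1, 0.3, 0.6],
    "green": [0.3, 0.1, 0.6, 0.3, 0.1],
    "blue":  [0.5, 0.4, 0.3, 0.4, 0.3],
}
flagged = collinear_columns(data)
print(flagged)
```

Which column gets flagged depends on the column order, since each column is only compared against the ones before it. Real data is rarely exactly collinear, so the tolerance would probably need loosening, and near-collinearity is better judged with a proper diagnostic.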
Let us know if you face any other issues.
-
@varunm1 thank you for your comments earlier, extremely helpful!
I've decided to use the results from the SVM in the end, but I am not sure exactly how to interpret the numbers. For example, some attributes show a weight of 0; does that mean they do not contribute to my DV at all? There are also several other outputs under SVM that I am not sure how to interpret. I couldn't find SVM in the Auto Model documentation on the RapidMiner website. It would be nice if you had some information about the SVM results generated by Auto Model!