Validating a linear regression

Asoka
Asoka New Altair Community Member
edited November 5 in Community Q&A
I figure the capabilities I'm looking for must be available - I just haven't been able to find them.

When generating a Linear Regression in RapidMiner v5 (.008; the upgrade to .015 isn't working for me), I am trying to figure out how to get the measures and plots that are used to check the assumptions of a Linear Regression.  With the standard output of the Linear Regression operator, I can find the R-squared and the t-test results for the individual variables.  I can use the t-test results to imply the model-level F-test.

The additional information I am looking for includes the adjusted R-squared, a plot of the errors (residuals), a QQ plot, the Variance Inflation Factor, Cook's distance, and that sort of thing.  I originally learned validation of linear regression using PROC REG from SAS, if that helps frame the sort of information I'm looking for.

I figure these tests and plots have to be available in RapidMiner - any hints or pointers to where I can get that info are greatly appreciated.

Thanks
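
For readers who want to see what the requested diagnostics amount to, here is a minimal sketch in Python with NumPy. It is not RapidMiner functionality and not the output of any RapidMiner operator. The function name `regression_diagnostics`, the returned dictionary keys, and the synthetic data are all assumptions made for illustration. It computes the adjusted R-squared, the Variance Inflation Factor for each predictor, and Cook's distance for each row from an ordinary least squares fit with an intercept.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Hypothetical helper: OLS fit with intercept plus a few standard diagnostics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept column
    p = k + 1                               # number of fitted coefficients

    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()

    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)

    # Leverage h_ii: diagonal of the hat matrix A (A'A)^-1 A'
    leverage = np.einsum("ij,jk,ik->i", A, np.linalg.pinv(A.T @ A), A)
    mse = ss_res / (n - p)
    cooks = resid ** 2 / (p * mse) * leverage / (1.0 - leverage) ** 2

    # VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    # predictor j on all the other predictors (with intercept).
    vif = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        r = X[:, j] - others @ b
        r2_j = 1.0 - (r @ r) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        vif.append(1.0 / (1.0 - r2_j))

    return {
        "coefficients": beta,
        "residuals": resid,
        "r2": r2,
        "adj_r2": adj_r2,
        "vif": vif,
        "cooks_distance": cooks,
    }

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 0.5 + X_demo @ np.array([1.0, -2.0, 0.0]) + rng.normal(size=100)
print(regression_diagnostics(X_demo, y_demo)["adj_r2"])
```

The returned residuals are what a residual plot or QQ plot would be drawn from; the plotting itself is left out to keep the sketch short.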

Answers

  • Asoka
    Asoka New Altair Community Member
    Bumping to give this another chance - am I truly limited to the t-test for validating a Linear Regression within RapidMiner?

    Thanks!
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi,

    Usually we use an X-Validation to validate the Linear Regression, the same way as we do with all supervised learning algorithms.

    Basically, the X-Validation splits the data several times into a training set and a test set, calculates the linear regression model on the training set, applies it to the test set, and calculates a performance measure.
    With the Performance (Regression) operator you have a wide choice of measures to calculate.

    Best regards,
    Marius
  • Asoka
    Asoka New Altair Community Member
    That much makes sense, Marius - I'll set that up and see how close I can get to what I'm looking for.  At the very least, I'll be able to be more precise about what I'm finding or not finding.  Setting up the X-Validation and Performance (Regression) operators makes perfect sense.

    Thanks
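
The cross-validation procedure Marius describes can be sketched outside RapidMiner as follows. This is a minimal illustration in Python with NumPy, not the X-Validation operator itself. The function name `kfold_rmse`, the default fold count, and the choice of RMSE as the performance measure are assumptions made for the example. Each fold is held out once as the test set, the linear model is fitted on the remaining rows, and the error is measured on the held-out rows.

```python
import numpy as np

def kfold_rmse(X, y, k=10, seed=0):
    """Hypothetical helper: k-fold cross-validated RMSE of an OLS linear regression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Shuffle the row indices once, then cut them into k roughly equal folds
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)

    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)

        # Fit on the training rows (intercept added as a column of ones)
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

        # Apply the fitted model to the held-out rows and measure the error
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred = A_test @ beta
        scores.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))

    return np.array(scores)

# Synthetic data, for illustration only
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = 1.0 + X_demo @ np.array([3.0, -1.5]) + rng.normal(scale=0.3, size=200)
fold_scores = kfold_rmse(X_demo, y_demo, k=10)
print(fold_scores.mean(), fold_scores.std())
```

Note that this measures predictive performance on unseen rows. It does not by itself check the distributional assumptions (residual normality, collinearity, influential points) asked about in the original question.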