Why does RapidMiner include a variable with a p-value > 0.05 in a multiple linear regression?

MariJAM
MariJAM New Altair Community Member
edited November 5 in Community Q&A
Hello,

I'm doing a multiple linear regression. For my regression I have chosen the M5 prime feature selection with a min tolerance of 0.05. The final model contains three independent variables. Two of them have a p-value below 0.05, and one is above it, with a p-value of 0.135 (and a t-Stat of 1.543).
Two other independent variables have not been included in the model due to their high p-values and low t-Stat values.

Can anyone help and tell me why RapidMiner includes this one variable even though its p-value is above 0.05?

Thanks a lot!

Best Answer

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓
    Hey,

    You are coming from a statistics background, while RapidMiner (RM) comes more from a data science (DS) background. The p-value calculation rests on quite a few assumptions. The DS mindset is more: if I can show that one method works better than another, I use that method. So what you would do here is vary the cutoff and check the results.

    Best,
    Martin
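
The suggestion to "vary the cutoff and check the results" can be sketched outside RapidMiner as follows. This is only an illustration under stated assumptions, not RapidMiner's implementation: the data are synthetic, the helper names (`solve`, `ols`, `select`, `cv_rmse`) are made up for this example, feature selection is plain backward elimination on p-values, and the p-values use a normal approximation to the t-distribution. Each cutoff is judged by its cross-validated error rather than by the p-values themselves.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares fit; returns coefficients and two-sided p-values.
    Assumption: p-values use a normal approximation to the t-distribution."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(X, y))
    sigma2 = rss / (n - k)
    pvals = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = sigma2 * solve(xtx, unit)[j]  # j-th diagonal entry of the covariance
        z = abs(beta[j]) / math.sqrt(var_j)
        pvals.append(math.erfc(z / math.sqrt(2)))
    return beta, pvals

def select(X, y, cutoff):
    """Backward elimination: drop the least significant feature while p > cutoff.
    Column 0 is the intercept and is always kept."""
    cols = list(range(len(X[0])))
    while True:
        beta, p = ols([[r[c] for c in cols] for r in X], y)
        worst = max(range(1, len(cols)), key=lambda i: p[i], default=None)
        if worst is None or p[worst] <= cutoff:
            return cols, beta
        del cols[worst]

def cv_rmse(X, y, cutoff, folds=5):
    """Cross-validated RMSE; selection is redone inside every training fold."""
    n, se = len(X), 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        cols, beta = select([X[i] for i in train], [y[i] for i in train], cutoff)
        for i in (i for i in range(n) if i % folds == f):
            pred = sum(b * X[i][c] for b, c in zip(beta, cols))
            se += (y[i] - pred) ** 2
    return math.sqrt(se / n)

# Synthetic data (assumption): two strong effects, one weak effect, two noise features.
random.seed(0)
X, y = [], []
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(5)]
    X.append([1.0] + x)
    y.append(2.0 * x[0] + 1.0 * x[1] + 0.15 * x[2] + random.gauss(0, 1))

results = {c: cv_rmse(X, y, c) for c in (0.01, 0.05, 0.15, 0.5)}
for c, rmse in results.items():
    print(f"cutoff={c:<5} cv_rmse={rmse:.3f}")
best_cutoff = min(results, key=results.get)
print("cutoff with the lowest cross-validated error:", best_cutoff)
```

If a looser cutoff yields the same or a lower cross-validated error, keeping a variable such as the one with p = 0.135 is defensible from a predictive point of view, even though a strict 0.05 rule would drop it.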
