Linear Regression: error in calculation of tolerance

User: "dhampton"
New Altair Community Member
Updated by Jocelyn

I am writing training materials for multiple regression. The Linear Regression operator is giving what seem to be incorrect values for tolerance.

To illustrate, see the attached toy dataset. My process reads this data and uses the Linear Regression operator to fit y = f(x1, x2, x3, x4). The model is then applied to the training data (just to keep things simple), and finally I use the Performance operator to get R-squared. The result is:

Attribute    Coefficient             Standard Error         Std. Coefficient       Tolerance            t-stat               p-value                Code
X1           0.6099442233747938      0.097076731571145      0.8324180612316422     0.4913830335394965   6.283114537367604    1.4384283423596322E-4  ****
X2           -2.8474043342377822E-8  1.9598479705266512E-7  -0.028568714232080603  0.40108726248304105  0.0                  1.0
X3           0.178312419929975       0.0821213306746008     0.7990271382036194     0.4534020133333492   2.1713289161925995   0.05798784094691456    *
X4           -0.0010830494516547503  7.82512989580685E-4    -0.49206399607097406   0.262094151203384    -1.3840657804736376  0.19969313341637596
(Intercept)  -0.3277299280807463     0.161204140113176      NaN                    NaN                  -2.033011855965102   0.07258034063737584    *
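
For anyone reproducing these columns outside RapidMiner, here is a minimal Python/NumPy sketch of an ordinary least-squares fit with an intercept. It returns the coefficients, standard errors, t-statistics and training-set R-squared. This is not RapidMiner's implementation. The helper name `ols_summary` is hypothetical, and the synthetic data are placeholders, not the attached toy dataset. p-values are left out because they need the t distribution's tail probability (for example `scipy.stats.t.sf`).

```python
import numpy as np


def ols_summary(X, y):
    """Fit y = b0 + X @ b by ordinary least squares (hypothetical helper).

    Returns the coefficients (intercept first), their standard errors,
    the t-statistics, and the training-set R-squared.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ coef
    dof = n - (p + 1)                             # residual degrees of freedom
    sigma2 = resid @ resid / dof                  # estimate of the error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance matrix of the coefficients
    se = np.sqrt(np.diag(cov))
    t = coef / se                                 # t-statistic for H0: coefficient = 0
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, se, t, r2


# Synthetic placeholder data (NOT the attached dataset): 20 rows, 4 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = 0.6 * X[:, 0] + 0.2 * X[:, 2] + rng.normal(scale=0.1, size=20)

coef, se, t, r2 = ols_summary(X, y)
for name, b, s, ts in zip(["(Intercept)", "x1", "x2", "x3", "x4"], coef, se, t):
    print(f"{name:12s} {b: .4f} {s: .4f} {ts: .2f}")
print(f"R-squared: {r2:.3f}")
```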

I cross-checked the results in Minitab, and RapidMiner and Minitab agree on everything except tolerance. Minitab reports VIFs (variance inflation factors) rather than tolerances, but VIF is simply the reciprocal of tolerance. Here is the Minitab output:

Term      Coef       SE Coef   T-Value  P-Value  VIF
Constant  -0.328     0.161     -2.03    0.073
x1        0.6099     0.0971    6.28     0.000    2.53
x2        -0.000000  0.000000  -0.15    0.888    5.58
x3        0.1783     0.0821    2.17     0.058    19.54
x4        -0.001083  0.000783  -1.38    0.200    18.24

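The VIFs are a long way from the reciprocals of the tolerances: inverting RapidMiner's tolerances gives VIFs of roughly 2.04, 2.49, 2.21 and 3.82 for x1 to x4, compared with Minitab's 2.53, 5.58, 19.54 and 18.24.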
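
The comparison can be made explicit with a few lines of Python. The snippet inverts RapidMiner's Tolerance column to get the VIF it implies and sets that beside Minitab's VIF and the tolerance Minitab's VIF implies. The numbers are copied from the two tables above. The variable names are only for illustration.

```python
# Values copied from the RapidMiner and Minitab output above
rapidminer_tolerance = {
    "x1": 0.4913830335394965,
    "x2": 0.40108726248304105,
    "x3": 0.4534020133333492,
    "x4": 0.262094151203384,
}
minitab_vif = {"x1": 2.53, "x2": 5.58, "x3": 19.54, "x4": 18.24}

# VIF = 1 / tolerance, so each tool's figure implies the other's
implied_vif = {name: 1.0 / tol for name, tol in rapidminer_tolerance.items()}
implied_tolerance = {name: 1.0 / vif for name, vif in minitab_vif.items()}

print("term  1/RapidMiner tol  Minitab VIF  1/Minitab VIF")
for name in rapidminer_tolerance:
    print(f"{name:4s}  {implied_vif[name]:16.2f}  {minitab_vif[name]:11.2f}  "
          f"{implied_tolerance[name]:13.3f}")
```

The last column is the tolerance RapidMiner would need to report for its output to agree with Minitab.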

I calculated the values directly: tolerance = 1 - R², where R² comes from regressing each x on all of the other xs. For example, if I drop y, make x4 the label, and re-run the process, I get an R² of 94.5%, so the tolerance for x4 should be 1 - 0.945 = 0.055, not 0.262. That agrees with Minitab's VIF for x4 (1/0.055 ≈ 18.2 versus 18.24).
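
The manual check above (make each x in turn the label and regress it on the remaining xs) can be sketched in Python/NumPy as follows. The helper name `tolerances` is hypothetical, and the collinear data are synthetic placeholders, not the attached dataset. As a cross-check, the sketch uses the standard identity that the VIFs equal the diagonal of the inverse of the predictors' correlation matrix. This gives the same numbers without running any auxiliary regressions.

```python
import numpy as np


def tolerances(X):
    """Tolerance of each column of X (hypothetical helper): 1 - R^2 from
    regressing that column, with an intercept, on all the other columns."""
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]                                  # x_j plays the role of the label
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tol[j] = 1.0 - r2
    return tol


# Synthetic placeholder predictors with deliberate collinearity (NOT the attached dataset)
rng = np.random.default_rng(1)
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
x3 = x1 + x2 + rng.normal(scale=0.3, size=30)  # x3 is nearly a linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])

tol = tolerances(X)
vif = 1.0 / tol
# Cross-check: VIFs also equal the diagonal of the inverse correlation matrix
vif_from_corr = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
print(np.round(tol, 3), np.round(vif, 2), np.round(vif_from_corr, 2))
```

To run it on the attached data, put the x1 to x4 columns into X. If RapidMiner's Tolerance column followed the 1 - R² definition, its values would match `tol`.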

Am I going wrong, or is it an error?

Many thanks

David Hampton
