
question marks in linear regression output

AD2019
New Altair Community Member
Updated by Jocelyn
I ran a linear regression model with 18 independent variables and feature selection turned off. For some of the independent variables, question marks appeared for the standard error of the estimate, and therefore also for the t-statistic and p-value of the coefficient. I ran the model again with feature selection turned on and got the same question marks. What do these question marks mean? They cannot have anything to do with missing values, as the regression would not have run to completion in that case. I am baffled about what these "?" symbols might mean. Help!

    Can you post your process XML? Do you have the 'use bias' parameter checked in the LR operator, or 'exclude collinear features'? Several options can affect the output.

    AD2019
    New Altair Community Member
    OP
    Hi, I have attached my process .rmp file. 'Exclude collinear features' is unchecked, and you are correct about the bias setting: if 'use bias' is checked, I do not get question marks; if it is unchecked, I do. I did all of this with 'feature selection' turned off. Something else is also strange. I then turned on feature selection and used T_Test as the selection method with alpha set to 0.05. I got a solution that included independent variables with p-values much higher than 0.05. I am confused about why these IVs were not trimmed from the output. Thanks in advance for your help.
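A minimal sketch of how a t-test filter can leave insignificant variables behind, assuming (this is not confirmed anywhere in the thread) that the selection scores every attribute once, drops those with p >= alpha, and then refits: after the refit, each remaining coefficient gets a new standard error and p-value, and nothing forces those new p-values below alpha. The helpers `ols_pvalues`, `one_pass_ttest` and `iterative_ttest` are hypothetical Python/NumPy illustrations, not RapidMiner's implementation, and the p-values use a normal approximation to the t distribution.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """OLS with an intercept; returns approximate two-sided p-values for the slopes."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = (resid @ resid) / (n - k - 1)          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    t = beta / se
    # Assumption: normal approximation to the t distribution (fine for large n).
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return p[1:]                                    # drop the intercept's p-value

def one_pass_ttest(X, y, alpha=0.05):
    """Hypothetical single-pass filter: score once, keep columns with p < alpha."""
    p = ols_pvalues(X, y)
    return [j for j in range(X.shape[1]) if p[j] < alpha]

def iterative_ttest(X, y, alpha=0.05):
    """Backward elimination: refit after every removal until all p < alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_pvalues(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] < alpha:
            break                                   # every remaining column is significant
        del keep[worst]                             # drop the least significant column, refit
    return keep

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
print("one pass:", one_pass_ttest(X, y))
print("iterative:", iterative_ttest(X, y))
```

The iterative variant refits after each removal, so by construction every variable it keeps has p < alpha in the final model. Replacing the normal approximation with `scipy.stats.t.sf(abs(t), df) * 2` would give exact t-based p-values.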
    AD2019
    New Altair Community Member
    OP
    By the way, regardless of the cause, I would like to know what the question mark in the regression output is trying to communicate to the user. Does it indicate a computational underflow or overflow, a computational error, or something else?
    Hi @AD2019, I'm picking up this thread here. I have your process (thank you) but not the data set, so I cannot run the process. Can you please post it?
    AD2019
    New Altair Community Member
    OP
    My apologies for the delay in posting the data file; please see attached. When I run the regression without bias, I get question marks in the regression model. What does that mean? The process file was posted earlier (RM-houseprice-process.rmp).
    Hi @AD2019, do you mean these ? marks?



    So the simple answer is that ? marks are used in RapidMiner when values are missing. The better question is why they are missing. My educated guess (please correct me @varunm1 @mschmitz if my stats are wrong here) is that there can be no std coefficient or tolerance for the intercept of a LinReg model, as it's a computed value. All of your actual data (the other attributes) have std coefficients, which makes sense. But my stats are a wee bit rusty, so I look to these other smart folks to correct me. :wink:

    Scott

    AD2019
    New Altair Community Member
    OP
    Hi Scott:
    If you run the process with bias turned off, you will get question marks for some of the independent variables as well, not just the intercept. Since the standard error for these variables is a question mark, their t-statistics and p-values are question marks too, so it is not just an issue with the intercept. The data set has no missing values, so I could not figure out what the question marks were trying to say. The only thing I could think of was numerical overflow or underflow when calculating the standard error of the associated variable, but then I could not see how the coefficients would have been computed.
    Amit
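One way a "?" could appear for ordinary variables, offered as a hypothesis rather than a description of RapidMiner's actual code: without a bias term, a least-squares fit no longer satisfies SST = SSR + SSE (with SST centered on the mean of y), so an R²-style ratio computed as SSR/SST can exceed 1. If a standard-error formula takes the square root of a tolerance-like quantity such as 1 - R², that square root is then taken of a negative number and yields NaN, and a NaN is a missing value, which RapidMiner displays as "?" (the same NaN would then propagate to the t-statistic and p-value). The function `r2_variants` and the toy data are hypothetical.

```python
import numpy as np

def r2_variants(x, y, intercept):
    """Fit y on x and return (1 - SSE/SST, SSR/SST) with SST centered on mean(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if intercept:
        slope, b0 = np.polyfit(x, y, 1)        # ordinary fit with a bias term
    else:
        slope, b0 = (x @ y) / (x @ x), 0.0     # least squares through the origin
    yhat = slope * x + b0
    ybar = y.mean()
    sst = np.sum((y - ybar) ** 2)              # centered total sum of squares
    sse = np.sum((y - yhat) ** 2)              # residual sum of squares
    ssr = np.sum((yhat - ybar) ** 2)           # "explained" sum of squares
    return 1 - sse / sst, ssr / sst

x, y = [1, 2, 3, 4], [10, 11, 10, 11]
r_with = r2_variants(x, y, intercept=True)      # the two ratios agree
r_without = r2_variants(x, y, intercept=False)  # the ratios disagree; SSR/SST > 1

with np.errstate(invalid="ignore"):             # silence the sqrt-of-negative warning
    tolerance_like = 1 - r_without[1]           # negative without an intercept
    print(np.sqrt(tolerance_like))              # prints nan
```

With an intercept the decomposition holds and both ratios coincide, which is at least consistent with the observation in this thread that checking 'use bias' makes the question marks disappear.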
    hi Amit -

    Ah I understand. Good point. It's been a while since I've played with all of this (we normally use the GLM modeler instead of LinReg as it is far more versatile and robust). Let me investigate.

    Scott

    AD2019
    New Altair Community Member
    OP
    thanks Scott.  Let me play around with GLM and see if I can get rid of the ?
    varunm1
    New Altair Community Member
    Accepted Answer
    Updated by varunm1
    Hello @sgenzer and @AD2019

    I looked for H2O documentation on linear regression but unfortunately found none. For GLM to report p-values without "?" (unknown), H2O requires the following parameter settings:

    1. You should uncheck the "Use Regularization" option.
    2. You should select "Add intercept".
    3. You should select "compute p-values".
    4. You should select "remove collinear columns".

    If these are set, you will get the p-values, standard errors, etc. without question marks. In this case you will get question marks only when the coefficient is 0.

    I will see if I can find any information on linear regression.
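The effect of "remove collinear columns" can be sketched in Python/NumPy, assuming a simple greedy rank check (H2O's actual procedure may well differ): a column that is an exact linear combination of earlier columns is dropped, its coefficient is reported as 0, and its standard error is left undefined (NaN, which RapidMiner would show as "?"). This mirrors varunm1's point that question marks remain only for zero coefficients. `drop_collinear` and `fit_with_collinear_removal` are hypothetical names.

```python
import numpy as np

def drop_collinear(A):
    """Greedily keep each column only if it increases the rank of the kept set."""
    kept = []
    for j in range(A.shape[1]):
        cand = kept + [j]
        if np.linalg.matrix_rank(A[:, cand]) == len(cand):
            kept.append(j)
    return kept

def fit_with_collinear_removal(X, y):
    """OLS with an intercept; dropped columns get coefficient 0 and SE = NaN."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])        # intercept first ("Add intercept")
    kept = drop_collinear(A)
    B = A[:, kept]
    beta_k, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ beta_k
    sigma2 = (resid @ resid) / (n - len(kept))  # residual variance estimate
    se_k = np.sqrt(sigma2 * np.diag(np.linalg.inv(B.T @ B)))
    beta = np.zeros(k + 1)
    se = np.full(k + 1, np.nan)                 # undefined for dropped columns
    beta[kept] = beta_k
    se[kept] = se_k
    return beta, se

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2.0 * x1                                   # exactly collinear with x1
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 3.0 * x1 + x3 + rng.normal(scale=0.1, size=100)
beta, se = fit_with_collinear_removal(X, y)     # index 0 is the intercept
print(beta)
print(se)
```

Because the rank check runs left to right, which of two collinear columns survives depends on column order; a real implementation may choose differently.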
    sgenzer
    Altair Employee
    Accepted Answer
    thank you @varunm1!
    AD2019
    New Altair Community Member
    OP
    thank you Varun.