New Logistic Regression Operator: Strange Behavior

earmijo
earmijo New Altair Community Member
edited November 2024 in Community Q&A

Typically, in the absence of knowledge about the relative cost of misclassification errors, a classifier should classify an observation as a member of the "True" class if Probability(True) > 0.5. That's the behavior of most classifiers in RapidMiner (including W-Logistic).
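Here is where the 0.5 comes from. Suppose a false positive costs C_FP and a false negative costs C_FN. Predicting True has the lower expected cost when Prob(True) · C_FN > (1 − Prob(True)) · C_FP, that is, when Prob(True) > C_FP / (C_FP + C_FN). With equal costs (the usual assumption when they are unknown), that cutoff is exactly 0.5. The Python sketch below only computes that cutoff. The function names are made up for illustration and are not RapidMiner APIs.

```python
def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Confidence(True) above which predicting True has the lower expected cost.

    Predict True when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    (Hypothetical helper for illustration, not a RapidMiner function.)
    """
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)


def classify(p_true: float, threshold: float = 0.5) -> bool:
    """Label an observation True when its confidence exceeds the threshold."""
    return p_true > threshold


if __name__ == "__main__":
    print(bayes_threshold(1.0, 1.0))  # equal costs -> 0.5
    print(bayes_threshold(1.0, 3.0))  # a missed True is 3x as costly -> 0.25
```

Unequal costs move the cutoff away from 0.5. Without cost information, 0.5 is the natural default.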

The new "Logistic Regression" classifier seems to be the exception. It classifies an observation as True if Prob(True) > 0.3 (or, in RapidMiner terminology, if Confidence(True) > 0.3). I'm attaching a process showing this behavior. Just run it, then plot a histogram of Confidence(True) colored by the variable Prediction(label).
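The histogram check can also be done numerically. Take the scored rows as (Confidence(True), predicted-True) pairs. If a single cutoff produced the predictions, it must lie between the highest confidence predicted False and the lowest confidence predicted True. This is a minimal Python sketch that assumes the scored data has been exported as a list of such pairs. The helper name and the sample numbers are hypothetical and are not taken from the attached process.

```python
from typing import Iterable, Optional, Tuple


def effective_threshold_bounds(
    scored: Iterable[Tuple[float, bool]],
) -> Optional[Tuple[float, float]]:
    """Bracket the cutoff implied by (Confidence(True), predicted_true) pairs.

    Returns (highest confidence predicted False, lowest confidence predicted True).
    Returns None if either class was never predicted, or if the two ranges
    overlap (no single cutoff explains the predictions).
    """
    max_false: Optional[float] = None
    min_true: Optional[float] = None
    for conf, predicted_true in scored:
        if predicted_true:
            min_true = conf if min_true is None else min(min_true, conf)
        else:
            max_false = conf if max_false is None else max(max_false, conf)
    if max_false is None or min_true is None or max_false >= min_true:
        return None
    return max_false, min_true


if __name__ == "__main__":
    # Hypothetical scored rows: (Confidence(True), Prediction == True)
    rows = [(0.12, False), (0.27, False), (0.31, True), (0.45, True), (0.62, True)]
    print(effective_threshold_bounds(rows))  # (0.27, 0.31)
```

On these made-up rows the implied cutoff falls between 0.27 and 0.31. That is near 0.3 rather than 0.5, which is the pattern the colored histogram shows.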

A picture of the histogram is also attached to this message.

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    In the sample process you attached, you use a deep learning operator inside the CV. Is this correct?

  • earmijo
    earmijo New Altair Community Member

    No. I used the new LogisticRegression operator. I didn't even use cross-validation. 

    The problem seems to be the GeneralizedLinearRegression routine. I swapped the Logistic Regression operator for the GLM operator with the appropriate settings (family=binomial, etc.) and I get the same behavior.

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    I see what you're saying. Hmm, let me investigate. 

  • Telcontar120
    Telcontar120 New Altair Community Member

    That's very curious. Did you try comparing the results with the Weka version of the logistic regression operator?

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    @yyhuang pointed out to me that it might be related to H2O's F1 optimization on binomial data sets for the GLM algorithm. http://ethen8181.github.io/machine-learning/h2o/h2o_glm/h2o_glm.html

    Will continue to investigate. 

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    @Telcontar120 I tested this out using the Weka LR and the old RapidMiner SVM LR algorithm. Both flip the predicted label at confidence > 0.5 when using a Generate Data operator set to Random Classification.

    I think I'm leaning toward the internal F1 measure optimization that H2O is doing behind the scenes for binomial labels, but we're looking into this.

  • earmijo
    earmijo New Altair Community Member

    Thanks Thomas. I should add that if you use the Create Threshold operator and set it to 0.5, it works fine.

    The W-Logistic operator works fine, as do the other classifiers in RapidMiner.
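Overriding the model's own labels with a fixed 0.5 cutoff is roughly what the Create Threshold workaround achieves. It amounts to re-deriving each prediction from its confidence and counting how many rows change. This is a small Python sketch with hypothetical helper names and made-up confidences. It is not how RapidMiner implements the operator.

```python
from typing import List, Sequence


def apply_threshold(confidences: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Predict True wherever Confidence(True) exceeds the fixed threshold."""
    return [conf > threshold for conf in confidences]


def count_flips(model_predictions: Sequence[bool], confidences: Sequence[float],
                threshold: float = 0.5) -> int:
    """Count rows whose label changes when the fixed threshold replaces the model's labels."""
    relabeled = apply_threshold(confidences, threshold)
    return sum(old != new for old, new in zip(model_predictions, relabeled))


if __name__ == "__main__":
    confidences = [0.12, 0.27, 0.31, 0.45, 0.62]          # hypothetical Confidence(True)
    model_predictions = [False, False, True, True, True]  # labels as if cut near 0.3
    print(apply_threshold(confidences))                    # [False, False, False, False, True]
    print(count_flips(model_predictions, confidences))     # 2
```

In this toy case, the rows with confidence 0.31 and 0.45 are the ones that change label when the cutoff moves from about 0.3 to 0.5.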

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    Answer ✓

    I tested that too; the other RapidMiner/Weka operators behave as they should. Based on the H2O documentation, I think it's the F1 optimization, but I will confirm.

  • earmijo
    earmijo New Altair Community Member

    Thomas:

    A quick entry to confirm that you were right: H2O chooses the predicted class based on the maximum-F1 threshold. From the User Guide (Generalized Linear Modeling with H2O and R), page 26:

    [Screenshot: the cited passage from the H2O GLM User Guide]
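To make the max-F1 behavior concrete, here is a generic sketch of choosing a threshold by maximizing F1 = 2TP / (2TP + FP + FN) over the observed confidences. It illustrates the idea only and is not H2O's actual implementation; H2O's internal procedure for picking candidate thresholds may differ. The data are made up.

```python
from typing import Sequence, Tuple


def f1_at(confidences: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """F1 of the rule 'predict True when confidence >= threshold'."""
    tp = fp = fn = 0
    for conf, actual in zip(confidences, labels):
        predicted = conf >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def max_f1_threshold(confidences: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Try each observed confidence as a cutoff and keep the one with the highest F1.

    Ties keep the lowest cutoff, because only a strictly better F1 replaces the best so far.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(confidences)):
        score = f1_at(confidences, labels, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


if __name__ == "__main__":
    confs = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8]
    labels = [False, False, True, False, True, True, False, True]
    print(max_f1_threshold(confs, labels))      # (0.3, 0.8)
    print(round(f1_at(confs, labels, 0.5), 3))  # 0.571
```

On this toy data the F1-maximizing cutoff is 0.3, while a 0.5 cutoff gives F1 ≈ 0.571. A model that labels by max-F1 can therefore flag observations with Confidence(True) well below 0.5, which matches the behavior reported in this thread.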