New Logistic Regression Operator: Strange Behavior

earmijo
earmijo New Altair Community Member
edited November 2024 in Community Q&A

Typically, in the absence of knowledge about the relative cost of misclassification errors, a classifier should classify an observation as a member of the "True" class if Probability(True) > 0.5. That's the behavior of most classifiers in RapidMiner (including W-Logistic).

 

The new "Logistic Regression" classifier seems to be the exception. It classifies an observation as True if Prob(True) > 0.3 (or, in RapidMiner terminology, if Confidence(True) > 0.3). I'm attaching a process that shows this behavior. Just run it, plot a histogram of Confidence(True), and color it by the variable Prediction(label).

 

The pic of the histogram is attached to this message too.
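
For anyone who prefers numbers to a histogram, here is a minimal Python sketch of the same check: find the highest Confidence(True) that was still predicted False and the lowest one that was predicted True. The function name and the sample rows are made up for illustration; in practice the two lists would come from the scored example set exported from the process.

```python
def effective_cutoff(confidences, predictions, positive="True"):
    """Return (highest confidence predicted negative, lowest confidence predicted positive)."""
    pos = [c for c, p in zip(confidences, predictions) if p == positive]
    neg = [c for c, p in zip(confidences, predictions) if p != positive]
    return (max(neg) if neg else None, min(pos) if pos else None)

# Hypothetical scored rows: Confidence(True) and Prediction(label)
conf = [0.05, 0.22, 0.29, 0.31, 0.42, 0.58, 0.90]
pred = ["False", "False", "False", "True", "True", "True", "True"]

low, high = effective_cutoff(conf, pred)
print(low, high)  # the model's cutoff lies somewhere between these two values
```

If the two values bracket 0.5, the classifier is using the usual cutoff; if they bracket something like 0.3, it is not.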

 

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    In the sample process you attached, you use a deep learning operator inside the CV. Is this correct?

  • earmijo
    earmijo New Altair Community Member

    No. I used the new LogisticRegression operator. I didn't even use cross-validation. 

     

    The problem seems to be the GeneralizedLinearRegression routine. I swapped the operator (GLM in place of Logistic Regression) with the right settings (family=binomial, etc.) and I get the same behavior.

  • earmijo
    earmijo New Altair Community Member
    .
  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    I see what you're saying. Hmm, let me investigate. 

  • Telcontar120
    Telcontar120 New Altair Community Member

    That's very curious.  Did you try comparing the results of the Weka version of the logistic regression operator?

     

     

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    @yyhuang pointed out to me that it might be related to H2O's F1 optimization for binomial data sets in the GLM algorithm: http://ethen8181.github.io/machine-learning/h2o/h2o_glm/h2o_glm.html

     

    Will continue to investigate. 

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    @Telcontar120 I tested this out using the Weka LR and the old RapidMiner SVM LR algorithm; both give me a label flip at confidence > 0.5 when using a Generate Data operator set to Random Classification.

     

    I think I'm leaning toward the internal F1 measure optimization that H2O is doing behind the scenes for binomial labels, but we're looking into this.

  • earmijo
    earmijo New Altair Community Member

    Thanks Thomas. I should add that if you use the Create Threshold operator and set it to 0.5, it works fine.

    The operator W-Logistic works fine, as do the other classifiers in RapidMiner.
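
The effect of that workaround can be sketched in a few lines of Python: ignore the label the model assigned and re-cut Confidence(True) at 0.5. This is only an illustration of the idea, not the operator's implementation; the function name is hypothetical, and treating a confidence of exactly 0.5 as False is an assumption.

```python
def apply_threshold(confidences, threshold=0.5, positive="True", negative="False"):
    """Relabel each row from its Confidence(True), using a strict '>' cutoff."""
    return [positive if c > threshold else negative for c in confidences]

# Hypothetical confidences; 0.31 was labelled True by the model's own cutoff near 0.3
conf = [0.29, 0.31, 0.50, 0.58]
relabelled = apply_threshold(conf)
print(relabelled)
```

With the cutoff moved back to 0.5, only the last row stays True.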

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    Answer ✓

    I tested that too; the other RapidMiner/Weka operators do operate as they should. Based on the H2O documentation, I think it's the F1 optimization, but I will confirm.

  • earmijo
    earmijo New Altair Community Member

    Thomas:

    A quick entry to confirm that you were right. H2O chooses the predicted class based on the maximum-F1 threshold. From the User Guide (Generalized Linear Modeling with H2O and R), page 26:

    Screen Shot 2017-06-04 at 6.26.29 PM.png
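
To see why a maximum-F1 cutoff can land well below 0.5, here is a toy Python sketch. It is not H2O's code: the data are invented, the helper names are hypothetical, and the candidate thresholds are simply the observed probabilities, which is an assumption about how one might search.

```python
def f1_at(threshold, probs, labels):
    """F1 score when rows with prob > threshold are predicted positive (label 1)."""
    tp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def max_f1_threshold(probs, labels):
    """Pick the candidate threshold (one of the observed probs) with the best F1."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at(t, probs, labels))

# Invented predicted probabilities and true labels (1 = True class)
probs = [0.10, 0.20, 0.25, 0.35, 0.40, 0.45, 0.60, 0.80]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

best = max_f1_threshold(probs, labels)
print(best, f1_at(best, probs, labels), f1_at(0.5, probs, labels))
```

On this toy data the best F1 is reached at a cutoff of 0.25, because lowering the threshold recovers the positives scored at 0.35 and 0.45 at the cost of a single false positive. A cutoff near 0.3 in the original process is consistent with the same mechanism.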
