Cannot execute log reg calibration learning: Error while training the H2O model: Illegal argument(s)

SabaMomeniKho
SabaMomeniKho New Altair Community Member
edited November 2024 in Community Q&A
Hello,
I'm using Auto Model in RapidMiner 9.5 on a crash dataset. The task is prediction, and the "class" column is the target. I chose Decision Tree, Naive Bayes, Gradient Boosted Trees, Random Forest, SVM and Deep Learning. After running, the process shows results only for Naive Bayes and Decision Tree; the others fail with the error below:
       Cannot execute log reg calibration learning: Error while training the H2O model: Illegal argument(s) for GLM model: ERRR on field: _response: Response cannot be constant.
As I'm new to this software and have to use it for my MSc thesis, I really need help with this problem. I have also attached my data in case you need to see it.
Thank you. 

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    edited December 2019 Answer ✓
    Hi @SabaMomeniKho,

    This is an issue known to the RapidMiner staff. It occurs because your label has some very rare (minority) classes.



    There are 2 workarounds:
     - First, try to group your 2 smallest minority classes ("majorinjury" and "fatal") into a single class (called, for example, "other injuries"). You can do that with the Replace Rare Values operator, which is part of the Toolbox extension (to install from the Marketplace).
     - If that does not work, filter these minority classes out of your dataset.
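The first workaround can be sketched outside RapidMiner in plain Python. This is only an illustration of the idea, not the implementation of the Replace Rare Values operator: the function name `replace_rare_values`, the `min_count` threshold and the label counts are all invented for the example. The error text says the response was constant, which would be consistent with a rare class going missing from some training subset, but that reading is an assumption the thread does not confirm.

```python
from collections import Counter

def replace_rare_values(labels, min_count, replacement="other injuries"):
    """Map every label seen fewer than min_count times to one shared value."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else replacement for lab in labels]

# Hypothetical label column; these counts are made up for illustration.
labels = ["pdo"] * 90 + ["minorinjury"] * 7 + ["majorinjury"] * 2 + ["fatal"] * 1

grouped = replace_rare_values(labels, min_count=5)
print(Counter(grouped))
```

With these made-up counts, "majorinjury" and "fatal" are merged into "other injuries", so the smallest class has 3 examples instead of 1.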

    Hope this helps,

    Regards,

    Lionel
  • SabaMomeniKho
    SabaMomeniKho New Altair Community Member
    Hi @lionelderkrikor,
    Thanks for helping :)
    Actually, this is real data from 2018 roadway crashes in Iowa, USA, and as you saw, only a few crashes led to a fatality or a minor injury. For the whole process in my thesis, I need all four classes.
    What do you think about applying these predictors separately in the design view and then comparing the results?
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    edited December 2019 Answer ✓
    @SabaMomeniKho,

    OK, I understand.
    Yes, good idea: you can apply these predictors separately in the design view.
    With your highly imbalanced dataset, I think you can present 2 strategies:

     1. No data preprocessing:

    Please look at Process_1.rmp in the attached file and its results.
    Because you have very few examples of your minority classes (minorinjury, majorinjury, fatal), without data preprocessing the algorithm(s) have difficulty capturing the relationships between your regular attributes and these minority classes of your label. As a result, you do get a relatively good accuracy, because your algorithm(s) predict almost exclusively the majority class (in your case "pdo"). The drawback of this strategy is that the recall of your minority classes is extremely poor (0 or very close to 0); in other words, your model is very bad at correctly predicting the minority classes.
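The gap between accuracy and class recall described above can be reproduced with a few lines of plain Python. This is a minimal sketch on invented labels, assuming a model that always predicts the majority class; the counts are hypothetical and are not taken from the attached data.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_recall(y_true, y_pred, cls):
    """Fraction of examples whose true class is cls that were predicted as cls."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant) if relevant else 0.0

# Hypothetical, highly imbalanced label column (counts are made up).
y_true = ["pdo"] * 95 + ["minorinjury"] * 3 + ["majorinjury"] + ["fatal"]
# A degenerate model that only ever predicts the majority class.
y_pred = ["pdo"] * len(y_true)

print(accuracy(y_true, y_pred))               # 0.95
print(class_recall(y_true, y_pred, "fatal"))  # 0.0
```

The accuracy looks good only because 95 of the 100 invented examples are "pdo"; the recall of every minority class is 0.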

       

     2. Data preprocessing:

    Please look at Process_2.rmp in the attached file and its results.
    If your priority is to correctly predict one of your 3 minority classes (contributing to better road safety is a noble task, congratulations!  o:) ), you have to upsample the minority class you want to predict, meaning that you "artificially increase" the number of observations of this minority class. For that you can use the SMOTE Upsampling operator (part of the Toolbox extension, to install from the Marketplace). In the parameters of this operator, uncheck auto detect minority class and set the name of the minority class you want to predict, for example "fatal".


    As a result, the class recall of the studied minority class is significantly higher than in the first strategy, meaning that your model is now able to correctly predict one of your minority classes (for example "fatal"). The drawback of this strategy is that your overall accuracy will decrease.
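To give an idea of what "artificially increasing" a minority class means, here is a rough sketch of SMOTE-style interpolation in plain Python. It is not the code of the SMOTE Upsampling operator: the function `smote_like_upsample`, its parameters and the feature values are hypothetical, and the sketch assumes purely numeric attributes and at least two minority examples.

```python
import random

def smote_like_upsample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical numeric features of the rare class (values are made up).
fatal = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]

new_points = smote_like_upsample(fatal, n_new=6)
print(len(fatal) + len(new_points))  # 10
```

The synthetic points are added to the training data only; the model should still be evaluated on real, untouched examples.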




    Next steps:

    To improve the performance of your model(s), you can introduce the concepts of:
     - Parameter optimization (via the Optimize Parameters (Grid) operator)
     - Feature selection (via the Automatic Feature Engineering / Apply Feature Set operators)
    To help you with these concepts, you can go to the RapidMiner Academy, where there are plenty of instructional videos:
    https://academy.rapidminer.com/

    Don't hesitate to come back if you have other questions during your thesis.

    Regards,

    Lionel

    PS: Out of curiosity, what does "pdo" (the majority class of your label) mean? Thank you.
      

  • varunm1
    varunm1 New Altair Community Member
    edited December 2019
    @lionelderkrikor it's Property Damage Only (a crash in which no bodily injury is involved).
  • lionelderkrikor
    lionelderkrikor New Altair Community Member

    OK, thanks Varun !!

    Regards,

    Lionel