Accuracy of models all the same

AizatAlam_129
AizatAlam_129 New Altair Community Member
edited November 5 in Community Q&A
Hi,

I ran my data through RapidMiner's AutoModel for prediction, and the results showed that all models have the same accuracy.

I have no idea why this happened. Can anyone explain what could cause this?

Thank you

Best Answer

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓
    Hi!

    Having imbalanced data is not necessarily a problem. There are some methods for coping with it.

    Try balancing the data before running AutoModel. That won't give you a perfect model you can deploy, but you get an estimate of the model quality on the balanced data, of the importance of attributes, and of which algorithm works best on your data. You should get more complex models and more reasonable confusion matrices from this approach, even if the accuracy might be lower than before.

    Here's an Academy video on balancing, sampling and weighting data. These are approaches you can try for creating a good model on imbalanced data:
    https://academy.rapidminer.com/learn/video/sampling-weighting-intro

    So, I would do the following:
    1. Downsample the majority class so that it is more or less equal in size to the minority class.
    2. Run AutoModel on the balanced data.
    3. Choose a model type for further work.
    4. Build a process with an approach for weighting or sampling, e.g. downsampling in the training (left) part of a cross validation.
    5. Validate and optimize the final model.

    It is important that you validate your models on the original (imbalanced) distribution even when using some sampling method to build better models.
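For readers who want to see the idea outside of RapidMiner, here is a minimal plain-Python sketch of "downsample only the training part, validate on the original distribution". It is not the AutoModel or Cross Validation implementation; the data set, the helper names (`downsample_majority`, `cross_validation_folds`) and the 950/50 class split are all made up for illustration, and no model is actually trained. The folds are not stratified, which a real process would probably want.

```python
import random
from collections import Counter


def downsample_majority(rows, label_of, rng):
    """Randomly drop rows from the larger classes until every class
    has as many rows as the smallest class."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def cross_validation_folds(rows, k, rng):
    """Yield (train, test) pairs for a simple, non-stratified k-fold split."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    for i in range(k):
        test = shuffled[i::k]
        train = [r for j, r in enumerate(shuffled) if j % k != i]
        yield train, test


def label_of(row):
    return row[1]


rng = random.Random(42)

# Hypothetical imbalanced data set: (feature, label) pairs, 950 "no" and 50 "yes".
data = [(rng.random(), "no") for _ in range(950)]
data += [(rng.random(), "yes") for _ in range(50)]

fold_summaries = []
for train, test in cross_validation_folds(data, k=5, rng=rng):
    # Balance ONLY the training part of the fold.
    balanced_train = downsample_majority(train, label_of, rng)
    # A model would be fitted on balanced_train here and scored on test.
    # The test part keeps the original, imbalanced class distribution.
    fold_summaries.append({
        "train_balanced": Counter(label_of(r) for r in balanced_train),
        "test_original": Counter(label_of(r) for r in test),
    })

for summary in fold_summaries:
    print(summary)
```

Each fold prints equal class counts for the balanced training part, while the test part stays dominated by the majority class, so the performance estimate reflects the data the model would actually see.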

    Regards,
    Balázs

Answers

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Hi!

    In my experience this happens when the data set is imbalanced and hard to predict. In that case, predicting the majority class is the best choice for every modeling algorithm, so they all just do that. But there could be other causes, too.

    Are the accuracy rates, AUC values and confusion matrices all the same? The "all models predict the majority class" case is easy to spot in the confusion matrix: no example is predicted as the minority class.
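To make this check concrete, here is a small plain-Python sketch with made-up numbers (90 "no" rows, 10 "yes" rows, and two hypothetical models whose confidence for "yes" never reaches the assumed 0.5 threshold). It is only an illustration, not what AutoModel computes internally. Both models end up with the same confusion matrix and the same accuracy, yet their AUC can differ, because AUC depends on how the scores rank the examples rather than on the thresholded predictions.

```python
def confusion_matrix(actual, predicted, labels):
    """Return counts[a][p]: rows are actual classes, columns are predicted classes."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def auc(actual, scores, positive):
    """Share of (positive, negative) pairs where the positive example
    gets the higher score; ties count as half."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(scores, threshold=0.5):
    return ["yes" if s >= threshold else "no" for s in scores]


# Hypothetical imbalanced labels: 90 "no", 10 "yes".
actual = ["no"] * 90 + ["yes"] * 10

# Hypothetical confidences for "yes"; neither model ever reaches 0.5.
scores_a = [0.10] * 90 + [0.30] * 10  # ranks every "yes" above every "no"
scores_b = [0.20] * 100               # same score for every row

pred_a, pred_b = predict(scores_a), predict(scores_b)

acc_a, acc_b = accuracy(actual, pred_a), accuracy(actual, pred_b)
cm_a = confusion_matrix(actual, pred_a, ["no", "yes"])
cm_b = confusion_matrix(actual, pred_b, ["no", "yes"])
auc_a = auc(actual, scores_a, "yes")
auc_b = auc(actual, scores_b, "yes")

print("accuracy:", acc_a, acc_b)
print("confusion matrix A:", cm_a)
print("confusion matrix B:", cm_b)
print("AUC:", auc_a, auc_b)
```

In both confusion matrices the "yes" column is empty, and the shared accuracy is simply the share of the majority class.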

    Can you take a look at the actual models, e.g. decision tree, GBT, Random Forest? They are easy to interpret. If the trees are just simple two-way decisions rather than real trees with several levels, then that's the reason.

    Regards,
    Balázs
  • AizatAlam_129
    AizatAlam_129 New Altair Community Member
    @BalazsBarany you are absolutely right! The dataset is indeed imbalanced (although I'm not sure about the prediction complexity). The accuracy rates are all the same, but the AUC values of the models are not.

    And upon checking DT, GBT and RF, oddly they are indeed just simple two-way decisions.

    Does this mean that I have a problem with my data and that the models are incorrect? 