Which Confusion matrix is better?

NatalySimth New Altair Community Member
edited November 5 in Community Q&A
I got the following results for two models:

Metric      Model 1   Model 2
Accuracy    92.22%    96.95%
Recall      94.20%    94.20%
Precision   64.33%    84.74%
F1-Score    76.45%    89.21%

I also got the lift charts, which are attached. How should I judge my results based on the lift charts and these metrics? The task is identifying spam messages.
1.JPG 112.4K
2.JPG 108K

Best Answer

  • IngoRM
    IngoRM New Altair Community Member
    edited November 2019 Answer ✓
    Ok, a couple of things:
    1. Without knowing anything else, Model 2 is more likely to produce better predictions (with "better" meaning having more impact).  This is based on a) the higher accuracy and b) the higher Precision with the same Recall which c) also results in a higher F1-Score.
    2. However, the impact of both models may be the same, or Model 1 may even have the bigger business impact. What is the cost of missing an important email because it was falsely classified as spam? What is the cost of spam mails which make it through the filter? Based on those values, you could (and should) actually determine the most important thing: what is the impact of the model? Which one has more? And is it any better than not doing anything and treating everything as "no spam"? You would be surprised how often models, even with low error rates, are actually not performing better than not using a model at all...
    3. Your lift charts look strange TBH. You could try Lift Chart (Simple), which was introduced a while ago, and see if they look any better. Otherwise, those charts look like pretty much all confidence values are either 0 or 1, which often happens for text classification and models like NB (Naive Bayes) etc.
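The cost comparison from point 2 can be sketched roughly as follows. This is only an illustration: the two cost constants are hypothetical placeholders, not values from this thread, and the confusion-matrix shares are reconstructed from the reported accuracy, recall and precision under the assumption that all three were measured on the same test set with spam as the positive class.

```python
# Metrics reported in the question: (accuracy, recall, precision)
MODELS = {
    "Model 1": (0.9222, 0.9420, 0.6433),
    "Model 2": (0.9695, 0.9420, 0.8474),
}

# Hypothetical costs per message, purely for illustration
COST_FALSE_POSITIVE = 10.0  # a legitimate mail flagged as spam
COST_FALSE_NEGATIVE = 1.0   # a spam mail that gets through the filter


def rates_from_metrics(accuracy, recall, precision):
    """Recover confusion-matrix cells as fractions of all messages."""
    # False positives produced per true positive
    fp_per_tp = (1 - precision) / precision
    # Errors (missed spam + false alarms) per unit of spam share
    errors_per_spam = (1 - recall) + recall * fp_per_tp
    # accuracy = 1 - fn - fp, so the spam share follows from the error rate
    spam_share = (1 - accuracy) / errors_per_spam
    tp = recall * spam_share
    fn = spam_share - tp
    fp = tp * fp_per_tp
    tn = 1 - tp - fn - fp
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def cost_per_message(rates):
    """Expected cost per message under the hypothetical costs above."""
    return (rates["fp"] * COST_FALSE_POSITIVE
            + rates["fn"] * COST_FALSE_NEGATIVE)


results = {}
for name, (acc, rec, prec) in MODELS.items():
    rates = rates_from_metrics(acc, rec, prec)
    spam_share = rates["tp"] + rates["fn"]
    results[name] = {
        "rates": rates,
        "spam_share": spam_share,
        "model_cost": cost_per_message(rates),
        # Baseline: no filter, every spam mail counts as a false negative
        "baseline_cost": spam_share * COST_FALSE_NEGATIVE,
    }
    print(f"{name}: spam share {spam_share:.3f}, "
          f"model cost {results[name]['model_cost']:.3f}, "
          f"no-filter cost {results[name]['baseline_cost']:.3f}")
```

Replacing the two constants with real business costs shows whether either model actually beats the "treat everything as no spam" baseline.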
    Hope those pointers help...
    Cheers,
    Ingo
