Is accuracy enough for determining model performance...?

yogafire
Hello,

I have three predictive models:
1. Backpropagation-based
2. C4.5-based
3. CN2-based

I used accuracy to measure predictive model performance, and these were the results:
1. Backpropagation ==> 83.14% on training, 85.12% on testing
2. C4.5 ==> 83.72% on training, 84.04% on testing
3. CN2 ==> 82.98% on training, 84.65% on testing

When I look at the accuracy of each algorithm, there is no significant difference between one and the others. My question: is accuracy enough to determine or judge the performance of an algorithm in a given case? If so, which is the best model among the three? You know, there is no real significant difference between them ::) ... ??? (the gap could come down to only 1 or 2 correctly classified vectors...)

thank you for your advice,

regards
Dimas Yogatama...

Answers

  • Depends on your DM problem... but no, accuracy is never enough on its own. If you are trying to predict a class with a 15%/85% distribution, an 85% accuracy means absolutely nothing by itself: a model that always predicts the majority class gets the same score (see the quick sketch below).

    Try some ROC or lift charts... but tell us more about your problem...
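    To see why, here is a minimal Python sketch (scikit-learn assumed, nothing RapidMiner-specific; the data is made up):

    ```python
    import numpy as np
    from sklearn.metrics import accuracy_score

    # Hypothetical labels with a 15%/85% class distribution.
    rng = np.random.default_rng(0)
    y_true = rng.choice([0, 1], size=1000, p=[0.15, 0.85])

    # A "model" that always predicts the majority class, learning nothing.
    y_majority = np.ones_like(y_true)

    # Accuracy lands around 85% even though the model is useless.
    print("majority-class accuracy:", accuracy_score(y_true, y_majority))
    ```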
  • yogafire
    Rapidito wrote:

    Depends on your DM problem... but no, accuracy is never enough on its own. If you are trying to predict a class with a 15%/85% distribution, an 85% accuracy means absolutely nothing by itself.

    Try some ROC or lift charts... but tell us more about your problem...
    the class distribution is about 10 : 8.5, so I tried to find out whether there are other measures for determining the performance of a predictive model...
    I partitioned my dataset into 75% for training and 25% for testing using a stratified split. My problem is clear: I want to find the best model among the three models I mentioned earlier.

    As for ROC and lift charts, they are kind of new stuff to me, unfortunately :-[, but I will try to employ them to see the difference. Could you suggest further reading on ROC and lift charts?

    One more thing: is there an implementation of CN2 in RM5? I got my implementation from Orange...

    Thank you very much.
  • Which DM books have you got? It seems that the newer books especially cover these measures...

    Could you say what the model is about? That should help to define a good evaluation measure.
  • yogafire
    Rapidito wrote:

    Which DM books have you got? It seems that the newer books especially cover these measures...

    Could you say what the model is about? That should help to define a good evaluation measure.
    I have a few DM books, and there is only a little explanation of lift charts and ROC in them.
    The model is about predicting benign versus malignant tumors based on age and 4 lab tests.

    I have tried the ROC comparison, and the result looks like this:

    [image: ROC comparison chart]

    Could you help me read this chart?

    Thank you very much
  • I can't see it. Please upload it in higher resolution.

    Hmmm... tumor prediction. That is serious stuff, and cost-sensitive prediction is what you are after. You don't want to falsely predict benign when it's malignant, and it is preferable to make many mistakes saying malignant when it's benign rather than saying benign when it's malignant... do I make myself clear?

    You should use the cost-sensitive options that come with RapidMiner. I've seen them but I haven't used them, though... so I can't tell you what parameters to use... maybe somebody else can?
  • wessel
    You should post the confusion matrix.

    http://en.wikipedia.org/wiki/Confusion_matrix


    The problem with an ROC curve is that it's not a single number, so you can't directly compare two curves.

    The problem with the area under the curve is that it often does not reflect anything meaningful, because you are usually only interested in a small part of the curve.

    There is a corrected area under the curve, but it is not in RapidMiner.
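    If you want to check the numbers outside RapidMiner, here is a minimal sketch of both (Python with scikit-learn is my assumption; the labels and confidences are made up):

    ```python
    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    # Made-up true labels and predicted confidences for the positive class.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
    y_conf = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1, 0.55, 0.35])

    # Hard predictions at the default 0.5 threshold.
    y_pred = (y_conf >= 0.5).astype(int)

    # Rows are true classes, columns are predicted classes.
    print(confusion_matrix(y_true, y_pred))

    # AUC compresses the whole ROC curve into one number, which is
    # exactly why it can hide the part of the curve you care about.
    print("AUC:", roc_auc_score(y_true, y_conf))
    ```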
  • Why can't you compare multiple ROC curves, considering they are computed on the same test data set?

    I partly agree with what you say about the area under the curve: if you are interested in the best part of the curve (as in this case), then I guess you could just determine a segment and analyze it...

    Do you have a link to something explaining the area-under-the-curve correction? I haven't heard or read of it, thanks!

    Confusion matrices are great; get some confusion matrices up, yoga.
  • yogafire
    here's the confusion matrix of my backpropagation-based predictive model (which I consider the best of the three)

    here's the confusion matrix on the training set

    [image: confusion matrix on training data]

    and here's the confusion matrix on the test set

    [image: confusion matrix on test data]

    After reading your comments, I studied my DM book collection some more, and I would add precision and recall for measuring model performance; my DM books state that recall and precision can show how well a model identifies the patterns of particular classes (see the sketch below).
    What do you say?

    thanks
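    For concreteness, here is a minimal sketch of what those two numbers mean (Python/scikit-learn; the labels are made up, and 1 = malignant is my assumption):

    ```python
    from sklearn.metrics import precision_score, recall_score

    # Made-up labels: 1 = malignant, 0 = benign.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

    # Precision: of the cases predicted malignant, how many really are?
    print("precision:", precision_score(y_true, y_pred))

    # Recall: of the truly malignant cases, how many did the model catch?
    print("recall:", recall_score(y_true, y_pred))
    ```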
  • yogafire
    Rapidito wrote:

    I can't see it. Please upload it in higher resolution.
    I am sorry... here's the ROC chart:

    [image: ROC chart in higher resolution]
  • I think ROC is not useful here. I think you should actually strive for a model with 100% recall on the "malignant" class for starters, and then try to improve accuracy without lowering the malignant-class recall at all.

    I have no experience with it, but I think cost-sensitive learning should help you. Theoretically, I think you can set costs so that the model prefers a Zero-Rule (all malignant) model if necessary.

    I say this because I think that in your case the cost of a false 0 is terrible. Am I right? (The sketch below illustrates the idea with a made-up cost matrix.)
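    I can't give you RapidMiner parameters, but the idea behind cost-sensitive evaluation can be sketched in plain Python (every number below, including the cost matrix, is made up):

    ```python
    import numpy as np

    # Hypothetical cost matrix: rows = true class (0 = benign, 1 = malignant),
    # columns = predicted class. Missing a malignant case is made ten times
    # more expensive than a false alarm; the exact ratio is an assumption.
    cost = np.array([[0.0, 1.0],
                     [10.0, 0.0]])

    def expected_cost(conf_matrix, cost_matrix):
        """Average misclassification cost given counts and per-cell costs."""
        conf_matrix = np.asarray(conf_matrix, dtype=float)
        return (conf_matrix * cost_matrix).sum() / conf_matrix.sum()

    # Two made-up confusion matrices with identical 90% accuracy:
    # model A misses malignant cases, model B raises false alarms instead.
    model_a = [[85, 0], [10, 5]]
    model_b = [[75, 10], [0, 15]]
    print("model A cost:", expected_cost(model_a, cost))  # 1.0
    print("model B cost:", expected_cost(model_b, cost))  # 0.1
    ```

    Same accuracy, very different cost: that is the whole argument for going cost-sensitive here.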
  • yogafire
    Rapidito wrote:

    I say this because I think that in your case the cost of a false 0 is terrible. Am I right?
    the population distribution of the classes (0 : 1) is about 10 : 8.5

    Yes, you're right... but I don't have a predefined cost matrix...

    My client only wants the model to have the minimum possible misclassification. But I'm just like you, I'm not really used to cost-sensitive methods. So, do you think adding precision and recall to the performance measurement will satisfy my client?

    But once again, any false prediction is terrible...

    thanks for your suggestion.
  • wessel
    What is the task?

    What are the positive examples?
    What are the negative examples?

    What does it mean to falsely classify a positive example?
    What does it mean to falsely classify a negative example?

    A way to combine everything in the confusion matrix into a single, chance-corrected measure is Cohen's kappa (see the sketch at the end of this post):
    http://en.wikipedia.org/wiki/Cohen%27s_kappa
    The kappa score is in RapidMiner.

    When you talk about recall, you usually mean positive class recall, also known as sensitivity.
    If you want to make your model more sensitive, you can increase the number of positive examples in your training set.

    Most applications need a minimum level of recall to be useful.
    They often also need a minimum level of precision, and of specificity (negative class recall), to be useful.
    yogafire wrote:

    but once again, any false prediction means terrible...
    I think you mean specificity here, but maybe you have your 0 and 1 classes the other way around; it's common to have the true positives in the top-left corner, while you have them in the bottom-right corner :-/
    If you use the area under the ROC curve as a measure, you should only measure the area under the useful part of the curve.

    Or you could calculate the sensitivity at a fixed level of specificity.
    This way you have a single number and something intuitive to talk about.
    Like model A: sensitivity 60%, specificity 99%.
    Like model B: sensitivity 70%, specificity 99%.

    Both models give false warnings on 1% of the negative cases, so acting on those positive classifications sometimes wastes your money.
    (This should be acceptable.)
    Model B is clearly better than model A: it finds 10% more of the cases you are interested in.
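    Here is a minimal sketch of kappa, sensitivity, and specificity (Python/scikit-learn; the labels are made up, 1 = positive):

    ```python
    from sklearn.metrics import cohen_kappa_score, confusion_matrix

    # Made-up labels: 1 = positive (e.g. malignant), 0 = negative.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

    # Kappa corrects raw agreement for the agreement expected by chance.
    print("kappa:", cohen_kappa_score(y_true, y_pred))

    # Sensitivity and specificity straight from the confusion matrix;
    # for binary 0/1 labels, sklearn's ravel() order is tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("sensitivity (positive class recall):", tp / (tp + fn))
    print("specificity (negative class recall):", tn / (tn + fp))
    ```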
  • yogafire
    wessel wrote:


    Or you could calculate the sensitivity at a fixed level of specificity.

    Could you tell me how to apply that idea in RM5...? Maybe you could upload a sample process that does such things?

    Thank you for your help!
  • wessel
    Hmm, this is tricky.

    I'm not sure this works equally well with every classifier.

    You can either subsample the negative examples until you land somewhere around 99% specificity,

    or you can apply a threshold to the confidence of the prediction. I guess the latter is what you want, but I'm not sure how to do it in RM5.

    If I find out how to do it I'll post it, but maybe someone else has already done this; in the meantime, the sketch below shows the thresholding idea outside RM.

    edit: I think you may lose some information if you fix the specificity at 99%, but it depends on the exact implementation.
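    Until someone posts an RM5 process, here is the confidence-threshold idea in plain Python, independent of RapidMiner (the data is simulated, and the helper name is mine):

    ```python
    import numpy as np

    def sensitivity_at_specificity(y_true, y_conf, target_spec=0.99):
        """Pick the confidence threshold that yields roughly the target
        specificity, then report the sensitivity at that threshold."""
        y_true = np.asarray(y_true)
        y_conf = np.asarray(y_conf)
        # Threshold so that ~target_spec of the negatives fall below it.
        threshold = np.quantile(y_conf[y_true == 0], target_spec)
        y_pred = y_conf > threshold
        return threshold, y_pred[y_true == 1].mean()

    # Simulated confidences: positives tend to score higher than negatives.
    rng = np.random.default_rng(1)
    y_true = np.array([1] * 200 + [0] * 800)
    y_conf = np.concatenate([rng.beta(5, 2, 200), rng.beta(2, 5, 800)])

    thr, sens = sensitivity_at_specificity(y_true, y_conf)
    print(f"threshold {thr:.2f} -> sensitivity {sens:.1%} at ~99% specificity")
    ```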