"Performance estimation"

Legacy User
Hi,

I'm using supervised machine learning to classify my data. The
classifier I use is a decision tree (but it could be any other).
After constructing an appropriate decision tree, I would like
to measure the model's performance. What are the standard measures in the
domains of statistics and artificial intelligence for estimating the
performance of a classification algorithm?

So far, I've used leave-one-out cross-validation (due to the small
number of examples in the learning set, which is about 400) to evaluate
the accuracy (classification error), i.e. how many examples in the test set
were incorrectly predicted. However, I don't think that this is sufficient
for a reliable performance evaluation. What else should I measure?

I'm not sure whether a significance test would provide helpful information.
In my textbook, a significance test is used to compare two
different classification algorithms with respect to their absolute error
(which they determine by cross-validation). Also, in the one RapidMiner sample where
the T-Test operator is used, two models are compared. Can a significance test
also be used to make a performance assumption about a single classifier?
If so, what hypothesis should be tested? And how can this be achieved
in RapidMiner, whose T-Test operator expects two PerformanceVectors?

Thank you.

Regards,
tim

Answers

  • land
    Hi Tim,
    a leave-one-out cross-validation is already a very good estimate of the resulting performance and the best you can do. Statistics provides some other methods for performance estimation, but they are very heuristic and are avoided in the field of data mining since they don't make use of the data we have. The quality of cross-validation is determined by the quality of your training sample: the more representative your training data is for your problem, the better the estimate and the less the performance will be overestimated.
    A significance test checks whether one model is significantly better than another. You might compare one model with itself, but it will never be significantly better than itself :) There is no way of doing that.
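
    For concreteness, here is a minimal sketch of such a leave-one-out estimation outside of RapidMiner, using Python and scikit-learn as a stand-in for the XValidation operator; the dataset and tree parameters are placeholders, not your data:

    # Leave-one-out cross-validation of a decision tree (sketch).
    # Each example is held out once and predicted by a tree trained
    # on all remaining examples; the mean score is the accuracy estimate.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)   # placeholder for your ~400 examples
    clf = DecisionTreeClassifier(random_state=0)
    scores = cross_val_score(clf, X, y, cv=LeaveOneOut(), scoring="accuracy")
    print("LOOCV accuracy estimate:", scores.mean())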

    Greetings,
      Sebastian
  • steffen
    Hello Tim

    Just a few remarks:
    @Cross-validation: you can find additional remarks here: http://rapid-i.com/rapidforum/index.php/topic,62.0.html

    So far, I've used leave-one-out cross-validation (due to the small
    number of examples in the learning set, which is about 400) to evaluate
    the accuracy (classification error), i.e. how many examples in the test set
    were incorrectly predicted. However, I don't think that this is sufficient
    for a reliable performance evaluation. What else should I measure?
    Note that if your data suffers from heavy class imbalance, the accuracy can be maximized by simply predicting the larger class. Hence precision and recall should be measured, too (a small sketch follows below).
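
    A tiny illustration with made-up numbers (Python/scikit-learn sketch): a classifier that always predicts the majority class reaches high accuracy but zero precision and recall on the minority class.

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = [0] * 95 + [1] * 5   # 95% majority class, 5% minority class
    y_pred = [0] * 100            # always predict the majority class

    print("accuracy :", accuracy_score(y_true, y_pred))                     # 0.95
    print("precision:", precision_score(y_true, y_pred, zero_division=0))   # 0.0
    print("recall   :", recall_score(y_true, y_pred, zero_division=0))      # 0.0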

    Can a significance test
    also be used to make a performance assumption about a single classifier?
    If so, what hypothesis should be tested? And how can this be achieved
    in RapidMiner, whose T-Test operator expects two PerformanceVectors?
    One idea is to calculate the expected value of your measure for a random classifier, i.e., a classifier assigning random classes to all instances. Then you can perform a simple one-sided test given an appropriate distribution assumption. That way you will see whether your classifier is significantly better than random.
    This cannot be done in RapidMiner directly, but the required formulas are in every statistics textbook.
    Please note that all significance testing is worthless if the distribution assumptions of the tests are not met. The paired t-test, for instance, assumes that the differences of the performance measurements are approximately normally distributed.
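
    As a rough sketch of that idea (Python, with made-up counts; not a RapidMiner process): count how many held-out predictions were correct over the whole leave-one-out run and test that count against the accuracy a random classifier would reach. Keep in mind the caveat above about distribution assumptions; the held-out predictions are not strictly independent trials.

    from scipy.stats import binomtest

    n = 400          # held-out predictions (leave-one-out: one per example)
    correct = 290    # hypothetical number of correctly classified examples
    p_random = 0.5   # expected accuracy of a random classifier (two balanced classes assumed)

    # H0: true accuracy <= p_random,  H1: true accuracy > p_random
    result = binomtest(correct, n, p_random, alternative="greater")
    print("one-sided p-value:", result.pvalue)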

    regards,

    Steffen
  • Legacy User
    Hi Steffen,
    One idea is to calculate the expected value of your measure for a random classifier, i.e., a classifier assigning random classes to all instances. Then you can perform a simple one-sided test given an appropriate distribution assumption. That way you will see whether your classifier is significantly better than random.
    Does it make sense to compare a "real" model against a random classifier? Is this a common approach
    used in practice? Are the accuracy, precision and recall measurements not sufficient?

    And why is it not possible to model your suggested approach in RapidMiner?

    Regards,
    Tim
  • steffen
    Hello Tim

    Does it make sense to compare a "real" model against a random classifier? Is this a common approach
    used in practice? Are the accuracy, precision and recall measurements not sufficient?
    If you have enough data and the mentioned measurements are far away from the values of a random classifier, such a test is not necessary. On the other hand, assume you have a small amount of data. In this case it is often nearly impossible to favour one model over another as significantly better (since the difference in the measure between both models is small), so it is interesting to know whether your model is at least better than the baseline (= random classifier). A test like this can show whether it is sensible to optimize models at all, given this amount of data...

    And why is it not possible to model your suggested approach in RapidMiner?
    As Ingo used to say: no one has asked for it yet ;). But it isn't too hard to do it on your own. Here are some hints (a sketch follows below):
    -> Precision for a given class can be modeled as a binomial distribution, which can be approximated by a normal distribution.
    -> A simple one-sided test about the parameter p of the binomial distribution can then be performed.
    -> To estimate precision (or p), you can simply combine all validation sets of an XValidation by using XVPrediction.
    As mentioned before, any statistics textbook should contain the required information.
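
    A rough sketch of these hints in Python (the counts and the baseline are hypothetical; the precision of a random classifier equals the class prior, assumed here to be 0.5):

    from math import sqrt
    from scipy.stats import norm

    n_pred_pos = 120   # predictions of the class of interest, pooled over all folds
    true_pos   = 84    # how many of those predictions were correct
    p0         = 0.5   # precision a random classifier would reach (class prior)

    p_hat = true_pos / n_pred_pos
    # one-sided z-test (normal approximation to the binomial)
    # H0: p <= p0  vs.  H1: p > p0
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_pred_pos)
    print("estimated precision:", p_hat, "z:", z, "p-value:", norm.sf(z))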

    btw: I do not know what kind of position you are in, but if there is a statistician around, go and grab them. In my experience, opinions about statistical tests are rather... different.

    regards,

    Steffen