"Interpretation/Meaning of Performance Measure"
choose_username
New Altair Community Member
Hello,
I have built a workflow that classifies examples with a decision tree, and I also used a Performance operator. The result lists accuracy, precision, and recall.
What do these measures mean? Is there a difference between accuracy and precision, and what is the meaning of recall?
greetings
user
Answers
Hi,
those measures are all taken from a confusion matrix (http://en.wikipedia.org/wiki/Confusion_matrix):
                     Actually Positive   Actually Negative
Predicted Positive           a                   b
Predicted Negative           c                   d

After the prediction, RapidMiner checks the true label and the prediction for each example and sorts the example into the correct cell of the confusion matrix, i.e. increases that count by 1. The sum of a, b, c, and d is hence the total number of examples. The value "a" is called "true positives", "b" is called "false positives", "c" is called "false negatives", and "d" is called "true negatives".
The ratio of correctly classified examples to the total number of examples is called accuracy and is calculated as (a+d) / (a+b+c+d).
The ratio of true positives to all examples predicted as positive is called precision and is calculated as a / (a+b).
The ratio of true positives to all actually positive examples is called recall and is calculated as a / (a+c).
Accuracy is also available in the case of more than two classes; precision and recall are only available for two-class problems (you can, however, always calculate a per-class precision and recall).
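To make the formulas concrete, here is a small sketch (my own illustration, not RapidMiner code) that computes the three measures from made-up counts:

// Sketch: accuracy, precision and recall from the four cells of a
// binary confusion matrix (a = TP, b = FP, c = FN, d = TN).
public class ConfusionMeasures {
    public static void main(String[] args) {
        double a = 40, b = 10, c = 5, d = 45; // hypothetical counts

        double accuracy  = (a + d) / (a + b + c + d); // 0.85
        double precision = a / (a + b);               // 0.80
        double recall    = a / (a + c);               // ~0.89

        System.out.println("accuracy = " + accuracy
                + ", precision = " + precision + ", recall = " + recall);
    }
}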
Cheers,
Ingo
Thanks for the answer. I conclude from this that the most interesting measure (overall) is accuracy, and the other ones are more class-specific.
Greetings
User
Hello,
well, it depends. If you know nothing about your classes and about the costs of the different types of errors, accuracy might indeed be a good value to optimize. In other cases, where the application requires a high precision or recall for a specific class, you should definitely go for those. Or you introduce costs for the different errors. Or...
choose_username wrote:
I conclude from this that the most interesting measure (overall) is accuracy, and the other ones are more class-specific.
If you have more than two classes and are looking for a single-number evaluation to optimize, there is not much left besides accuracy and kappa (and some others).
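As an aside, for the per-class precision and recall mentioned earlier, a sketch along the same lines (again my own illustration, not RapidMiner source; I assume the matrix is oriented with predictions in rows and true classes in columns):

// Sketch: per-class precision and recall from an n x n confusion
// matrix, where matrix[i][j] counts examples predicted as class i
// whose true class is j (assumed orientation).
public class PerClassMeasures {
    public static void main(String[] args) {
        double[][] matrix = { { 30, 5, 0 }, { 3, 40, 2 }, { 1, 4, 15 } }; // made-up counts
        for (int c = 0; c < matrix.length; c++) {
            double predictedAsC = 0.0d; // row sum
            double actuallyC = 0.0d;    // column sum
            for (int j = 0; j < matrix.length; j++) {
                predictedAsC += matrix[c][j];
                actuallyC += matrix[j][c];
            }
            System.out.println("class " + c
                    + ": precision = " + matrix[c][c] / predictedAsC
                    + ", recall = " + matrix[c][c] / actuallyC);
        }
    }
}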
Cheers,
Ingo
Hi,
thanks for the information. I will keep that in mind when working on it.
greetings
User
How is the kappa score calculated in RapidMiner?
Ingo Mierswa wrote:
If you have more than two classes and are looking for a single-number evaluation to optimize, there is not much left besides accuracy and kappa (and some others).
I always think of it as a normalized accuracy score:
Kappa = 0: the predictions are no better than always predicting the majority class.
Kappa = 1: all predictions are correct.
Kappa = -1: all predictions are wrong.
If you only know that the accuracy is 99%, you don't really know much, because you might have a dataset with 9900 negative and only 100 positive examples. In that case you are only interested in systems with an accuracy greater than 99%.
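To make that concrete (my own arithmetic, using the usual definition of Cohen's kappa): on such a dataset, a classifier that always predicts "negative" reaches pa = 9900/10000 = 0.99, but the agreement expected by chance is also pe = 1.0 * 0.99 + 0.0 * 0.01 = 0.99, so kappa = (0.99 - 0.99) / (1 - 0.99) = 0. The seemingly impressive 99% accuracy carries no information beyond the class distribution.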
Hi,
here it is (fresh from the source code; counter holds the confusion matrix counts and total is the number of examples):
double pa = accuracy;   // observed agreement: the plain accuracy
double pe = 0.0d;       // agreement expected by chance
for (int i = 0; i < counter.length; i++) {
    double row = 0.0d;    // number of examples predicted as class i
    double column = 0.0d; // number of examples actually of class i
    for (int j = 0; j < counter.length; j++) {
        row += counter[i][j];
        column += counter[j][i];
    }
    //pe += ((row * column) / Math.pow(total, counter.length));
    pe += ((row * column) / (total * total));
}
return (pa - pe) / (1.0d - pe);
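The same calculation, wrapped into a standalone method so you can play with it (my own wrapper, not the RapidMiner API; matrix[i][j] counts examples predicted as class i with true class j):

// Cohen's kappa for an n x n confusion matrix (standalone sketch).
public static double kappa(double[][] matrix) {
    double total = 0.0d;    // number of examples
    double diagonal = 0.0d; // correctly classified examples
    for (int i = 0; i < matrix.length; i++) {
        diagonal += matrix[i][i];
        for (int j = 0; j < matrix.length; j++) {
            total += matrix[i][j];
        }
    }
    double pa = diagonal / total; // observed agreement = accuracy
    double pe = 0.0d;             // agreement expected by chance
    for (int i = 0; i < matrix.length; i++) {
        double row = 0.0d;
        double column = 0.0d;
        for (int j = 0; j < matrix.length; j++) {
            row += matrix[i][j];
            column += matrix[j][i];
        }
        pe += (row * column) / (total * total);
    }
    return (pa - pe) / (1.0d - pe);
}

For the imbalanced dataset from above, kappa(new double[][] { { 0, 0 }, { 100, 9900 } }) returns 0.0, even though the accuracy of that always-negative classifier is 0.99.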
Cheers,
Ingo