"Interpretation/Meaning of Performance Measure"
choose_username
New Altair Community Member
Hello,
I have built a workflow that classifies examples with a decision tree, and I also used a Performance operator. The result lists accuracy, precision, and recall.
What do these measures mean? Is there a difference between accuracy and precision, and what is the meaning of recall?
greetings
user
Answers
Hi,
those measures are all taken from a confusion matrix (http://en.wikipedia.org/wiki/Confusion_matrix):
                     Actually Positive   Actually Negative
Predicted Positive           a                   b
Predicted Negative           c                   d

After the prediction, RapidMiner checks the true label and the prediction for each example and sorts the example into the correct cell of the confusion matrix, i.e. increases that count by 1. The sum of a, b, c, and d is hence the total number of examples. The value "a" is called "true positives", "b" is called "false positives", "c" is called "false negatives", and "d" is called "true negatives".
The ratio of correctly classified examples to the total number of examples is called accuracy and is calculated as (a+d) / (a+b+c+d).
The ratio of true positives to all examples predicted as positive is called precision and is calculated as a / (a+b).
The ratio of true positives to all actually positive examples is called recall and is calculated as a / (a+c).
Accuracy is also available in the case of more than two classes; precision and recall are only available for two-class problems (you can, however, always calculate a per-class precision and recall).
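To make the formulas concrete, here is a small sketch (my own illustration, not RapidMiner code) that computes the three measures from made-up counts:

// Sketch: accuracy, precision and recall from the four cells of a
// binary confusion matrix (a = TP, b = FP, c = FN, d = TN).
public class ConfusionMeasures {
    public static void main(String[] args) {
        double a = 40, b = 10, c = 5, d = 45; // hypothetical counts

        double accuracy  = (a + d) / (a + b + c + d); // 0.85
        double precision = a / (a + b);               // 0.80
        double recall    = a / (a + c);               // ~0.89

        System.out.println("accuracy = " + accuracy
                + ", precision = " + precision + ", recall = " + recall);
    }
}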
Cheers,
Ingo
Thanks for the answer. I conclude from this that the most interesting measure (overall) is accuracy, and the other ones are more class-specific.
Greetings
User
Hello,
well, it depends. If you know nothing about your classes and about the costs of the different types of errors, accuracy might indeed be a good value to optimize. In other cases, where the application requires a high precision or recall for a specific class, you should definitely go for those. Or you introduce costs for the different errors. Or...
choose_username wrote:
I conclude from this that the most interesting measure (overall) is accuracy, and the other ones are more class-specific.
If you have more than two classes and are looking for a single-number evaluation to optimize, there is not much left besides accuracy and kappa (and some others).
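As an aside, for the per-class precision and recall mentioned earlier, a sketch along the same lines (again my own illustration, not RapidMiner source; I assume the matrix is oriented with predictions in rows and true classes in columns):

// Sketch: per-class precision and recall from an n x n confusion
// matrix, where matrix[i][j] counts examples predicted as class i
// whose true class is j (assumed orientation).
public class PerClassMeasures {
    public static void main(String[] args) {
        double[][] matrix = { { 30, 5, 0 }, { 3, 40, 2 }, { 1, 4, 15 } }; // made-up counts
        for (int c = 0; c < matrix.length; c++) {
            double predictedAsC = 0.0d; // row sum
            double actuallyC = 0.0d;    // column sum
            for (int j = 0; j < matrix.length; j++) {
                predictedAsC += matrix[c][j];
                actuallyC += matrix[j][c];
            }
            System.out.println("class " + c
                    + ": precision = " + matrix[c][c] / predictedAsC
                    + ", recall = " + matrix[c][c] / actuallyC);
        }
    }
}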
Cheers,
Ingo
Hi,
thanks for the information. I will keep that in mind when working on it.
greetings
User
How is the kappa score calculated in RapidMiner?
Ingo Mierswa wrote:
If you have more than two classes and are looking for a single-number evaluation to optimize, there is not much left besides accuracy and kappa (and some others).
I always think of it as a normalized accuracy score:
Kappa = 0: the predictions are no better than always predicting the majority class.
Kappa = 1: all predictions are correct.
Kappa = -1: all predictions are wrong.
If you only know that the accuracy is 99%, you don't really know much, because you might have a dataset with 9900 negative and only 100 positive examples. In that case you are only interested in systems with an accuracy greater than 99%.
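To make that concrete (my own arithmetic, using the usual definition of Cohen's kappa): on such a dataset, a classifier that always predicts "negative" reaches pa = 9900/10000 = 0.99, but the agreement expected by chance is also pe = 1.0 * 0.99 + 0.0 * 0.01 = 0.99, so kappa = (0.99 - 0.99) / (1 - 0.99) = 0. The seemingly impressive 99% accuracy carries no information beyond the class distribution.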
Hi,
here it is (fresh from the source code; counter holds the confusion matrix counts and total is the number of examples):
double pa = accuracy;   // observed agreement: the plain accuracy
double pe = 0.0d;       // agreement expected by chance
for (int i = 0; i < counter.length; i++) {
    double row = 0.0d;    // number of examples predicted as class i
    double column = 0.0d; // number of examples actually of class i
    for (int j = 0; j < counter.length; j++) {
        row += counter[i][j];
        column += counter[j][i];
    }
    //pe += ((row * column) / Math.pow(total, counter.length));
    pe += ((row * column) / (total * total));
}
return (pa - pe) / (1.0d - pe);
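The same calculation, wrapped into a standalone method so you can play with it (my own wrapper, not the RapidMiner API; matrix[i][j] counts examples predicted as class i with true class j):

// Cohen's kappa for an n x n confusion matrix (standalone sketch).
public static double kappa(double[][] matrix) {
    double total = 0.0d;    // number of examples
    double diagonal = 0.0d; // correctly classified examples
    for (int i = 0; i < matrix.length; i++) {
        diagonal += matrix[i][i];
        for (int j = 0; j < matrix.length; j++) {
            total += matrix[i][j];
        }
    }
    double pa = diagonal / total; // observed agreement = accuracy
    double pe = 0.0d;             // agreement expected by chance
    for (int i = 0; i < matrix.length; i++) {
        double row = 0.0d;
        double column = 0.0d;
        for (int j = 0; j < matrix.length; j++) {
            row += matrix[i][j];
            column += matrix[j][i];
        }
        pe += (row * column) / (total * total);
    }
    return (pa - pe) / (1.0d - pe);
}

For the imbalanced dataset from above, kappa(new double[][] { { 0, 0 }, { 100, 9900 } }) returns 0.0, even though the accuracy of that always-negative classifier is 0.99.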
Cheers,
Ingo