Low Recall High Accuracy

ozcan · February 2020

Below are example results for the same dataset. The dataset has no missing values:

For Naive Bayes:
RapidMiner Recall: 26.35% +/- 5.17% (micro average: 26.37%)
Weka Recall: 0.768
RapidMiner Precision: 43.41%
Weka Precision: 0.735
RapidMiner Accuracy: 77.14%
Weka Accuracy: 76.7639%

For Random Forest:
RapidMiner Recall: 16.60% +/- 6.01% (micro average: 16.59%)
Weka Recall: 0.843
RapidMiner Accuracy: 81.75%
Weka Accuracy: 84.2897%

For KNN:
RapidMiner Recall: 12.89% +/- 3.82% (micro average: 12.89%)
Weka Recall: 0.824
RapidMiner Precision: 55.82% +/- 12.05% (micro average: 55.77%)
Weka Precision: 0.810
RapidMiner Accuracy: 79.40%
Weka Accuracy: 82.4396%

For Decision Tree:
Weka Accuracy: 81.4989%
RapidMiner Accuracy: 83.07%
Weka Recall: 0.815
RapidMiner Recall: 30.67%

Why are the RapidMiner recall and precision values so low even though accuracy is high? Especially the recall value?
My process is attached. I use the same process for the other algorithms.

I also tried other settings in the related algorithms to improve recall in RapidMiner. For example:
KNN: changing the k value, measure types, mixed measures, weighted vote.
Decision Tree: changing criterion, maximal depth, pruning, confidence, prepruning, minimal gain, minimal leaf size, minimal size for split, number of prepruning alternatives.
Random Forest: changing number of trees, criterion, pruning, confidence, prepruning, random splits, guess subset ratio, voting strategy, etc.
But the recall value is still low.
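One observation that may explain part of the gap: each Weka recall above agrees with the matching Weka accuracy to three decimals (0.768 vs 76.7639%, 0.843 vs 84.2897%, 0.824 vs 82.4396%, 0.815 vs 81.4989%). That is what you would expect if those figures were read from the "Weighted Avg." row of Weka's per-class summary rather than from the recall of one class, because the class-weighted average of per-class recall is algebraically equal to accuracy. Where exactly the Weka numbers came from is an assumption here. The sketch below checks the identity on a made-up two-class confusion matrix (hypothetical counts, not this dataset).

```python
def per_class_recall(matrix):
    """Recall of each class; matrix[i][j] counts true class i predicted as class j."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def weighted_avg_recall(matrix):
    """Average per-class recall, weighting each class by its share of true instances."""
    total = sum(sum(row) for row in matrix)
    recalls = per_class_recall(matrix)
    return sum(sum(row) / total * r for row, r in zip(matrix, recalls))


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical confusion matrix: rows are true classes 0 and 1, columns are predictions.
cm = [[700, 50],
      [180, 70]]

print(per_class_recall(cm))     # class 0 recall is about 0.933, class 1 recall is 0.28
print(weighted_avg_recall(cm))  # about 0.77 (up to floating-point rounding)
print(accuracy(cm))             # 0.77
```

With this matrix the minority-class recall is only 0.28 while the weighted average is 0.77, the same pattern as a low single-class recall sitting next to an accuracy-like averaged recall.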

varunm1 · February 2020

Hello @ozcan

I checked your decision tree process and data. Recall and precision are calculated with respect to a positive class. In your case, RapidMiner is taking label "1" as the positive class by default, and with that choice the recall is low, as mentioned in your post. If you manually set the positive class to "0" in the "Performance (Binominal Classification)" operator, your recall is 90.25%.

I think in Weka the positive class might be 0; you need to check that and confirm. Try checking recall for both classes in RapidMiner and Weka. There might be other issues as well. I also added a better way to build your process.

Image: https://us.v-cdn.net/6030995/uploads/editor/r9/8r73i6t6ychz.png
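To make the positive-class point concrete, here is a minimal sketch that scores the same predictions twice: once with "1" as the positive class (what RapidMiner picks by default in this process, per the answer above) and once with "0". The labels and predictions are invented for illustration, and `binary_metrics` is a hypothetical helper, not a RapidMiner or Weka function.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return (recall, precision) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical data: 7 instances of class 0, 3 of class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

for positive in (1, 0):
    recall, precision = binary_metrics(y_true, y_pred, positive)
    print(f"positive={positive}: recall={recall:.3f} precision={precision:.3f}")
# positive=1: recall=0.333 precision=1.000
# positive=0: recall=1.000 precision=0.778
```

Accuracy is 0.8 in both cases, since it does not depend on which class is called positive; only recall and precision move.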

lionelderkrikor · February 2020

Hi @ozcan,

These results can be explained by a highly imbalanced dataset.
In this case, the algorithm has difficulty "capturing" the relationships between your regular attribute(s) and the minority class of your label, and thus in correctly predicting the minority class. That's why the recall is low although the accuracy is relatively good.
However, I don't know why there is a significant difference between Weka and RapidMiner.
Could you share your dataset?

Regards,

Lionel
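A minimal illustration of Lionel's point, assuming a hypothetical 80/20 class split (the thread does not state the real class proportions): a "model" that always predicts the majority class already reaches 80% accuracy while its recall on the minority class is zero.

```python
from collections import Counter


def recall(y_true, y_pred, positive):
    """Share of true `positive` instances that were predicted as `positive`."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual) if actual else 0.0


y_true = [0] * 80 + [1] * 20                   # hypothetical imbalanced label column
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)              # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                   # 0.8
print(recall(y_true, y_pred, 1))  # 0.0
print(recall(y_true, y_pred, 0))  # 1.0
```

Real classifiers do better than this baseline, but the same mechanism lets accuracy stay high while recall on the minority class stays low.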

varunm1 · February 2020

Hello @ozcan

This is a tricky question. How are you getting these results? Are you cross-validating or split-validating your data? If so, are the test data sets the same in both RapidMiner and Weka?

How about the hyperparameters of these algorithms? Are they exactly the same?

[Deleted User] · February 2020

@ozcan

Hello

It depends on your data and on the algorithm. When different software performs classification or clustering on your data, you may see some differences, and this is not a problem. Data science is based on statistics and probability, so different accuracies are normal.

I hope this helps
mbs

ozcan · February 2020

Hi, my dataset is attached.
In RapidMiner, I changed the bug label to nominal; the other attributes are real.
In Weka, I changed the bug label from numeric to nominal; the other attributes are numeric.
For all algorithms, I used 10-fold cross-validation in both Weka and RapidMiner.
I set the role of the bug attribute to label and selected all attributes for all algorithms.
Cross-validation uses 10 folds; the other options are default.
I didn't change any algorithm options; all of them use the default settings.
Minor differences between Weka and RapidMiner could be due to a confidence interval, but that should not affect recall like this.
This is not a tricky question. I need these results and this comparison for my thesis. @lionelderkrikor @varunm1 @mbs

varunm1 · February 2020

Hello @ozcan

We understand that, but what we are trying to say is that performance varies with the way the 10 cross-validation folds are divided and with the settings of each algorithm. The defaults in RapidMiner and Weka might not be the same, and the base algorithm might not behave the same way if the default parameters are not similar.

I am not sure it's a good idea to compare two software packages based on performance. I guess @IngoRM might help you with the pitfalls of doing this.
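The effect of fold assignment can be sketched with a simple stratified 10-fold split. This is a generic round-robin scheme written for illustration, not the exact procedure RapidMiner or Weka uses, and `stratified_folds` and `fold_composition` are hypothetical helpers. The point is only that two tools (or two random seeds) can produce different but equally valid folds, so per-fold scores and their averages need not match exactly.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed):
    """Assign each index to a fold so every fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [0] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                     # seed-dependent order within the class
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k          # deal indices round-robin into k folds
    return fold_of


def fold_composition(labels, fold_of, fold):
    """Count how many instances of each class landed in one fold."""
    counts = defaultdict(int)
    for label, f in zip(labels, fold_of):
        if f == fold:
            counts[label] += 1
    return dict(counts)


labels = [0] * 80 + [1] * 20        # hypothetical 80/20 label column
a = stratified_folds(labels, 10, seed=1)
b = stratified_folds(labels, 10, seed=2)

print(fold_composition(labels, a, 0))   # {0: 8, 1: 2}
print(fold_composition(labels, b, 0))   # {0: 8, 1: 2}
print(a == b)                           # False unless the two shuffles happen to coincide
```

Both partitions keep the class ratio in every fold, yet they contain different instances, so a model evaluated on them will generally report slightly different scores.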

[Deleted User] · February 2020

@ozcan

I have had the experience of getting different accuracies, but this is not a problem. You can accept the answers from both software packages, because they do not perform clustering and classification in the same way, and according to statistics and probability both of them are correct. Maybe others can help you more.

One more thing:
Please take a look at your data: you have a lot of different numbers in your data, which is very important and can affect your process.

All the best
mbs

sgenzer · February 2020

Hi @ozcan, yes, can we please see your actual processes and data so we can replicate your results?

ozcan · February 2020

Hi @sgenzer, my dataset and process are attached. Thanks.

Moreover, for the decision tree:
Weka Accuracy: 81.4989%
RapidMiner Accuracy: 83.07%
Weka Recall: 0.815
RapidMiner Recall: 30.67%

ozcan · February 2020

Hi @varunm1
Yes this setting solved my problem. Thank you very much . One more thing, my bug label is nominal. For this, ı get potential problem . Is it effect my results. I have to change bug label to binominal.? ı add screenshot to attachment

varunm1 · February 2020

It's a warning, and there's no need to worry. You can also change the label using the Numerical to Binominal operator. This will change your 0 and 1 to false and true (binominal categories). You should always be careful while analyzing your results and understand how they can change based on classes, data and models.
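The conversion described above can be sketched as follows, assuming the operator's default range of min = max = 0.0, where values inside the range become false and all other values become true. The function name is hypothetical and only mirrors that rule in Python; it is not RapidMiner code.

```python
def numerical_to_binominal(values, min_value=0.0, max_value=0.0):
    """Map values inside [min_value, max_value] to "false" and everything else to "true"."""
    return ["false" if min_value <= v <= max_value else "true" for v in values]


bug_labels = [0, 1, 1, 0, 1]                 # hypothetical bug label column
print(numerical_to_binominal(bug_labels))    # ['false', 'true', 'true', 'false', 'true']
```

After the conversion it is still worth checking which of the two values the performance operator treats as positive, since that choice is what drove the low recall earlier in this thread.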
