Low Recall High Accuracy

ozcan
ozcan New Altair Community Member
edited November 5 in Community Q&A
Below are example results for the same dataset. The dataset has no missing values.

For Naive Bayes:
RapidMiner Recall: 26.35% +/- 5.17% (micro average: 26.37%) :/
Weka Recall: 0.768
RapidMiner Precision: 43.41% :/
Weka Precision: 0.735
RapidMiner Accuracy: 77.14%
Weka Accuracy: 76.7639%

For Random Forest:
RapidMiner Recall: 16.60% +/- 6.01% (micro average: 16.59%) :/
Weka Recall: 0.843
RapidMiner Accuracy: 81.75%
Weka Accuracy: 84.2897%

For KNN:
RapidMiner Recall: 12.89% +/- 3.82% (micro average: 12.89%) :/
Weka Recall: 0.824
RapidMiner Precision: 55.82% +/- 12.05% (micro average: 55.77%) :/
Weka Precision: 0.810
RapidMiner Accuracy: 79.40%
Weka Accuracy: 82.4396%

For Decision Tree:
Weka Accuracy: 81.4989%
RapidMiner Accuracy: 83.07%
Weka Recall: 0.815
RapidMiner Recall: 30.67%

Why are RapidMiner's recall and precision values so low even though the accuracy is high? The recall value is especially low.
My process is attached. I use the same process for the other algorithms.

**I also tried other settings for these algorithms to improve recall in RapidMiner.
For example:
KNN: changing the K value, measure types, mixed measure, weighted vote.
Decision Tree: changing criterion, maximal depth, pruning, confidence, prepruning, minimal gain, leaf size, minimal size for split, number of prepruning alternatives.
Random Forest: changing number of trees, criterion, pruning, confidence, prepruning, random splits, guess subset ratio, voting strategy, etc.
But the recall value is still low.
01.JPG 33.5K
02.JPG 40.3K

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @ozcan,

    These results can be explained by a highly imbalanced dataset.
    In this case, the algorithm has difficulty capturing the relationships between your regular attribute(s) and the minority class of your label, and therefore in correctly predicting the minority class. That is why the recall is low although the accuracy is relatively good.
    However, I don't know why there is such a significant difference between Weka and RapidMiner.
    Could you share your dataset?

    Regards,

    Lionel
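
To illustrate the imbalance point above, here is a minimal sketch in plain Python. The class counts and predictions are made up for illustration and are not taken from the dataset in this thread; `binary_metrics` is a hypothetical helper, not a RapidMiner or Weka function. It shows how a classifier that rarely predicts the minority class can still score a high accuracy.

```python
def binary_metrics(actual, predicted, positive):
    """Return (accuracy, precision, recall) with `positive` treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical data: 80 clean modules (0) and 20 buggy modules (1).
actual = [0] * 80 + [1] * 20
# Hypothetical classifier: 78 clean modules predicted correctly, 2 flagged as buggy,
# and only 4 of the 20 buggy modules detected.
predicted = [0] * 78 + [1] * 2 + [1] * 4 + [0] * 16

accuracy, precision, recall = binary_metrics(actual, predicted, positive=1)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With these made-up numbers, 82 of 100 examples are correct, yet only 4 of the 20 minority examples are found, so accuracy stays high while recall for class 1 is low.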

  • varunm1
    varunm1 New Altair Community Member
    edited February 2020
    Hello @ozcan

    This is a tricky question. How are you getting these results? Are you cross-validating or split-validating your data? If so, are the test data sets the same in both RapidMiner and Weka?

    What about the hyperparameters of these algorithms? Are they exactly the same?


  • [Deleted User]
    [Deleted User] New Altair Community Member
    @ozcan

    Hello

    It depends on your data and on the algorithm. When different software performs classification or clustering on your data, you may see some differences, and this is not a problem. Data science is based on statistics and probability, so differences in accuracy are normal.

    I hope this helps
    mbs
  • ozcan
    ozcan New Altair Community Member
    edited February 2020
    Hi, my dataset is attached.
    In RapidMiner, I changed the bug label to nominal; the other attributes are real.
    In Weka, I changed the bug label from numeric to nominal; the other attributes are numeric.
    For all algorithms, I used 10-fold cross-validation in both Weka and RapidMiner.
    I set the role of the bug attribute to label and selected all attributes for all algorithms.
    Cross-validation uses 10 folds; the other options are default.
    I did not change any algorithm options; all of them use default settings.
    There may be minor differences between Weka and RapidMiner, such as the confidence interval, but these should not affect recall like this.
    This is not a tricky question. I need these results and comparisons for my thesis. @lionelderkrikor @varunm1 @mbs


  • varunm1
    varunm1 New Altair Community Member
    edited February 2020
    Hello @ozcan

    We understand that, but what we are trying to say is that performance varies with how the 10 cross-validation folds are divided and with the settings of each algorithm. The defaults in RapidMiner and Weka might not be the same, and the base algorithm might not work the same way if the default parameters are not similar.

    I am not sure it's a good idea to compare two software packages based on performance. I guess @IngoRM might help you with the pitfalls of doing this.
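
As a rough illustration of the fold-division point, the sketch below shuffles a made-up label vector with two different seeds and counts how many minority examples land in each of 10 folds. This is a plain shuffled split written for this example; it is not the sampling scheme of either RapidMiner or Weka, and `fold_minority_counts` is a hypothetical helper.

```python
import random


def fold_minority_counts(labels, k, seed):
    """Shuffle indices with `seed`, cut them into k folds, and count label-1 examples per fold."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # every k-th shuffled index goes to the same fold
    return [sum(labels[i] for i in fold) for fold in folds]


# Hypothetical labels: 80 majority examples (0) and 20 minority examples (1).
labels = [0] * 80 + [1] * 20

for seed in (1, 2):
    counts = fold_minority_counts(labels, k=10, seed=seed)
    print(f"seed={seed} minority examples per fold: {counts}")
```

When the per-fold counts differ between seeds, each tool ends up testing on a different mix of examples, which is one reason per-fold recall on a small minority class can move around even when the algorithm settings match.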
  • [Deleted User]
    [Deleted User] New Altair Community Member
    edited February 2020
    @ozcan

    I have also seen different accuracy values, but this is not a problem. You can accept the results from both tools, because they do not implement clustering and classification in the same way, and in terms of statistics and probability both are correct. Maybe others can help you more. :)
    One more thing:
    Please take a look at your data. It contains a lot of different numbers, which is very important and can affect your process.

    All the best
    mbs
  • sgenzer
    sgenzer
    Altair Employee
    Hi @ozcan, yes, can we please see your actual processes and data so we can replicate your results?
  • ozcan
    ozcan New Altair Community Member
    edited February 2020
    Hi @sgenzer, my dataset and process are attached. Thanks.
    Also, for Decision Tree:
    Weka Accuracy: 81.4989%
    RapidMiner Accuracy: 83.07%
    Weka Recall: 0.815
    RapidMiner Recall: 30.67%
  • ozcan
    ozcan New Altair Community Member
    Hi @varunm1
    Yes, this setting solved my problem. Thank you very much. One more thing: my bug label is nominal, and because of this I get a potential problem warning. Does it affect my results? Do I have to change the bug label to binominal? I added a screenshot as an attachment.
  • varunm1
    varunm1 New Altair Community Member
    It's a warning, and there is no need to worry. You can also change it using the Numerical to Binominal operator. This will change your 0 and 1 to false and true (binominal categories). You should always be careful when analyzing your results and understand how they can change based on classes, data, and models.
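
On the point that results change with the class being evaluated, here is a small sketch using made-up predictions (not the thread's dataset). It computes recall separately for each class and then a support-weighted average of those recalls. The weighted average always equals accuracy, which may be worth checking, because the Weka recall values posted above sit very close to the Weka accuracy values (for example 0.768 against 76.7639%). Whether those Weka figures were read from a weighted-average row is an assumption; nobody in the thread confirmed it. The helper names are hypothetical.

```python
from collections import Counter


def recall_per_class(actual, predicted):
    """Recall for each class: correctly predicted examples of the class / all examples of the class."""
    support = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: hits[label] / support[label] for label in support}


def weighted_recall(actual, predicted):
    """Average of per-class recall, weighted by how many examples each class has."""
    support = Counter(actual)
    per_class = recall_per_class(actual, predicted)
    return sum(per_class[label] * support[label] for label in support) / len(actual)


# Hypothetical data: 80 examples of class 0 and 20 of class 1.
actual = [0] * 80 + [1] * 20
predicted = [0] * 78 + [1] * 2 + [1] * 4 + [0] * 16

per_class = recall_per_class(actual, predicted)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

print(f"recall for class 0: {per_class[0]:.3f}")
print(f"recall for class 1: {per_class[1]:.3f}")
print(f"weighted recall:    {weighted_recall(actual, predicted):.3f}")
print(f"accuracy:           {accuracy:.3f}")
```

In this made-up case, recall is high for the majority class and low for the minority class, so a figure reported for a single positive class and a figure averaged over both classes can look very different for the same predictions.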