How to interpret W-JRIP results
wj
New Altair Community Member
I already posted this question to Problems and Support, but since nobody has been able to answer, I figured I had posted it in the wrong place.
I use the Ripper (W-JRip) algorithm in RapidMiner 4.6 to find classification accuracy and rulesets for a dataset containing groups A and B, but I'm not quite sure how to interpret the classification accuracy in the output versus the accuracy given by the JRip ruleset.
If I look at the "Performance vector" tab, which contains the confusion matrix and accuracy, I suppose the accuracy value is the mean accuracy obtained in the cross-validation process? And the sensitivity and specificity can be calculated from the confusion matrix, which shows the mean counts of true/false positives and negatives obtained during validation. Is this correct?

The thing that confuses me is the "W-JRip" tab, which contains the ruleset that can be used to classify the subjects into groups A and B. Is this some kind of optimal ruleset that had the best classification accuracy in some iteration of the validation process? If I apply the ruleset to the dataset, I always get better classification accuracy/sensitivity/specificity than the values in the "Performance vector" tab. What worries me is that the accuracy given by the JRip ruleset sometimes differs by as much as 20 percentage points from the accuracy displayed in the "Performance vector" tab.

Can someone explain how this ruleset is obtained by the software, and which of the two accuracies (the ruleset's or the one in the performance tab) is more reliable and should be used? Thank you for the help!
Just for information, my dataset (about 100 subjects) has two groups, A and B, and approximately 10 variables/features per subject.
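(To be explicit about what I mean by sensitivity and specificity from the confusion matrix, here is a quick Python sketch with made-up counts, not my real results:)

# Made-up confusion-matrix counts, only to show the formulas I mean.
tp, fn = 40, 10   # group A subjects classified as A / as B
tn, fp = 35, 15   # group B subjects classified as B / as A

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(accuracy, sensitivity, specificity)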
Answers
Hi there,
All the operators prefixed with 'W-' are from Weka, not RapidMiner - hence the deafening silence! If you look through recent posts here you will see how RM calculates performance; the closest I can come to for JRip is from Witten and Frank's book on Weka...
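In general, the accuracy in the performance vector is the average over the held-out cross-validation folds, whereas applying the displayed ruleset back to the same data it was learned from (or part of it) gives an in-sample, resubstitution figure, which will almost always look better. A rough sketch of that gap in Python, with scikit-learn and a decision tree standing in for JRip (made-up data, not RapidMiner code):

# Sketch: cross-validated accuracy vs. accuracy of a model scored on
# the same data it was trained on (resubstitution). A decision tree
# stands in for JRip here; the point is the gap, not the learner.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data roughly the size you describe: 100 subjects, 10 features.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

clf = DecisionTreeClassifier(random_state=0)

# Mean accuracy over held-out folds -- comparable to the performance vector.
cv_acc = cross_val_score(clf, X, y, cv=10).mean()

# Train on everything, then score on the same examples -- comparable to
# applying the displayed ruleset back to the full dataset.
resub_acc = clf.fit(X, y).score(X, y)

print(f"cross-validated accuracy: {cv_acc:.2f}")
print(f"resubstitution accuracy:  {resub_acc:.2f}")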
Accuracy for a rule = (number of positives covered by the rule + number of negatives not covered by the rule) / total number of examples (p. 207).
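With some purely hypothetical counts, just to show how that quantity is computed:

# Hypothetical counts for a single rule -- not from your data.
p = 30            # positives covered by the rule
n = 5             # negatives (wrongly) covered by the rule
N = 40            # total number of negative examples
T = 100           # total number of examples
rule_accuracy = (p + (N - n)) / T   # (30 + 35) / 100 = 0.65
print(rule_accuracy)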
Good luck with that!
Thank you for the help, I'll look into the posts and the book you mentioned!
-wj