I understand that the "Find threshold" operator uses ROC to determine the best threshold. But what algorithm does it use to select the threshold? For example, does it (1) optimize precision and recall, (2) do something like this: http://stats.stackexchange.com/questions/29719/how-to-determine-best-cutoff-point-and-its-confidence-interval-using-roc-curve-i, or (3) use some other method?
Thanks!
Hi Johnny,
You should be able to track it down on GitHub. RapidMiner GitHub
Try here: Find threshold & ROC helper class
Great, thanks. Let me take a look!
I tried to understand the code in the method "public ROCData createROCData", but I do not quite understand which method it uses to determine the best threshold. Is there a paper that it is based on?
The code is in:
"https://github.com/rapidminer/rapidminer-studio/blob/85d3bee36c026a70580075092ed85ac517369e8e/src/main/java/com/rapidminer/tools/math/ROCDataGenerator.java"
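For comparison while reading that class, here is a minimal sketch of one common ROC-based criterion, Youden's J (maximize TPR minus FPR). This is only an illustration of the kind of rule discussed in the linked Stack Exchange question; it is not taken from ROCDataGenerator, and the function names and toy data are made up for the example.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) per distinct score.

    An example is predicted positive when its score >= threshold.
    labels are 1 (positive) or 0 (negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def youden_threshold(scores, labels):
    """Pick the threshold that maximizes Youden's J = TPR - FPR."""
    best = max(roc_points(scores, labels), key=lambda p: p[2] - p[1])
    return best[0]


# Toy data (hypothetical): confidence for the positive class and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 1, 0, 0]

print(youden_threshold(scores, labels))
```

Youden's J treats both error types as equally costly, so it would not account for user-supplied misclassification costs. Whether the operator uses this, a cost-weighted rule, or something else has to be read from the Java source.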
This process doesn't use cross-validation, but the cross-validated result is the same. (The unexpected behaviour could have been caused by applying a model to unseen data, so I am testing on the training set to isolate the bug.)
The problem is simple: I have misclassification costs of 25 (no fraud) and 10 (fraud). It is actually more expensive to misclassify a loyal customer than a fraudulent one. I define these costs in the Find Threshold operator and then evaluate the results with Performance (Costs).
The problem is that I get better results when I use cost1 = 250 in Find Threshold instead of cost1 = 25. If you can explain to me why this is so, I would really appreciate it!
Kind regards,
Sebastian
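One way to check this outside RapidMiner is a plain total-cost sweep over candidate thresholds. The sketch below is an assumption-laden illustration, not the operator's actual logic: it assumes fraud is the positive class, that 25 is the cost of flagging a loyal customer (false positive) and 10 the cost of missing a fraud (false negative), and that cost1 is the 25 figure. The function names and toy data are hypothetical.

```python
import math


def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total misclassification cost when predicting positive for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def min_cost_threshold(scores, labels, cost_fp, cost_fn):
    """Exhaustively pick the threshold with the lowest total cost.

    math.inf is included so that "predict nothing as positive" is a candidate.
    """
    candidates = sorted(set(scores)) + [math.inf]
    return min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


# Toy data (hypothetical): fraud confidence and true labels (1 = fraud).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 1, 1, 0, 1, 0, 0]

# Thresholds chosen under the stated costs and under the inflated cost.
t_25 = min_cost_threshold(scores, labels, cost_fp=25, cost_fn=10)
t_250 = min_cost_threshold(scores, labels, cost_fp=250, cost_fn=10)

# Both thresholds are then evaluated under the stated costs (25 / 10).
cost_at_t_25 = total_cost(scores, labels, t_25, cost_fp=25, cost_fn=10)
cost_at_t_250 = total_cost(scores, labels, t_250, cost_fp=25, cost_fn=10)
print(t_25, cost_at_t_25)
print(t_250, cost_at_t_250)
```

With an exhaustive sweep on the same data, the threshold chosen under the evaluation costs can never do worse than one chosen under different costs. If the operator shows the opposite on the training set, it may be worth checking how the two cost parameters map to the classes and whether the selection rule is really a total-cost minimum.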