
Inconsistency of ROC curves

User: "bernardo_pagnon"
New Altair Community Member
Updated by Jocelyn
Hello,

I generated a ROC curve for a logistic regression on a data set by using the Performance operator, then selecting AUC as the criterion. Fine.

Then I used the same data set with the Compare ROCs operator, picking logistic regression and decision tree as the models. The ROC curves appear, and the ROC curve for the logistic regression is different from the one I obtained before! How can this be?

Best,
Bernardo

    User: "varunm1"
    New Altair Community Member
    Updated by varunm1
    Hello @bernardo_pagnon

Can you share the process here? You can export it via FILE --> Export process and attach the .rmp file here. Please also attach the data. I suspect that some samples of the test data differ between the two runs. Are you using the same type of validation, with a local random seed, for both the Compare ROCs operator and the regular model with the performance metric? I will check and let you know once I have the process and data.

If you can't share it here, you can send me a PM with the requested files.
    User: "bernardo_pagnon"
    New Altair Community Member
    OP
    User: "varunm1"
    New Altair Community Member
    Updated by varunm1
    Hello @bernardo_pagnon

    Thanks for sharing your process files and data.

I used the complete dataset in the attached Excel file; I believe it's the correct file. Now, coming to the problem.

Case 1 process: In the Case 1 process, I can see that you are training and testing on the same data. This is not correct, as you need to test on data that is independent of the training data. If you are doing this on purpose for your requirement, then it's fine.
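    To make the problem concrete, here is a minimal scikit-learn sketch (Python rather than RapidMiner; the dataset and model settings are illustrative stand-ins, not taken from your process) of how scoring a model on its own training rows inflates the AUC:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

model = LogisticRegression(max_iter=5000)
model.fit(X, y)                              # train on ALL rows (the Case 1 setup)
auc_same = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Proper estimate: score on rows the model never saw during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42, stratify=y)
model.fit(X_tr, y_tr)
auc_holdout = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(f"AUC scored on the training data: {auc_same:.3f}")    # optimistic
print(f"AUC scored on held-out data:     {auc_holdout:.3f}")  # realistic
```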


    Case 2 process: In Case 2, you are using the Compare ROCs operator. Based on the parameter settings shown below, it uses 10-fold cross-validation, which divides your dataset into 10 subsets, then trains on 9 subsets and tests on the remaining one. This repeats until every subset has been tested once, and the final performance is an aggregate of the performance over all subsets.

    [Screenshot: Compare ROCs parameter settings]
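    For reference, here is a rough Python/scikit-learn analogue of that setting (illustrative names and data; RapidMiner's internal aggregation may differ in detail): each of the 10 folds is held out once, scored, and the per-fold AUCs are aggregated:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# An arbitrary fixed seed plays the role of the "local random seed"
# parameter: it makes the 10 splits reproducible across runs.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1992)

fold_aucs = cross_val_score(LogisticRegression(max_iter=5000),
                            X, y, cv=cv, scoring="roc_auc")
print("per-fold AUC:", fold_aucs.round(3))
print("aggregate AUC:", round(fold_aucs.mean(), 3))
```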
This is the reason you are getting different ROC curves: the test data and the processes differ between the two cases, so the results (AUC and ROC curve) differ as well.

    I modified your Case 1 process to use 10-fold cross-validation, and you can now see in the image below that the ROC curves of Case 1 and Case 2 are similar. The left side is Case 1 and the right side is Case 2. I attached the modified processes; you can open them in RapidMiner using FILE --> Import Process.

    [Screenshot: ROC curves of the modified Case 1 (left) and Case 2 (right)]
Modified Case 1 process image: added 10-fold cross-validation with a local random seed in the parameters. I also set a local random seed for the Compare ROCs operator in the Case 2 process, with ROC bias set to neutral.


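    The key point is that once both processes use the same folds and the same seed, both models are trained and tested on identical partitions, so their ROC curves become directly comparable. A sketch of that idea (scikit-learn again; it pools out-of-fold predictions into a single curve per model, which is close to, but not exactly, how Compare ROCs averages per-fold curves):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# One CV object with a fixed seed -> both models see the SAME splits.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1992)

models = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    # Out-of-fold probabilities: each row is predicted by a model
    # that never trained on it.
    proba = cross_val_predict(model, X, y, cv=cv,
                              method="predict_proba")[:, 1]
    print(f"{name}: cross-validated AUC = {roc_auc_score(y, proba):.3f}")
```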
    Hope this helps. Please let us know if you need more information.
    User: "bernardo_pagnon"
    New Altair Community Member
    OP
    Accepted Answer
    What can I say?
    1 - Big, big thanks
    2 - I was indeed training and testing on the same data to illustrate that one should not do that (it is for a class)
    3 - Great idea of using 10-fold cross-validation in both cases to be able to compare them.

    Best,
    Bernardo
    User: "varunm1"
    New Altair Community Member
    @bernardo_pagnon I got 100 points in this assignment in your class then :smiley::wink:
    User: "bernardo_pagnon"
    New Altair Community Member
    OP
    ahahahahahaha
    Can't argue with that!!!

    Best,
    Bernardo