
Inconsistency of ROC curves

User: "bernardo_pagnon"
New Altair Community Member
Updated by Jocelyn
Hello,

I generated a ROC curve for a logistic regression on a data set by using the Performance operator, then selecting AUC as the criterion. Fine.

Then I used the same data set with the Compare ROCs operator, picking logistic regression and decision tree as the models. The ROC curves appear, and the ROC curve for the logistic regression is different from the one I obtained before! How can this be?

Best,
Bernardo

    User: "varunm1"
    New Altair Community Member
    Updated by varunm1
    Hello @bernardo_pagnon

Can you share the process here? You can export it via FILE --> Export process and attach the .rmp file here. Please also attach the data. I suspect that some samples of the test data differ between the two runs. Are you using the same type of validation, with a local random seed, for both the Compare ROCs operator and the regular model with the performance metric? I will check and let you know once I have the process and data.

If you can't share it here, you can send me a PM with the requested files.
    User: "bernardo_pagnon"
    New Altair Community Member
    OP
    User: "varunm1"
    New Altair Community Member
    Updated by varunm1
    Hello @bernardo_pagnon

    Thanks for sharing your process files and data.

I used the complete dataset in the attached Excel file; I believe it's the correct file. Now, coming to the problem.

Case 1 process: In the Case 1 process, I can see that you are training and testing on the same data. This is not correct, as you need to test on data that is independent of the training data. If you are doing this on purpose for your requirement, then it's fine.
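    To make the problem concrete, here is a minimal scikit-learn sketch (Python rather than RapidMiner; the dataset and model settings are illustrative stand-ins, not taken from your process) of how scoring a model on its own training rows inflates the AUC:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

model = LogisticRegression(max_iter=5000)
model.fit(X, y)                              # train on ALL rows (the Case 1 setup)
auc_same = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Proper estimate: score on rows the model never saw during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42, stratify=y)
model.fit(X_tr, y_tr)
auc_holdout = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(f"AUC scored on the training data: {auc_same:.3f}")    # optimistic
print(f"AUC scored on held-out data:     {auc_holdout:.3f}")  # realistic
```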


    Case 2 process: In Case 2, you are using the Compare ROCs operator. Based on the parameter settings shown below, it uses 10-fold cross-validation, which divides your dataset into 10 subsets, then trains on 9 subsets and tests on the remaining one. This repeats until every subset has been tested once, and the final performance is an aggregate of the performance over all subsets.

    [Screenshot: Compare ROCs parameter settings]
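    For reference, here is a rough Python/scikit-learn analogue of that setting (illustrative names and data; RapidMiner's internal aggregation may differ in detail): each of the 10 folds is held out once, scored, and the per-fold AUCs are aggregated:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# An arbitrary fixed seed plays the role of the "local random seed"
# parameter: it makes the 10 splits reproducible across runs.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1992)

fold_aucs = cross_val_score(LogisticRegression(max_iter=5000),
                            X, y, cv=cv, scoring="roc_auc")
print("per-fold AUC:", fold_aucs.round(3))
print("aggregate AUC:", round(fold_aucs.mean(), 3))
```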
This is the reason you are getting different ROC curves: the test data and the processes differ between the two cases, so the results (AUC and ROC curve) differ as well.

    I modified your Case 1 process to use 10-fold cross-validation, and you can now see in the image below that the ROC curves of Case 1 and Case 2 are similar. The left side is Case 1 and the right side is Case 2. I attached the modified processes; you can open them in RapidMiner using FILE --> Import Process.

    [Screenshot: ROC curves of the modified Case 1 (left) and Case 2 (right)]
Modified Case 1 process image: added 10-fold cross-validation with a local random seed in the parameters. I also set a local random seed for the Compare ROCs operator in the Case 2 process, with ROC bias set to neutral.


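    The key point is that once both processes use the same folds and the same seed, both models are trained and tested on identical partitions, so their ROC curves become directly comparable. A sketch of that idea (scikit-learn again; it pools out-of-fold predictions into a single curve per model, which is close to, but not exactly, how Compare ROCs averages per-fold curves):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# One CV object with a fixed seed -> both models see the SAME splits.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1992)

models = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    # Out-of-fold probabilities: each row is predicted by a model
    # that never trained on it.
    proba = cross_val_predict(model, X, y, cv=cv,
                              method="predict_proba")[:, 1]
    print(f"{name}: cross-validated AUC = {roc_auc_score(y, proba):.3f}")
```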
    Hope this helps. Please let us know if you need more information.
    User: "bernardo_pagnon"
    New Altair Community Member
    OP
    Accepted Answer
    What can I say?
    1 - Big, big thanks
    2 - I was indeed training and testing on the same data to illustrate that one should not do that (it is for a class)
    3 - Great idea of using 10-fold cross-validation in both cases to be able to compare them.

    Best,
    Bernardo
    User: "varunm1"
    New Altair Community Member
    @bernardo_pagnon I got 100 points in this assignment in your class then :smiley::wink:
    User: "bernardo_pagnon"
    New Altair Community Member
    OP
    ahahahahahaha
    Can't argue with that!!!

    Best,
    Bernardo