Problems with Lift Chart
Jane
Hello,
I'm trying to display some results with a lift chart, but the output of the chart is not what I was expecting. The x-axis of the lift chart, which is labeled "fraction", should be the fraction of the dataset (ordered by confidence values) used to calculate the lift. The y-axis should be the cumulative lift, calculated as (% of positive labels in this fraction of the dataset) / (% of positive labels in the entire dataset).
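A minimal Python sketch of the lift chart described above, assuming binary labels (1 = positive, 0 = negative) and one positive-class confidence per example. The function name `cumulative_lift` and its `bins` parameter are hypothetical illustrations, not RapidMiner code. It evaluates the lift at the fixed fractions 1/bins, 2/bins, ..., 1 of the confidence-sorted data (deciles when `bins=10`).

```python
def cumulative_lift(confidences, labels, bins=10):
    """Cumulative lift at fractions 1/bins, 2/bins, ..., 1 of the data.

    labels: 1 for a positive example, 0 otherwise (at least one positive assumed).
    lift = (share of positives in the top fraction) / (share of positives overall)
    """
    n = len(labels)
    # Rank examples by confidence for the positive class, highest first.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    ranked = [labels[i] for i in order]
    base_rate = sum(ranked) / n
    points = []
    for d in range(1, bins + 1):
        k = -(-d * n // bins)  # ceil(d * n / bins) with integer arithmetic
        top_rate = sum(ranked[:k]) / k
        points.append((d / bins, top_rate / base_rate))
    return points
```

For example, `cumulative_lift(confidences, labels, bins=5)` returns five (fraction, lift) pairs; by construction the last point, which covers the whole dataset, always has a lift of 1.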
I've looked at the code in the file LiftDataGenerator. It looks like the example set is ordered by confidence values, but the x-value is then simply the rank of each example, and the lift is calculated from the cumulative values of the confusion matrix as (TP*(FP+TN)) / ((TP+FP)*(TP+FN)). That isn't far from what I'm looking for, but it isn't quite right. Is this a bug, or is this just an incarnation of the lift chart that is different from the one that I'm used to?
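A small Python sketch comparing the quoted formula with the lift defined earlier, at every rank k. It assumes (not confirmed against the RapidMiner source) that the "cumulative" confusion matrix at rank k treats the top k examples as predicted positive. Under that assumption TP+FP = k, TP+FN = P (all positives) and FP+TN = Q (all negatives), so the quoted expression reduces to TP·Q/(k·P), while the lift from the question is (TP/k)/(P/N) = TP·N/(k·P) with N = P+Q. The two would then differ only by the constant factor N/Q: same curve shape, but the quoted version is scaled down and ends at Q/N instead of 1. The function name `per_rank_lifts` is hypothetical.

```python
def per_rank_lifts(confidences, labels):
    """For every rank k, return (k, quoted_formula, lift_from_question).

    Assumption: the top-k examples by confidence count as predicted positive,
    and TP, FP, TN, FN are taken from that cut-off.
    """
    n = len(labels)
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    ranked = [labels[i] for i in order]
    positives = sum(ranked)
    negatives = n - positives
    rows = []
    tp = 0
    for k in range(1, n + 1):
        tp += ranked[k - 1]
        fp = k - tp           # negatives inside the top k
        fn = positives - tp   # positives outside the top k
        tn = negatives - fp   # negatives outside the top k
        # Formula as quoted: (TP*(FP+TN)) / ((TP+FP)*(TP+FN))
        quoted = tp * (fp + tn) / ((tp + fp) * (tp + fn))
        # Lift as defined in the question: positive rate in top k / overall positive rate
        lift = (tp / k) / (positives / n)
        rows.append((k, quoted, lift))
    return rows
```

With 10 examples of which 6 are negative, every row satisfies lift = quoted × 10/6, which is the constant factor N/Q from the algebra above.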
On a related note, I tried to modify the code in LiftDataGenerator directly so that it would display the results the way I want, but RapidMiner never seemed to compile my changes. I'm on Windows, and I always run RapidMiner through the GUI. If I want to create my own operators, do I need a development environment like Eclipse, or is there another way to get RapidMiner to recompile itself?
Thanks in advance for any help!
Jane
Answers
Hi,
Jane wrote: "Is this a bug, or is this just an incarnation of the lift chart that is different from the one that I'm used to?"
I am actually not too familiar with lift charts myself, but as far as I can see, RapidMiner calculates the lift chart data points in the same way, just not for fixed fractions (for example deciles) but for all possible fractions (ranks). But I could also be wrong, and there may actually be a bug...
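A short Python sketch of the "fixed fractions versus all ranks" distinction, assuming binary 0/1 labels. It only illustrates the x-axis side: dividing each rank k by the number of examples turns a rank into a fraction, and the fixed-fraction points (deciles for `bins=10`) are then a subset of the per-rank curve. The y-values use the lift definition from the question, not necessarily RapidMiner's, and both function names are hypothetical.

```python
def lift_at_every_rank(confidences, labels):
    """Return (k / n, lift) for every rank k = 1..n: one point per possible fraction."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    ranked = [labels[i] for i in order]
    base_rate = sum(ranked) / n
    points = []
    hits = 0
    for k in range(1, n + 1):
        hits += ranked[k - 1]
        # Rescale rank k to a fraction of the data set for the x-axis.
        points.append((k / n, (hits / k) / base_rate))
    return points


def sample_at_fixed_fractions(points, bins=10):
    """Pick the per-rank points at fractions 1/bins, ..., 1 (deciles when bins=10)."""
    n = len(points)
    # Point index for fraction d/bins is ceil(d * n / bins) - 1.
    return [points[-(-d * n // bins) - 1] for d in range(1, bins + 1)]
```

Plotting the full output of `lift_at_every_rank` gives the dense "all ranks" curve; `sample_at_fixed_fractions` reads off the coarser chart with one point per decile (or per `bins` fraction).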
About the compilation: in theory, a text editor and the small tool "Ant" should be enough, but it is much easier to use a program like Eclipse for recompiling RapidMiner. Our web site has a description of how to compile RapidMiner yourself: http://rapid-i.com/content/view/25/48/
Hope that helps.
Does anyone have an opinion on the lift calculation?
Cheers,
Ingo