Lift Charts - Improvements in lower deciles
btibert
New Altair Community Member
I know that there have been a number of discussions here on lift charts, including this one, https://community.rapidminer.com/discussion/55257/about-lift-chart, but I have to admit I am wondering why, in so many of my examples (different datasets, different techniques), the Lift Chart (or Lift Chart (Simple)) output shows cases where the hit rate/conversion actually goes up in deciles farther to the right. By definition, the data are sorted by confidence for the target class in descending order, so you would normally expect the hit rate to drop with each decile, as it did with the same dataset and technique in a different tool.
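For reference, here is a minimal Python sketch of how decile hit rates are usually computed: sort by confidence for the target class (descending), cut the list into equal-count groups, and take the share of positives in each group. The function name and the equal-count splitting rule are assumptions for illustration, not RapidMiner's implementation. Nothing in the calculation forces the rates to decrease; they simply report what the scored data contain.

```python
from typing import List, Sequence


def decile_hit_rates(confidences: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> List[float]:
    """Hypothetical helper: sort by confidence (descending), cut into n_bins
    equal-count groups and return the share of positives (hit rate) per group."""
    if len(confidences) != len(labels):
        raise ValueError("confidences and labels must have the same length")
    # Highest confidence first; tied examples keep their input order (sorted is stable).
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    n = len(order)
    rates = []
    for b in range(n_bins):
        # Integer boundaries spread any remainder evenly across the groups.
        group = order[b * n // n_bins:(b + 1) * n // n_bins]
        hits = sum(labels[i] for i in group)
        rates.append(hits / len(group) if group else 0.0)
    return rates
```

With few examples per group, a single positive more or less changes a group's rate noticeably, so a later group can easily come out higher than an earlier one.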
Even in the example I linked to above, the hit rate actually goes up in decile 6. Admittedly, I very rarely see this elsewhere, so I am wondering if there is an explanation or an intuition you can share for why it appears so often here in RM.
The results above are from a logistic regression.
Last but not least, is there a way to set a reference line on these charts to show the baseline % of the target? I think that would really simplify the visualization for people to understand the concept of lift.
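Until such an option exists, the baseline is easy to compute yourself: it is the overall share of the target class, and each bucket's lift is its hit rate divided by that baseline, so the reference line sits at lift = 1. The sketch below is plain Python with hypothetical function names and a text chart instead of a real plot; it assumes you already have (examples, positives) counts per bucket, ordered from highest to lowest confidence.

```python
from typing import List, Sequence, Tuple


def lift_against_baseline(buckets: Sequence[Tuple[int, int]]) -> Tuple[float, List[float]]:
    """buckets: (examples, positives) per bucket, highest confidence first.
    Returns the overall baseline rate and each bucket's lift (hit rate / baseline)."""
    total = sum(n for n, _ in buckets)
    positives = sum(p for _, p in buckets)
    if total == 0 or positives == 0:
        raise ValueError("need at least one example and one positive")
    baseline = positives / total
    lifts = [(p / n) / baseline if n else 0.0 for n, p in buckets]
    return baseline, lifts


def render_with_baseline(buckets: Sequence[Tuple[int, int]], width: int = 40) -> List[str]:
    """Text bar chart of hit rates with a '|' marking the baseline rate."""
    baseline, _ = lift_against_baseline(buckets)
    mark = min(round(baseline * width), width - 1)  # column of the reference line
    lines = []
    for i, (n, p) in enumerate(buckets, start=1):
        rate = p / n if n else 0.0
        filled = round(rate * width)
        bar = list("#" * filled + " " * (width - filled))
        bar[mark] = "|"  # draw the reference line over the bar
        lines.append(f"{i:>2} {''.join(bar)} {rate:.0%}")
    return lines


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    for line in render_with_baseline([(28, 12), (28, 9), (28, 4), (28, 6)], width=30):
        print(line)
```

Bars that end to the right of the `|` have lift above 1; bars that stop short of it perform worse than picking at random.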
Best Answer
Hey there,

1) Simple LC vs. LC: I really do recommend using the newer Lift Chart (Simple) version; the other one is somewhat unstable when the threshold values are very close together. This often leads, as in your example, to cases where you do not get the desired number of buckets. This happens most often for smaller data sets (like yours with 283 examples) and/or with models that produce only a limited set of discrete confidence values (decision trees, for example).

2) Reference line: this is currently not possible, but it may indeed be a good idea. There is some risk that the chart gets even busier, but it is definitely worth a try.

3) Change of slope: this can indeed happen, especially (as above) for smaller data sets and/or for models with a limited set of confidence values, e.g. decision trees. I know that some tools sometimes "cheat" to avoid this in their visualization, but I personally prefer to see it, TBH. And as above, it is much less likely to happen for larger data sets and for models like Naive Bayes and others that produce more fine-grained confidence values.

Hope this helps,
Ingo
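To illustrate point 1, the sketch below uses a simplified threshold-based bucketing. This is an assumption for illustration only, not RapidMiner's actual code: cut-off confidences are read off at equal-count positions of the ranking, and each example is bucketed by how many cut-offs lie at or above its confidence. When a model outputs only a few distinct confidence values, several cut-offs coincide and the requested buckets collapse into fewer, unequal ones.

```python
from collections import Counter
from typing import List, Sequence


def threshold_buckets(confidences: Sequence[float], n_bins: int = 10) -> List[int]:
    """Simplified, illustrative threshold bucketing (not RapidMiner's code).

    Cut-off confidences are taken at equal-count positions of the descending
    ranking; each example's bucket index is the number of cut-offs at or above
    its confidence (lower index = higher confidence)."""
    if not confidences:
        return []
    ranked = sorted(confidences, reverse=True)
    n = len(ranked)
    cutoffs = [ranked[b * n // n_bins] for b in range(1, n_bins)]
    return [sum(1 for t in cutoffs if c <= t) for c in confidences]


if __name__ == "__main__":
    distinct = [i / 10 for i in range(10)]          # ten different confidences
    tree_like = [0.9] * 4 + [0.6] * 3 + [0.2] * 3   # only three distinct values
    print(Counter(threshold_buckets(distinct, n_bins=5)))
    print(Counter(threshold_buckets(tree_like, n_bins=5)))
```

With ten distinct confidences, asking for 5 buckets gives 5 buckets of 2 examples each; with the tree-like scores it gives only 3 buckets, of sizes 4, 3 and 3. The alternative, cutting strictly by rank, always yields the requested count, but it has to split tied examples across neighboring buckets in arbitrary order, which is one more way a hit rate can bump up in a later decile.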
Answers
And when using Lift Chart, I am not sure why there are more than 10 bins when I specify 10.
Thank you for the note; I appreciate it.
I've also created lift charts manually (not using the Lift Chart operator) when facing this same issue (e.g., the number of auto-generated bins not coming out properly due to the presence of ties). You can use the Discretize operator instead, then use Aggregate to generate the underlying tables, and then create the charts you want (either inside RapidMiner or outside).
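A rough Python analogue of that manual route, for anyone working outside RapidMiner: examples are binned by descending confidence without splitting tied confidences across bins (similar in spirit to discretizing by frequency, though the Discretize operators' exact tie handling may differ), and each bin is then aggregated into one lift-table row. The function and column names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def tie_aware_lift_table(
    confidences: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[Dict[str, float]]:
    """Hypothetical helper: bin by descending confidence, keeping tied
    confidences in the same bin, then aggregate each bin into one row."""
    if len(confidences) != len(labels):
        raise ValueError("confidences and labels must have the same length")
    pairs = sorted(zip(confidences, labels), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one example and one positive")
    target = n / n_bins  # ideal number of examples per bin

    # "Discretize" step: walk tie groups in order, opening a new bin once the
    # examples placed so far reach the current bin's cumulative share.
    bins: List[List[int]] = [[]]
    placed = 0
    for _, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]
        if bins[-1] and placed >= len(bins) * target and len(bins) < n_bins:
            bins.append([])
        bins[-1].extend(ys)
        placed += len(ys)

    # "Aggregate" step: one row per bin.
    baseline = total_pos / n
    rows = []
    cum_pos = 0
    for i, ys in enumerate(bins, start=1):
        hits = sum(ys)
        cum_pos += hits
        rate = hits / len(ys)
        rows.append({
            "bin": i,
            "examples": len(ys),
            "hits": hits,
            "hit_rate": rate,
            "lift": rate / baseline,
            "cum_capture": cum_pos / total_pos,  # share of all positives found so far
        })
    return rows
```

Because ties are never split, a model with only a handful of distinct confidences produces fewer, unequal bins instead of arbitrary splits; the rows can then be charted however you like, with a horizontal line at lift = 1 as the baseline reference.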
Thanks @Telcontar120. I started going down the path you outlined above (the manual approach), but I was wrestling with the binning, so I punted on that portion of my class. I will revisit it next week.