Setting penalty or prior probabilities
Ras94
Hi,
I have a data set whose two classes have prior probabilities of 75% and 25%. I would like to set a misclassification penalty or the prior probabilities so that the models account for the skewed distribution. Right now my decision tree, for example, simply predicts the larger class for every example, which gives 75% accuracy. As my data set is not very large, I would prefer not to undersample.
rfuentealba
If you don't want to downsample, you can use the SMOTE Upsampling operator, which is available in the Operator Toolbox.
However, I don't know exactly what you are doing. Could you share a bit more information?
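For readers unfamiliar with SMOTE, the idea can be sketched in a few lines of plain Python. This is only an illustration of the interpolation step under simplifying assumptions (numeric features, Euclidean distance); it is not the code behind the SMOTE Upsampling operator, and the function name `smote_like_oversample` and the sample points are made up for the example.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class examples with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=8)
```

Because every synthetic point lies on a segment between two real minority examples, the minority class grows without any majority examples being discarded.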
varunm1
Hello @Ras94,
Did you try any feature selection techniques? If not, I recommend trying feature selection and cross-validating your model to check its performance before sampling your data set. A 75 to 25 split is not a highly imbalanced distribution, and this kind of data has to be dealt with in the real world.
Also, why are you trying only a decision tree? You can try other algorithms such as logistic regression or SVM, which may give you better classification results. You can interpret the results with the Explain Predictions operator, which helps with factor analysis.
Thanks
Ras94
@varunm1
Thank you. I went ahead with it and have been evaluating on precision, recall, and AUC. I have tried plenty of predictive models, but I was wondering whether there is a way to fix the decision tree, since it is "broken" (see my issue above).
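To make concrete why accuracy alone hides the problem, here is a small plain-Python sketch of a model that always predicts the larger class on a 75/25 split. The labels `"major"` and `"minor"` and the 100-example data set are invented for illustration, and precision is reported as 0.0 by convention when the class is never predicted.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class; 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical data: 75 majority and 25 minority examples.
y_true = ["major"] * 75 + ["minor"] * 25
# A tree that collapsed to a single leaf predicts the majority class everywhere.
y_pred = ["major"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minor_precision, minor_recall = precision_recall(y_true, y_pred, "minor")
_, major_recall = precision_recall(y_true, y_pred, "major")
# Balanced accuracy is the mean of the per-class recalls.
balanced_accuracy = (major_recall + minor_recall) / 2
```

Here accuracy comes out at 0.75, while recall on the minority class is 0.0 and balanced accuracy is 0.5, the same as guessing.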
IngoRM
Hi,
The tree is actually not "broken"; it tries to generalize from the data without success. In such cases it predicts the majority class for every example, which is the only sensible thing to do. Sometimes a tree-based model is simply not a good fit for your data, and sometimes the default parameters are not a good fit. You will probably get different behavior if you change the pruning settings, but that does not mean the resulting model is good in terms of predictive power (though it can be better).
Cheers,
Ingo
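On the original question of setting a penalty: one common, tool-independent idea is to weight each example inversely to its class frequency, so that mistakes on the smaller class cost more. The sketch below shows how such weights could flip the vote in a single tree leaf. It is a generic illustration, not a description of how the decision tree operator works internally, and the function names, labels, and leaf contents are assumptions made for the example.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight for class c is n / (k * n_c): n examples, k classes, n_c in class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_leaf_prediction(leaf_labels, weights):
    """Predict the class with the largest total weight among a leaf's examples."""
    totals = Counter()
    for label in leaf_labels:
        totals[label] += weights[label]
    return max(totals, key=totals.get)

# Hypothetical training labels with the 75/25 split from the question.
labels = ["major"] * 75 + ["minor"] * 25
weights = inverse_frequency_weights(labels)

# Hypothetical leaf holding 6 majority and 4 minority examples.
leaf = ["major"] * 6 + ["minor"] * 4
unweighted = weighted_leaf_prediction(leaf, {"major": 1.0, "minor": 1.0})
weighted = weighted_leaf_prediction(leaf, weights)
```

With equal weights the leaf votes for the majority class; with inverse-frequency weights the four minority examples outweigh the six majority ones. As noted above, a different prediction is not automatically a better model, so the result still needs to be validated.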