Setting penalty or prior probabilities

Ras94
New Altair Community Member
Hi,
I have a data set with prior probabilities of 75% and 25%. I would like to set a penalty or the probabilities, so that the models will account for the skewed distribution - right now my decision tree, for example, is just predicting 100% towards the larger class, resulting in a 75% accuracy. As my data set is not very large, I would prefer not to undersample.
Answers
If you don't want to downsample, you can use the SMOTE Upsampling operator, which is available in the Operator Toolbox.
However, I don't know what you are doing. If you could share a bit more information...
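For anyone wondering what SMOTE-style upsampling does conceptually, here is a minimal Python sketch of the idea: pick a minority example, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. This is not the Operator Toolbox implementation; the function name `smote_like_oversample`, the parameter defaults, and the toy data are all made up for illustration.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority examples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (squared Euclidean distance),
        # excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class examples with two numeric attributes
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=8)
print(len(new_points), "synthetic minority examples")
```

Because every synthetic point lies between two real minority examples, the new data stays inside the region the minority class already occupies. The synthetic examples should only be added to the training data, never to the data used for testing.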
Hello @Ras94
Did you try any feature selection techniques? If not, I recommend trying feature selection and cross-validating your model to check its performance before sampling your dataset, as 75 to 25 is not a highly imbalanced distribution and this sort of data needs to be dealt with in the real world.
Also, why are you trying only a decision tree? You can try other algorithms such as logistic regression, SVM, etc., which could probably give you better classification results. You can interpret the results using the Explain Predictions operator, which helps with factor analysis.
Thanks
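As a rough illustration of the cross-validation suggestion, the sketch below builds folds in plain Python so that each fold keeps the same 75/25 class ratio as the full data. Stratifying the folds is my assumption here, not something stated above, and `stratified_folds` plus the labels `"large"` and `"small"` are hypothetical names used only for this example.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds

# Toy label column with the 75% / 25% distribution from the question
labels = ["large"] * 75 + ["small"] * 25
folds = stratified_folds(labels, k=5)
for fold in folds:
    n_small = sum(labels[i] == "small" for i in fold)
    print(len(fold), "examples,", n_small, "from the smaller class")
```

Each fold serves once as the test set while the remaining folds are used for training, so every performance estimate is computed on data with the original class distribution.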
Hi,
The tree is actually not "broken"; it tries to generalize from the data without success. When that happens, it predicts the majority class for every example, which is the only sensible thing to do. Sometimes a tree-based model is simply not a good fit for your data, and sometimes the default parameters are not a good fit. You will probably see different behavior if you change the pruning settings, but that does not mean the resulting model is good in terms of predictive power (though it can be better).
Cheers,
Ingo
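To make the majority-class behavior concrete, here is a small Python sketch, assuming a toy label column with the 75/25 split from the question. It shows why 75% accuracy says little by itself: the per-class recall reveals that the smaller class is never predicted. The function name `majority_baseline_report` and the class labels are hypothetical.

```python
from collections import Counter

def majority_baseline_report(y_true):
    """Score a model that always predicts the most frequent class."""
    majority = Counter(y_true).most_common(1)[0][0]
    y_pred = [majority] * len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Recall per class: fraction of that class's examples predicted correctly
    recalls = {}
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls[label] = sum(y_pred[i] == label for i in idx) / len(idx)
    # Balanced accuracy: unweighted mean of the per-class recalls
    balanced_accuracy = sum(recalls.values()) / len(recalls)
    return accuracy, recalls, balanced_accuracy

y = ["large"] * 75 + ["small"] * 25
accuracy, recalls, balanced_accuracy = majority_baseline_report(y)
print(accuracy, recalls, balanced_accuracy)
```

On this toy data the accuracy is 0.75, the recall for the smaller class is 0.0, and the balanced accuracy is 0.5, which is the number any other model or parameter setting should beat.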