Setting penalty or prior probabilities

Hi,

I have a data set whose two classes have prior probabilities of 75% and 25%. I would like to set a misclassification penalty or the class priors so that the models account for the skewed distribution. Right now my decision tree, for example, predicts the larger class for every example, which results in 75% accuracy. As my data set is not very large, I would prefer not to undersample.

rfuentealba

If you don't want to downsample, you may take advantage of the SMOTE Upsampling operator, present in the Operator Toolbox.

However, I don't know exactly what you are doing. Could you share a bit more information?
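For readers unfamiliar with the idea behind SMOTE, here is a minimal pure-Python sketch of the interpolation step. It is not the Operator Toolbox implementation; the function name `smote_sketch`, its parameters, and the sample points are all made up for illustration.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; needs at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the base itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Made-up minority-class examples with two numeric attributes
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=8)
print(len(new_points), "synthetic minority examples")
```

Each synthetic example lies on the line segment between two existing minority examples, so the minority class grows without exact duplicates and without discarding any majority examples.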

varunm1

Hello @Ras94

Did you try any feature selection techniques? If not, I recommend trying feature selection and cross-validating your model to check its performance before sampling your dataset. A 75/25 split is not highly imbalanced, and this sort of data has to be dealt with in the real world.

Also, why are you trying only a decision tree? You can go with other algorithms such as logistic regression or SVM, which could probably give you better classification results. You can interpret the results using the Explain Predictions operator, which helps with factor analysis.

Thanks

Ras94

@varunm1 Thank you - I just went ahead with it and have been trying to evaluate on precision/recall/AUC. I have tried plenty of predictive models, but I was just wondering if there were a way to fix the decision tree since it is "broken" (e.g. see my issue).
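To illustrate why precision and recall are more telling than accuracy here, this small sketch scores a model that always predicts the larger class on a 75/25 data set. The labels `"major"` and `"minor"` and the helper `confusion_counts` are hypothetical names chosen for the example.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn) treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

# 100 examples with the 75% / 25% class split from the question
actual = ["major"] * 75 + ["minor"] * 25
# A model that predicts the larger class for every example
predicted = ["major"] * 100

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="minor")
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn) if tp + fn else 0.0
# Precision is undefined when nothing is predicted positive; reported as 0.0 here
precision = tp / (tp + fp) if tp + fp else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

The accuracy comes out at 0.75 while the recall on the smaller class is 0.0, which is why the 75% figure says little about how useful the model is.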

IngoRM

Hi,

The tree is actually not "broken"; it tries to generalize from the data without success. In those cases, it predicts the majority class for every example, which is the only sensible thing to do. Sometimes a tree-based model is simply not a good fit for your data; sometimes the default parameters are not a good fit. You will probably get different behavior if you change the pruning settings, but that does not mean the result is a good model in terms of predictive power (it can be better, though).

Cheers,
Ingo
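As a rough illustration of the penalty idea from the original question, the sketch below shows how weighting classes by the inverse of their prior can flip the prediction of a single tree leaf. This is a generic class-weighting scheme written in plain Python, not the behavior of any particular AI Studio operator; `leaf_prediction`, the labels, and the leaf counts are assumptions made for the example.

```python
def leaf_prediction(counts, weights=None):
    """Predict the class with the largest (optionally weighted) count in a leaf."""
    if weights is None:
        weights = {label: 1.0 for label in counts}
    return max(counts, key=lambda label: counts[label] * weights[label])

# Class priors from the question; weights are the inverse priors (an assumed scheme)
priors = {"major": 0.75, "minor": 0.25}
weights = {label: 1.0 / p for label, p in priors.items()}

# Hypothetical leaf holding 30 majority and 20 minority examples
leaf = {"major": 30, "minor": 20}

unweighted = leaf_prediction(leaf)         # plain majority vote
weighted = leaf_prediction(leaf, weights)  # vote after re-weighting by inverse prior

print("unweighted:", unweighted, "| weighted:", weighted)
```

Here the plain vote picks the larger class, while the weighted vote picks the smaller one because 20 examples weighted by 4 outweigh 30 examples weighted by about 1.33. As noted above, a change in behavior like this does not by itself mean the model has better predictive power, so it still has to be validated.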