Newbie - expected performance output after using the sample operator

New Altair Community Member

Dec 7, 2020

Updated Nov 5, 2024 by Jocelyn

Hi, sorry for the beginner's question. I have a data set with 30,000 rows. The target variable is imbalanced: 24,000 false and 6,000 true. I used the "sample" operator to balance it (1,000 of each class). At the end, the classification performance operator gives a confusion matrix with only 2,000 results (from the sample). I was expecting the evaluation (totals for TP/TN/FP/FN) to be based on the entire dataset (30,000 rows), so that I could also evaluate costs (with the performance costs operator). What have I missed? Maybe I have wired the wrong input/output connectors? Any tips on where it could be going wrong? I have tried many ways. Thanks in advance for your help!


jacobcybulski

New Altair Community Member

Accepted Answer

Dec 7, 2020

Because you selected only 2,000 examples for model building and validation, that is what the confusion matrix shows. However, since you evaluate the model by cost, you can also use a cost-sensitive model, e.g. a decision tree, to deal with the class imbalance. I assume the cost of misclassifying the minority class is high (e.g. the positive case, when it represents fraud) and the cost of misclassifying the majority class is low (the negative case). With the cost structure set up this way, model training can weigh down the importance of the majority class in favour of the minority class, thus overcoming the class imbalance problem.
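To make the cost idea concrete, here is a minimal Python sketch (not a RapidMiner process). It uses expected-cost thresholding as a simple stand-in for a cost-sensitive learner: predict "true" whenever the expected cost of a missed positive exceeds the expected cost of a false alarm. The cost values and the function names are hypothetical and chosen only for illustration.

```python
def min_cost_threshold(cost_fp, cost_fn):
    """Probability threshold above which predicting 'true' has lower expected cost.

    Predict true when p * cost_fn > (1 - p) * cost_fp,
    which rearranges to p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict(p_true, cost_fp, cost_fn):
    """Cost-sensitive decision for one example, given its predicted P(true)."""
    return p_true > min_cost_threshold(cost_fp, cost_fn)


def total_cost(counts, cost_fp, cost_fn):
    """Total misclassification cost from confusion-matrix counts."""
    return counts["FP"] * cost_fp + counts["FN"] * cost_fn


# Hypothetical costs: missing a minority-class case is 10x worse than a false alarm.
COST_FP = 1.0
COST_FN = 10.0

threshold = min_cost_threshold(COST_FP, COST_FN)
print(f"decision threshold: {threshold:.4f}")
print("predict at p=0.20:", predict(0.20, COST_FP, COST_FN))
print("predict at p=0.05:", predict(0.05, COST_FP, COST_FN))
```

With a high cost for false negatives the threshold drops well below 0.5, so the minority class is predicted more readily even when the training data stays imbalanced.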


BalazsBaranyRM

New Altair Community Member

Accepted Answer

Dec 9, 2020

Another way to solve this is to move the sampling *into* the training phase of the cross validation. That way, you're building balanced models but still validating on all the data.
Also, sampling before the validation gives the modeling process additional "knowledge" (a class balance that the real data does not have) that you won't have later when applying the model.
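As a rough illustration of sampling inside the training phase, here is a plain-Python sketch (not a RapidMiner process). It assumes a scaled-down synthetic dataset with the same 4:1 imbalance, and a toy threshold "model" stands in for the real learner; all names and data here are hypothetical. Only the training part of each fold is downsampled, while every row is scored exactly once as test data.

```python
import random


def balanced_downsample(rows, rng):
    """Downsample the majority class so both classes have the same size."""
    pos = [r for r in rows if r[1]]
    neg = [r for r in rows if not r[1]]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def train_threshold_model(rows):
    """Toy stand-in for a learner: midpoint between the two class means."""
    pos = [x for x, y in rows if y]
    neg = [x for x, y in rows if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(rows, k, rng):
    """k-fold cross validation with balancing applied to the training part only."""
    rows = rows[:]
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        # Sampling happens here, after the split, so the test fold stays untouched.
        threshold = train_threshold_model(balanced_downsample(train, rng))
        for x, y in test:
            pred = x > threshold
            key = ("T" if pred == y else "F") + ("P" if pred else "N")
            counts[key] += 1
    return counts


rng = random.Random(0)
# Hypothetical data: 600 true and 2,400 false rows with one numeric feature.
data = [(rng.gauss(1.0, 1.0), True) for _ in range(600)]
data += [(rng.gauss(-1.0, 1.0), False) for _ in range(2400)]

counts = cross_validate(data, k=5, rng=rng)
print(counts, "total:", sum(counts.values()))
```

The confusion-matrix totals add up to the full dataset size rather than the size of the balanced sample, which is what a cost evaluation over all rows needs.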

Regards,
Balázs


AmsDani

New Altair Community Member

Accepted Answer

Dec 9, 2020

Thanks for your answers! I will try it the way you proposed, Balázs!
