Newbie: expected performance output after using the "sample" operator
Hi, sorry for the beginner's question. I have a data set with 30,000 rows. The target variable is imbalanced: 24,000 false and 6,000 true. So I used the "sample" operator to balance it (1,000 rows per class). At the end, the performance classification operator gives a confusion matrix with only 2,000 results (from the sample). I was expecting the evaluation (totals for TP / TN / FP / FN) to be based on all rows of the entire data set (30,000 in total), so that I can also evaluate costs with the performance costs operator. What have I missed? Maybe the problem is that I connected the wrong input/output ports? Any tips on where it could be going wrong? I have tried many ways. Thanks in advance for your help!
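To make the expectation concrete, here is a minimal sketch in plain Python (not the tool from the question). The data is synthetic, and the single numeric feature and the toy threshold model are assumptions made up purely for illustration. It shows that the confusion matrix totals equal the number of rows the model is applied to: 2,000 when it scores the balanced sample, and 30,000 when the same model scores the full data set.

```python
import random


def confusion_counts(labels, predictions):
    """Count TP, TN, FP and FN for boolean labels and predictions."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(labels, predictions):
        if actual and predicted:
            counts["TP"] += 1
        elif not actual and not predicted:
            counts["TN"] += 1
        elif predicted:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)

# Synthetic stand-in for the data set (assumption): 24,000 false rows and
# 6,000 true rows, each with one made-up numeric feature.
false_rows = [(rng.gauss(0.0, 1.0), False) for _ in range(24000)]
true_rows = [(rng.gauss(1.5, 1.0), True) for _ in range(6000)]
full = false_rows + true_rows

# Balanced sample: 1,000 rows per class, as in the question.
sample = rng.sample(false_rows, 1000) + rng.sample(true_rows, 1000)

# Toy model fitted on the sample only: a threshold halfway between
# the two class means of the feature.
threshold = (
    mean([x for x, y in sample if not y]) + mean([x for x, y in sample if y])
) / 2


def predict(rows):
    return [x > threshold for x, _ in rows]


# Evaluate the same model on the sample and on the full data set.
on_sample = confusion_counts([y for _, y in sample], predict(sample))
on_full = confusion_counts([y for _, y in full], predict(full))

print("sample total:", sum(on_sample.values()))  # 2000
print("full total:  ", sum(on_full.values()))    # 30000
```

The point of the sketch is only the row counts: the model is built from the balanced sample, but the evaluation covers whatever rows are passed to it for scoring.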