Unevenly distributed binominal data
Dear RM community,
I have a problem handling my dataset. I am trying to build a random forest model with a binominal label. The only problem is that the dataset contains 50 positives and 200 negatives. If all examples are predicted as false, the accuracy is still quite OK (80%). And this is exactly what happens: most of the models I get predict nearly everything as false.
So my question is how to handle unevenly distributed datasets. Is there, for example, a way to weight correct positives more than correct negatives? A correctly predicted positive should then be 200/50 = 4 times more valuable.
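To make the numbers concrete, here is a minimal sketch in plain Python of what I mean. It is not a RapidMiner process, and the variable names are my own; it only assumes the 50/200 split from above and a model that predicts false for everything.

```python
# Class counts from the dataset described above.
n_pos, n_neg = 50, 200

# A trivial model that predicts "false" for every example.
true_labels = [True] * n_pos + [False] * n_neg
predictions = [False] * (n_pos + n_neg)

# Plain accuracy: every example counts the same.
correct = [t == p for t, p in zip(true_labels, predictions)]
accuracy = sum(correct) / len(correct)

# Weighted accuracy (illustrative): each positive counts
# n_neg / n_pos = 4 times as much as a negative.
pos_weight = n_neg / n_pos
weights = [pos_weight if t else 1.0 for t in true_labels]
weighted_accuracy = sum(w for w, c in zip(weights, correct) if c) / sum(weights)

print(accuracy)           # 0.8
print(weighted_accuracy)  # 0.5
```

With this weighting the all-false model drops from 80% to 50%, which is the kind of penalty I would like the learner or the performance measure to apply.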
Cheers,
Markus