How can I improve the performance of my classification model on an imbalanced dataset?
Hi,
This is my first time using RapidMiner. I have to build a classification model for an assignment.
The dataset is highly imbalanced: only 180 of the 12,800 donors donated in the past (class 1), and the remaining donors did not donate (class 0).
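To put the imbalance in numbers, here is a small Python sketch using only the counts above. It shows why plain accuracy is misleading here: a "model" that always predicts class 0 is already right for about 98.6% of donors while finding none of the people who donate.

```python
# Counts taken from the post: 180 donors in class 1 out of 12,800 in total.
n_total = 12_800
n_pos = 180
n_neg = n_total - n_pos  # 12,620 non-donors (class 0)

# Share of the rare class.
positive_rate = n_pos / n_total

# A trivial model that always predicts class 0 is correct for every non-donor.
majority_baseline_accuracy = n_neg / n_total

print(f"class 1 share: {positive_rate:.2%}")
print(f"always-predict-0 accuracy: {majority_baseline_accuracy:.2%}")
```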
After creating and selecting relevant attributes, the class precision values were acceptable, but the class recall for class 1 was very poor, at around 8%.
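For reference, "class precision" and "class recall" for class 1 come from the confusion matrix as sketched below in Python. The function name and the labels in the check are hypothetical, for illustration only. A class 1 recall of about 8% means the model finds roughly 8 of every 100 actual donors.

```python
def class_precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, computed from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # donors found
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # donors missed
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted donors, share that donated
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual donors, share that was found
    return precision, recall

# Hypothetical labels: 4 actual donors, the model flags 2 people, 1 of them correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(class_precision_recall(y_true, y_pred))
```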
However, when I used the 'Sample' operator to balance the dataset, both class recall and class precision were around 60%. I am not sure whether this is the right approach, because I end up with only 360 donors instead of 12,800.
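Keeping all 180 donors and drawing 180 of the 12,620 non-donors (180 + 180 = 360) amounts to random undersampling of class 0. The Python sketch below assumes that is what the 'Sample' operator was configured to do. The `undersample` helper and the `label` key are hypothetical names, not RapidMiner functionality.

```python
import random

def undersample(rows, label_key="label", seed=42):
    """Randomly drop rows from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sample without replacement, so no row is duplicated.
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced

# Hypothetical rows mirroring the post: ids 0..179 are donors, the rest are not.
rows = [{"id": i, "label": 1 if i < 180 else 0} for i in range(12_800)]
balanced = undersample(rows)
print(len(balanced))
```

One thing worth keeping in mind: in a sketch like this, only the training data is balanced. If the 60% figures were measured on the balanced sample itself, they describe a 50/50 class mix, not the original one.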
In the end, I have to apply the model to a test set of more than 12,000 donors to predict which of them will donate.
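Precision depends on how common class 1 is, so a score measured on balanced data will not carry over to a test set with the original class mix. The Python sketch below makes this concrete under stated assumptions. It takes the approximate 60% recall and 60% precision from the balanced data and assumes equal class counts there, which implies a false positive rate of about 40% (with N rows per class, 0.6N true positives at 60% precision means 0.4N false positives). It also assumes the test set has the same 180/12,800 share of donors as the training data, which the post does not state.

```python
def precision_at_prevalence(recall, false_positive_rate, prevalence):
    """Expected class 1 precision when class 1 makes up `prevalence` of the data."""
    true_pos = recall * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed figures derived from the ~60% recall / ~60% precision on balanced data.
recall = 0.60
fpr = 0.40

balanced_precision = precision_at_prevalence(recall, fpr, prevalence=0.5)
original_precision = precision_at_prevalence(recall, fpr, prevalence=180 / 12_800)
print(f"balanced data: {balanced_precision:.1%}")
print(f"original class mix: {original_precision:.1%}")
```

Under these assumptions, the same model would have a precision of only about 2% on a test set with the original class mix, while recall stays at about 60%.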
Thank you
NB: My kappa is equal to 0.267
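Kappa compares the observed agreement with the agreement expected by chance: 0 means no better than chance and 1 means perfect agreement. The Python sketch below computes Cohen's kappa from a 2x2 confusion matrix. The function name and the counts in the example are hypothetical; reproducing the 0.267 would need the actual confusion matrix.

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix (rows = actual, columns = predicted)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement: actual share times predicted share, summed over both classes.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical matrix: 80% observed agreement, 58% expected by chance.
print(round(cohens_kappa(tp=20, fn=10, fp=10, tn=60), 3))
```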