Hi all,
I see that there are already some discussions in this community about this subject. However, I still have some doubts.
I have a process in which the classes are imbalanced and the minority class is the most important one. SMOTE upsampling seems to provide good results. I say "seems" because I have doubts about how to validate it correctly.
My approach was to train the model on upsampled data and test it on a 20% holdout (partitioned before upsampling).
I guess that this is the most correct thing to do, because real data is not upsampled. But what is the most correct way to validate the model? I used the 20% holdout in the testing part of the CV operator (using the Remember and Recall operators).
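To make the order of operations concrete, here is a minimal plain-Python sketch of "partition first, upsample only the training part". It is not the enclosed RM process: the data is randomly generated, `stratified_holdout` and `smote_like` are hypothetical helper names, and `smote_like` is a simplified SMOTE-style interpolation between minority neighbours rather than the actual SMOTE operator.

```python
import random

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split row indices per class so the holdout keeps the original class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

def smote_like(points, n_new, k=3, seed=0):
    """Simplified SMOTE-style upsampling (assumption: not the real operator).

    Each synthetic point lies on the segment between a random minority point
    and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        others = [p for p in points if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Mock imbalanced data: 90 majority rows (label 0), 10 minority rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

# 1) Partition BEFORE upsampling, so the holdout contains only real rows.
train_idx, test_idx = stratified_holdout(y, test_fraction=0.2)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# 2) Upsample the minority class using the training rows only.
minority_train = [x for x, lab in zip(X_train, y_train) if lab == 1]
n_new = y_train.count(0) - y_train.count(1)
synthetic = smote_like(minority_train, n_new)
X_train_up = X_train + synthetic
y_train_up = y_train + [1] * n_new

# 3) Train on (X_train_up, y_train_up); evaluate on (X_test, y_test) as-is.
```

The point of the sketch is only the ordering: the synthetic rows are built from training rows alone, so no information from the holdout leaks into the upsampled set, and the holdout keeps the real class ratio.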
What are your thoughts?
Please tear my approach apart if you think it is wrong.
(A mock example data set and the RM process file are enclosed.)
Thanks,
Pedro