"Resampling / oversampling with holdout sample."
lansuminc
Hi guys.
I have a question regarding resampling / oversampling in combination with the use of a holdout sample.
My dataset is the following:
Positive cases: 337
Negative cases: 2661
What I have done so far:
1) Sample 337 positive cases and sample 1500 negative cases
2) Then I filter the 0's in one node and the 1's in another node
3) I use bootstrap sampling on the 1's with a factor of 4.451, giving me 1500 positive cases.
4) I append the datasets
5) I am ready to model
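The steps above can be sketched outside RapidMiner roughly as follows. This is only a minimal illustration using Python's standard library, not the actual RapidMiner process; the function name `rebalance` and the synthetic `labels` list are assumptions made up for the example, with the class counts taken from the post.

```python
import random

def rebalance(labels, n_negative=1500, seed=0):
    """Return row indices of a balanced set: undersampled 0's plus bootstrapped 1's.

    Hypothetical helper for illustration; it mirrors steps 1-4 of the post.
    """
    rng = random.Random(seed)
    # Step 2: split the rows by class
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Step 1: keep every positive, sample the negatives without replacement
    neg_sample = rng.sample(neg, n_negative)
    # Step 3: bootstrap the positives (sampling WITH replacement) up to the negative count
    pos_boot = rng.choices(pos, k=n_negative)
    # Step 4: append both parts and shuffle the row order
    combined = neg_sample + pos_boot
    rng.shuffle(combined)
    return combined

# Synthetic labels with the class counts from the post (337 positive, 2661 negative)
labels = [1] * 337 + [0] * 2661
idx = rebalance(labels)

print(len(idx))                          # total rows after rebalancing
print(sum(labels[i] for i in idx))       # positive rows after bootstrapping
print(round(1500 / 337, 3))              # bootstrap factor needed to reach 1500
```

Note that the 1500 bootstrapped positives are drawn from only 337 distinct rows, so each positive case appears several times in the combined set.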
Now I want to use a holdout sample, as my linear SVM seems to be overfitting (90-95% accuracy).
What I consider the right approach is to extract, let's say, 37 positive cases and 37 negative cases to use for validation BEFORE upsampling the minority class. This leaves me with an evenly distributed holdout sample of 74 cases (I know it is small, but I am mining text, so I need my training cases). It also leaves me with a training and test set of 300/1500, which I can upsample to 1500/1500 cases.
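The ordering described here (hold out first, upsample afterwards) can be sketched like this. Again, this is a plain-Python illustration under assumptions, not a RapidMiner process: `split_then_oversample` and the synthetic `labels` are hypothetical names, and only the counts come from the post.

```python
import random

def split_then_oversample(labels, n_holdout_per_class=37, n_negative=1500, seed=0):
    """Set aside a balanced holdout first, then rebalance only the remaining rows.

    Hypothetical helper for illustration. Returns (holdout_indices, train_indices).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)

    # 1) Hold out 37 + 37 rows BEFORE any resampling happens
    holdout = pos[:n_holdout_per_class] + neg[:n_holdout_per_class]
    pos_rest = pos[n_holdout_per_class:]   # 300 positives remain
    neg_rest = neg[n_holdout_per_class:]   # 2624 negatives remain

    # 2) Undersample the negatives and bootstrap the positives from the remainder only
    neg_train = neg_rest[:n_negative]                # already shuffled, so this is a random sample
    pos_train = rng.choices(pos_rest, k=n_negative)  # with replacement, factor 1500 / 300 = 5
    return holdout, neg_train + pos_train

labels = [1] * 337 + [0] * 2661
holdout, train = split_then_oversample(labels)

print(len(holdout), len(train))
print(len(set(holdout) & set(train)))   # number of rows shared between holdout and training
```

Because the holdout rows are removed before bootstrapping, no duplicated copy of a holdout case can end up in the training data. If the split were done after upsampling, copies of the same positive case could land on both sides and inflate the holdout score.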
My SVM predicts almost all of the negative cases correctly and 2/3 of the positive cases if I use feature extraction on the holdout sample.
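A result like this is easier to judge per class than as a single accuracy figure. The sketch below computes per-class recall and balanced accuracy from confusion counts. The counts passed in (36 of 37 negatives, 25 of 37 positives) are hypothetical numbers chosen only to resemble "almost all" and "about 2/3"; they are not the actual results, and `holdout_report` is a made-up helper name.

```python
def holdout_report(tp, fn, tn, fp):
    """Per-class recall, plain accuracy and balanced accuracy from confusion counts."""
    recall_pos = tp / (tp + fn)              # share of positive cases found
    recall_neg = tn / (tn + fp)              # share of negative cases found
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced = (recall_pos + recall_neg) / 2
    return {
        "recall_pos": recall_pos,
        "recall_neg": recall_neg,
        "accuracy": accuracy,
        "balanced_accuracy": balanced,
    }

# HYPOTHETICAL counts for a 37/37 holdout, for illustration only
report = holdout_report(tp=25, fn=12, tn=36, fp=1)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

On an evenly distributed holdout, accuracy and balanced accuracy coincide, which is one argument for keeping the holdout balanced when the two classes matter equally.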
What are your thoughts?
Are there other ways to use a holdout sample in RapidMiner?