"Calculate performance only on TRUE cases??"
Hello,
First off, I want to say thank you for this great software. I LOVE RapidMiner!!!
On to my question...
We are looking at creating an SVM for detecting positive indications of a medical condition.
We have training data that is labeled "true" and "false" along with all the features. (True examples are those where the person has the medical condition; they represent about 20% of the training data.)
When running a grid parameter optimization or a feature selection operator, we have trouble getting it to converge on a good result.
WE DON'T CARE ABOUT THE NEGATIVE OR "FALSE" CASES. We only care about the accuracy of the "true" cases.
The problem is that the accuracy performance measure counts BOTH classes (true and false). For example, if we just predict everything as false, then since 80% of our examples are false we automatically get 80% accuracy, but ZERO correct predictions for the class we care about.
*** I guess what we ultimately want to do is train a SINGLE CLASS SVM that is focused on predicting the true class as accurately as possible. ***
So we don't need a performance score based on the aggregate accuracy of the model, but ONLY on the accuracy of the "true" predictions.
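To make the idea concrete, here is a small sketch (plain Python, outside RapidMiner; the function name and data are purely illustrative) of the scoring we're after: per-class recall and precision on the "true" class only, showing how a degenerate all-false model can still look good on overall accuracy.

```python
# Illustrative only: score just the positive ("true") class.
# On data that is 80% false, a model that predicts everything "false"
# gets 80% overall accuracy but 0% recall on the class we care about.
def class_metrics(actual, predicted, positive=True):
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0       # fraction of true cases found
    precision = tp / (tp + fp) if tp + fp else 0.0    # fraction of "true" calls that are right
    accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    return accuracy, precision, recall

actual = [True] * 2 + [False] * 8     # 20% true, like our data
predicted = [False] * 10              # degenerate "all false" model
acc, prec, rec = class_metrics(actual, predicted)
print(acc, rec)                       # 0.8 0.0
```

This is the kind of number we'd like the optimization loop to maximize, rather than overall accuracy.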
One thought was to use class weighting in either the SVM step or the classification performance step, but how much weight, and which step should apply it?
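As a rough illustration of the class-weighting idea (sketched with scikit-learn, which wraps libsvm; the kernel choice and weights here are assumptions, not a recommendation), one common starting point is to weight each class inversely to its frequency, so the 20% "true" class gets roughly 4x the misclassification penalty of the 80% "false" class:

```python
# Sketch: class weighting with an SVM (scikit-learn's SVC wraps libsvm).
# class_weight="balanced" sets each class's weight to
# n_samples / (n_classes * class_count); an explicit dict such as
# {1: 4.0, 0: 1.0} would do the same by hand for a 20%/80% split.
from sklearn.svm import SVC

X = [[0.0], [0.1], [0.2], [0.3], [0.4],
     [0.5], [0.6], [0.7], [0.9], [1.0]]   # toy 1-feature data
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]        # 20% positives, like our data

clf = SVC(kernel="rbf", class_weight="balanced")
clf.fit(X, y)
preds = clf.predict([[0.95]])             # query near the positive region
```

In libsvm's command-line terms this corresponds to the `-wi` per-class weight option; inverse class frequency is just a starting point, and the weight could itself be tuned in the same grid search.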
Another thought was to use some creative application of the meta-cost function, but how would we incorporate that with the libsvm function??
Is this possible in RM?
Any and all ideas would be appreciated.
