[SOLVED] Training a classifier to have an else statement

Question

Hi everyone!

I'm trying to train a classifier that labels each example as true, false, or "unclassified", depending on the data set. If an example is covered by, or related to, the data set, it should be classified accordingly. If not, it should be labeled "unclassified". Currently I only have a process that can classify true/false. Is there a way in RapidMiner to include the unclassified class, or do I need to add "unclassified" examples to my data set for it to work?

Our data is about health beliefs involving asthma and diabetes. Anything that is not covered by our data set should be unclassified.

Sample data:

Tweet | Category
diabetes causes heart problems | false
Diabetes causes shingles | false
Dirt treats Asthma | false
oatmeal medicates diabetes | true
Obesity causes Asthma | true
Obesity causes Diabetes | true

Thanks in advance!
Yvan

yvncruz · Answer

Hello!

Again, thank you for the help! With regard to applying the model (i.e., testing it with categorized data), will it still be able to drop uncertain data, or do we have to add the same operator again? Here is the process we use to test the model that we created.

We will be using the example process you gave as a template to test different values for our parameters. Thank you!

Best regards,
Yvan

MartinLiebig · Answer

I would go for Drop Uncertain in a cross-validation. You can then put a Loop or Optimize operator around it and simply try out every cut on the confidences (and your SVM parameters). Attached is a process doing it on Sonar. It delivers the best performance and logs all performances for the different values at Drop Uncertain. A cross-validation evaluating a decision tree model.
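For readers who want to see the idea outside of RapidMiner, here is a minimal Python sketch of "try every cut on the confidences". It is not the attached process. The function names (`evaluate_cut`, `sweep_cuts`) and the prediction triples are made up for illustration, and it assumes you have already collected (true label, predicted label, confidence) for the test folds of a cross-validation.

```python
from typing import List, Tuple

# (true_label, predicted_label, confidence_of_prediction)
Prediction = Tuple[str, str, float]

def evaluate_cut(predictions: List[Prediction], cut: float) -> Tuple[float, float]:
    """Drop predictions with confidence below `cut`.

    Returns (accuracy on the kept predictions, fraction of predictions kept).
    """
    kept = [(truth, pred) for truth, pred, conf in predictions if conf >= cut]
    coverage = len(kept) / len(predictions) if predictions else 0.0
    if not kept:
        return 0.0, coverage
    accuracy = sum(truth == pred for truth, pred in kept) / len(kept)
    return accuracy, coverage

def sweep_cuts(predictions: List[Prediction], cuts: List[float]):
    """Evaluate every candidate cut and return (cut, accuracy, coverage) rows."""
    return [(cut, *evaluate_cut(predictions, cut)) for cut in cuts]

# Hypothetical predictions, invented for this example only.
predictions = [
    ("true", "true", 0.95),
    ("false", "false", 0.90),
    ("true", "false", 0.55),
    ("false", "false", 0.60),
    ("true", "true", 0.70),
    ("false", "true", 0.52),
]

for cut, accuracy, coverage in sweep_cuts(predictions, [0.5, 0.65, 0.8]):
    print(f"cut={cut:.2f} accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

A higher cut usually raises accuracy on what is kept but drops more examples, so it helps to log both numbers for every cut rather than the accuracy alone.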

yvncruz · Answer

Hello,

Thank you for your help. I've read through the link you gave me. I might be able to use Generate Attributes with expressions to handle this problem.

The problem I'm running into right now is which values I should use as the minimum and maximum confidence. The data set I'm trying to use is diverse compared to the data I'm trying to categorize (predictions with confidence values like 0.550 are still correct).
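To make the threshold question concrete, here is a small Python sketch of the rule being described: keep the predicted label only if its confidence reaches a minimum, otherwise label the example "unclassified". The function name `classify_with_reject` and the confidence values are hypothetical and are not taken from any RapidMiner process.

```python
from typing import Dict

def classify_with_reject(
    confidences: Dict[str, float],
    min_confidence: float,
    reject_label: str = "unclassified",
) -> str:
    """Return the most confident label, or `reject_label` if it is too uncertain."""
    label, confidence = max(confidences.items(), key=lambda item: item[1])
    return label if confidence >= min_confidence else reject_label

# Hypothetical confidences for one tweet from a true/false classifier.
tweet_confidences = {"true": 0.55, "false": 0.45}

print(classify_with_reject(tweet_confidences, min_confidence=0.6))
print(classify_with_reject(tweet_confidences, min_confidence=0.5))
```

With a minimum of 0.6, a correct prediction at 0.55 would be thrown away, which is the trade-off described above. For a two-class model the winning confidence is never below 0.5, so any useful minimum lies between 0.5 and 1.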

Also, with regard to having three classes: I'm not able to use three classes since our thesis is focused on three classifiers (Naive Bayes, SVM). I tried using Naive Bayes, but almost all of its confidence values are 1.

Thank you for your help! We're really new to RapidMiner, so your help is greatly appreciated.

Best,
Yvan