Prediction outcome always shows 'no'
Coralion
New Altair Community Member
Hi, I am new to RapidMiner.
I would like to predict a Credit Payment 'yes or no' outcome with at least two methods (e.g. Naive Bayes, Decision Tree, k-means); however, every prediction comes back as 'no'.
Unlike a usual yes/no label, this attribute has three values in the Excel data: yes, no and unknown.
I am using Naive Bayes and Decision Tree, and I have set the attribute's role to label with type polynominal for prediction.
Thanks in advance for any help.
Regards.
Best Answer
-
@Coralion,
Indeed, you have an imbalanced training set:
outcome = Yes is your minority class.
Without preprocessing, because the examples with outcome = Yes are in the minority, your model has difficulty capturing the relationships between outcome = Yes and your regular attributes, and so it cannot correctly predict outcome = Yes.
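To illustrate why an imbalanced label can lead to nothing but 'no' predictions, here is a minimal Python sketch that counts the label values and compares accuracy with recall for a model that always predicts the majority class. The counts (90 no, 8 yes, 2 unknown) are made up for illustration and are not taken from the attached data.

```python
from collections import Counter

# Hypothetical label column; the counts are illustrative only.
labels = ["no"] * 90 + ["yes"] * 8 + ["unknown"] * 2

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# A "model" that always predicts the majority class.
predictions = [majority_class] * len(labels)

# Accuracy: share of all examples predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall for "yes": correctly predicted "yes" / all actual "yes".
true_positives = sum(p == "yes" and y == "yes" for p, y in zip(predictions, labels))
recall_yes = true_positives / counts["yes"]

print(counts)
print(f"accuracy={accuracy:.2f}, recall(yes)={recall_yes:.2f}")
```

With these assumed counts the always-'no' model reaches an accuracy of 0.90 while its recall for 'yes' is 0.00, which is why accuracy alone can hide the problem.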
In such cases, if correctly predicting outcome = Yes matters for your business case, you have to preprocess your training set by upsampling the minority class. This can be done with the SMOTE upsampling operator.
By doing this, you significantly increase the recall (also known as sensitivity), i.e. the ability of your model to correctly predict outcome = Yes.
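For readers curious about what SMOTE-style upsampling does, the following is a rough Python sketch of the core idea: each synthetic minority example is placed on the line segment between an existing minority example and one of its nearest minority neighbours. It is not RapidMiner's implementation; the function name `smote_sketch`, the two numeric attributes and all values are assumptions, and the sketch handles numeric attributes only (nominal attributes such as job or education would need separate treatment).

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies somewhere on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical numeric attributes (e.g. age, balance) of the "yes" examples.
minority_yes = [(25.0, 1200.0), (31.0, 900.0), (40.0, 1500.0), (35.0, 1100.0)]
majority_no_count = 20  # assumed number of "no" examples

# Generate enough synthetic "yes" examples to match the "no" count.
new_points = smote_sketch(minority_yes, n_new=majority_no_count - len(minority_yes))
balanced_yes = minority_yes + new_points
print(len(balanced_yes), "yes examples vs", majority_no_count, "no examples")
```

Upsampling should be applied to the training set only, so that the test set still reflects the real class distribution.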
Example results without and with preprocessing of your training set:
You can find your process, including the preprocessing step, in the attached file.
More generally, for your business problem you have to quantify what is important to you.
To do so, you have to set four values:
- the (potential) gain when you correctly predict outcome = Yes (true positives)
- the (potential) gain when you correctly predict outcome = No (true negatives)
- the (potential) cost when you incorrectly predict outcome = Yes while the real outcome is No (false positives)
- the (potential) cost when you incorrectly predict outcome = No while the real outcome is Yes (false negatives)
By setting these four values, you define a "cost matrix". RapidMiner can then automatically build the model(s) that maximize the gain (and minimize the cost).
To do that, submit your training set to AutoModel and define your cost matrix in the third menu ("Prepare Target").
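As a rough illustration of how a cost matrix ranks models, the Python sketch below multiplies each cell of a confusion matrix by its gain or cost and sums the result. All gain and cost values and both confusion matrices are hypothetical, the helper `total_gain` is introduced only for this example, and for simplicity the problem is treated as binary yes/no.

```python
# Hypothetical gains (positive) and costs (negative), keyed by (predicted, actual).
gain = {
    ("yes", "yes"): 100.0,  # true positive
    ("no", "no"): 0.0,      # true negative
    ("yes", "no"): -20.0,   # false positive: predicted yes, actually no
    ("no", "yes"): -80.0,   # false negative: predicted no, actually yes
}

def total_gain(confusion, gain):
    """Sum of count * gain over every (predicted, actual) cell."""
    return sum(count * gain[cell] for cell, count in confusion.items())

# Made-up confusion matrices for 100 test examples (10 actual yes, 90 actual no).
always_no = {("yes", "yes"): 0, ("no", "yes"): 10, ("yes", "no"): 0, ("no", "no"): 90}
upsampled = {("yes", "yes"): 7, ("no", "yes"): 3, ("yes", "no"): 15, ("no", "no"): 75}

print("always 'no':", total_gain(always_no, gain))
print("with upsampling:", total_gain(upsampled, gain))
```

Under these assumed values the always-'no' model ends at -800.0 and the upsampled model at 160.0, even though the second model makes more mistakes overall; the ranking depends entirely on the values you put in the matrix.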
I hope this helps!
Regards,
Lionel
Answers
-
Hi @Coralion,
It's difficult to answer without analysing your data.
Perhaps you have a highly imbalanced training set?
Could you please share your data?
Regards,
Lionel
-
Sure @lionelderkrikor,
Please find it attached. There are some 'unknown' values for the education and job attributes. Those could be contributing factors.