Why does Naive Bayes return a confidence of either 0 or 1 for every sample?
fstarsinic
New Altair Community Member
I'm just guessing but is this telling me that there is some attribute the algorithm is keying on and discarding everything else? Is there a way to take the results and look at the predictions + the other attributes together in a correlation matrix to see if that is the case? I can't picture that with NB. Seems more of an NN kinda thing or a tree thing.
Anyway, 0 and 1 only?... that can't be a good sign. What does that indicate?
Answers
-
"Is there a way to take the results and look at the predictions + the other attributes together in a correlation matrix to see if that is the case?"
If I understand this correctly, you want to check the correlation between the predicted output and the regular attributes used in the model. If so, yes: connect the "exa" port of the Performance operator to a Correlation Matrix operator and select the "include special attribute" option in the Correlation Matrix operator.
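For anyone who exports the scored examples and wants to run the same check outside RapidMiner, here is a minimal Python sketch. It is not the Correlation Matrix operator itself; the column names (`attr_a`, `attr_b`, `prediction`), the `pearson` helper, and the six rows are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scored rows: two attributes plus the model's prediction.
rows = [
    {"attr_a": 0.10, "attr_b": 5.0, "prediction": 0},
    {"attr_a": 0.20, "attr_b": 3.0, "prediction": 0},
    {"attr_a": 0.15, "attr_b": 4.0, "prediction": 0},
    {"attr_a": 0.90, "attr_b": 4.0, "prediction": 1},
    {"attr_a": 0.80, "attr_b": 5.0, "prediction": 1},
    {"attr_a": 0.95, "attr_b": 3.0, "prediction": 1},
]

attributes = ["attr_a", "attr_b"]
predictions = [row["prediction"] for row in rows]

# Correlate every attribute with the predicted label.
correlations = {
    name: pearson([row[name] for row in rows], predictions)
    for name in attributes
}

# List attributes from most to least strongly correlated with the prediction.
for name, value in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

An attribute whose correlation with the prediction is close to +1 or -1 while the others stay near 0 would support the suspicion that the model is keying on that one attribute.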
Also, what do the performance metrics indicate? Is the model predicting with high accuracy?
Do let us know if you need more info.
-
Thank you. The results are not awful. Many predictions make me happy, so that's good. The predictions make sense, as I would expect. I have a VERY unbalanced dataset, so some of the stats are not that meaningful.
-
This is what looks odd to me. There are only a few test samples here, but the result is always the same regardless of sample size: the confidence (for predicting 0 or 1) is always either 0% or 100%. It seems likely that something is wrong.
-
Are you sure it is always 0 and 1? Based on this image, I see some values that are greater than 0 and less than 1. Can you check the Data view instead of the Statistics view, or open the charts?
-
Yes, the confidence is all 0s or 1s, with nothing else. I checked the data. Here's a sample of it.
The vertical axis above is the number of samples. The horizontal axis shows the different confidence values (only 2).
-
Oh, so the algorithm is not learning about class 0. How are your precision values for class 0? This sometimes happens with highly imbalanced datasets, where the algorithm just pushes everything to the class with the most samples.
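To see why per-class numbers matter here, a small Python sketch with invented counts (95 samples of class 1, 5 of class 0) and a model that always predicts the majority class. The `per_class_metrics` helper is hypothetical, not a RapidMiner function.

```python
def per_class_metrics(actual, predicted, label):
    """Return (precision, recall) for one class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    # Precision is undefined when the class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

# Invented, heavily imbalanced test set.
actual = [1] * 95 + [0] * 5
# A model that pushes every sample to the majority class.
predicted = [1] * 100

accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
precision_0, recall_0 = per_class_metrics(actual, predicted, 0)
precision_1, recall_1 = per_class_metrics(actual, predicted, 1)

print(f"accuracy: {accuracy:.2f}")
print(f"class 0: precision={precision_0}, recall={recall_0}")
print(f"class 1: precision={precision_1:.2f}, recall={recall_1:.2f}")
```

Overall accuracy looks high in this made-up case even though class 0 is never predicted, which is why the precision and recall for class 0 are the numbers to check.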
-
Well there are only 2 classes so if it's learning about class 1 wouldn't it follow that it was automatically learning about class 0?
-
What I mean by "not learning" is this. Naive Bayes assumes that all attributes are independent of each other given the class (this sometimes works and sometimes doesn't). If your data has complex interactions between attributes that carry additional information, Naive Bayes fails to capture them, because it relies on conditional independence (one attribute is treated as unconnected to another). When these algorithms fail to learn, they predict all or most samples as the majority class (1 in your case). I guess your data is a case where this assumption breaks down. Imbalanced data also has a huge effect on these algorithms.
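One possible reason the confidences themselves sit at 0 or 1, separate from which class wins: Naive Bayes multiplies one likelihood per attribute, so with many attributes, especially correlated ones that repeat the same evidence, the product for one class dwarfs the other and the normalised confidence saturates. A minimal Python sketch of that effect, outside RapidMiner, with made-up numbers (prior 0.5, per-attribute likelihoods 0.6 vs 0.4) and a hypothetical helper `naive_bayes_posterior`:

```python
import math

def naive_bayes_posterior(prior_1, lik_1, lik_0, n_attributes):
    """P(class=1 | x) when each of n_attributes attributes contributes
    likelihood lik_1 under class 1 and lik_0 under class 0.

    Computed in log space so that long products do not underflow.
    """
    log_1 = math.log(prior_1) + n_attributes * math.log(lik_1)
    log_0 = math.log(1.0 - prior_1) + n_attributes * math.log(lik_0)
    # Normalise the two scores (log-sum-exp trick).
    top = max(log_1, log_0)
    score_1 = math.exp(log_1 - top)
    score_0 = math.exp(log_0 - top)
    return score_1 / (score_1 + score_0)

# Each attribute only mildly favours class 1 (0.6 vs 0.4), yet the
# confidence approaches 1 as more attributes repeat the same evidence.
for n in (1, 5, 20, 50):
    confidence = naive_bayes_posterior(0.5, 0.6, 0.4, n)
    print(f"{n:>2} attributes -> confidence(1) = {confidence:.6f}")
```

If the real data has many attributes that are strongly related to one another, confidences that round to 0% or 100% are what this sketch would lead one to expect, even when the model's class predictions are reasonable.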
Machine learning is also subject to the no-free-lunch theorem: we never know in advance which algorithm fits our data best, which is why we try multiple models.
-
fstarsinic, you said "I have a VERY unbalanced dataset". Have you done all the preprocessing and sampling before applying your learner? If you didn't, the model is biased, since it is easier to predict the class that has more examples in your dataset, so it needs your help.
You can see how this affects the model, and how you could solve it, in these videos:
https://academy.rapidminer.com/learn/video/sampling-weighting-intro
https://academy.rapidminer.com/learn/video/sampling-weighting-demo
And for Naive Bayes:
https://academy.rapidminer.com/learn/video/naive-bayes-intro
https://academy.rapidminer.com/courses/nave-bayes-demo
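As a rough illustration of the sampling idea mentioned above, here is a Python sketch that randomly downsamples the larger class until both classes have the same number of rows. This is only one of several rebalancing approaches and is not taken from the videos; the `downsample_majority` helper, the `label` key, and the 95/5 split are assumptions made for the example.

```python
import random

def downsample_majority(rows, label_key, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest one."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced

# Invented training set: 95 rows of class 1 and 5 rows of class 0.
training_rows = [{"id": i, "label": 1} for i in range(95)]
training_rows += [{"id": 95 + i, "label": 0} for i in range(5)]

balanced = downsample_majority(training_rows, "label")
counts = {}
for row in balanced:
    counts[row["label"]] = counts.get(row["label"], 0) + 1
print(counts)
```

Rebalancing like this would normally be applied to the training data only, so that the test set still reflects the real class distribution.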