What's the concept of confidence in text classification?
Ingo, is there any documentation available that explains each algorithm's definition of confidence? Thanks!
Jing
Dear Jing,
First of all, welcome to the community. There is no documentation on how each of our 250+ learners calculates confidence. Most of this can be found either in textbooks or in our code. Is there a specific operator we can help you with?
~Martin
Here, just look at the sample.
Copied from the Help:
Note that in the testing set, the attributes of the first example are Outlook = sunny and Wind = false. Naive Bayes does calculation for all possible label values and selects the label value that has maximum calculated probability.

Calculation for label = yes
Find product of following:
- Posterior probability of label = yes (i.e. 9/14)
- value from distribution table when Outlook = sunny and label = yes (i.e. 0.223)
- value from distribution table when Wind = false and label = yes (i.e. 0.659)
Thus the answer = 9/14 * 0.223 * 0.659 = 0.094

Calculation for label = no
Find product of following:
- Posterior probability of label = no (i.e. 5/14)
- value from distribution table when Outlook = sunny and label = no (i.e. 0.581)
- value from distribution table when Wind = false and label = no (i.e. 0.397)
Thus the answer = 5/14 * 0.581 * 0.397 = 0.082

As the value for label = yes is the maximum of all possible label values, label is predicted to be yes.
And this is how the confidence is calculated:
conf(yes) = 0.094/(0.094+0.082) = 0.534
conf(no) = 0.082/(0.094+0.082) = 0.466
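If you want to check the numbers yourself, here is a minimal Python sketch of the same arithmetic. It only uses the values quoted from the Help above; the variable names are my own and are not part of any RapidMiner API. It also assumes confidence is simply each label's product divided by the sum over all labels, as in the two lines above.

```python
# Values copied from the quoted help text. Names are illustrative only.
# Class frequencies in the training set (the help text calls these
# "posterior probability").
class_freq = {"yes": 9 / 14, "no": 5 / 14}
# Distribution table entries for Outlook = sunny and Wind = false.
p_outlook_sunny = {"yes": 0.223, "no": 0.581}
p_wind_false = {"yes": 0.659, "no": 0.397}

# Unnormalised score per label: product of the three values.
score = {
    label: class_freq[label] * p_outlook_sunny[label] * p_wind_false[label]
    for label in class_freq
}

# Confidence: each score divided by the sum of all scores, so they add up to 1.
total = sum(score.values())
confidence = {label: s / total for label, s in score.items()}

# The predicted label is the one with the highest confidence.
prediction = max(confidence, key=confidence.get)

for label, conf in confidence.items():
    print(f"conf({label}) = {conf:.3f}")
print("prediction:", prediction)
```

Because the two products are not rounded to three decimals before dividing, the last digits can differ slightly from a calculation by hand.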
Without round-off error you get: