

Naive Bayesian models

AndyV

Hi,
As I understand it, the weight given to a descriptor in a naive Bayesian model is proportional to the enrichment of that descriptor in the "active" or "good" set compared with the "bad" or "inactive" set. I would like to know how descriptors with only a very few instances in the training set are treated. With the approach described, you would often end up with certainties one way or the other (or, in the extreme case of only one instance, all the time). Are these simply discarded?
thanks for any enlightenment,
Andy


land

Hi Andy,
You seem to be confusing the algorithms. Naive Bayes does not calculate any weights. It assumes that all attributes are independent, so they all carry exactly the same weight. It also has no "bad" or "good" set. And what do you mean by descriptor? Now I'm confused by your question.

Greetings,
Sebastian

AndyV

Apologies for the lack of clarity. What I have is a training set of 50000 members, each member being described by 1024 "descriptors" defining the presence or absence of a chemical structural feature. All 1024 features are treated as independent and, to begin with, all have equal weights. Then the training data is queried. I have members classified in two categories: active and inactive. The presence of the descriptor in the molecule sets a bit at a certain position to 1 and the absence sets it to 0. So I have a 2D matrix, e.g.:

          bits             category
member1   110001001....    active
member2   0110000100....   active
member3   110001001...     active

member4   001100000..      inactive
member5   001000000..      inactive

So in this case, the structural feature represented by bit number 2 is enriched in the active members compared with the inactive ones, so the presence of this feature in any future chemical I see should weight that chemical toward the active category. The size of that weight is (as I understand it) proportional to the enrichment in the active category compared with the inactive one, and these weights are then used to categorise unseen compounds. Is this right?
If so, how are bits with very rare instances treated?
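
To make the question concrete, here is a minimal Python sketch of the per-bit frequencies for the toy matrix above. It only illustrates the enrichment idea and is not how RapidMiner computes anything; the fingerprints are truncated to their first 9 bits, and the function names are made up for this example.

```python
# Toy fingerprints from the matrix above, truncated to the first 9 bits
# (the truncation is an assumption made only for this illustration).
ACTIVE = ["110001001", "011000010", "110001001"]
INACTIVE = ["001100000", "001000000"]


def bit_frequency(rows, bit):
    """Fraction of rows in which the 1-indexed bit is set."""
    return sum(row[bit - 1] == "1" for row in rows) / len(rows)


def enrichment(bit):
    """Return (P(bit=1 | active), P(bit=1 | inactive)) as raw frequencies."""
    return bit_frequency(ACTIVE, bit), bit_frequency(INACTIVE, bit)


for bit in range(1, 10):
    p_act, p_inact = enrichment(bit)
    print(f"bit {bit}: active={p_act:.2f} inactive={p_inact:.2f}")
```

With raw frequencies like these, any bit that never occurs in one class gets an estimate of exactly 0 for that class, which is the situation I am asking about.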

land

Hi,
Yes, I think this is correct. I would have used different terms, but it comes down to this, I think. The weighting is not linear, though; it comes from the ratio of two normal distribution densities... However, I don't know what you mean by "very rare instances". Undersampled classes? Attributes (that's what your descriptors are called within RapidMiner) with only very few 1s and the rest 0s? In the latter case they get no special treatment, because Naive Bayes does not distinguish between them.

Greetings,
Sebastian

AndyV

So if, say, attribute 3 had two bits set to 1 in the active training set and all set to zero in the inactive training set, wouldn't it appear certain that this attribute was associated with activity? That seems overly certain.
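
Spelled out as arithmetic, assuming plain frequency estimates with no correction (the symbols n_a and n_i for the number of active and inactive training members are introduced here only for illustration):

```latex
% n_a, n_i: number of active / inactive training members (assumed symbols)
\[
\hat{P}(x_3 = 1 \mid \text{active}) = \frac{2}{n_a},
\qquad
\hat{P}(x_3 = 1 \mid \text{inactive}) = \frac{0}{n_i} = 0
\]
\[
P(\text{active} \mid x_3 = 1)
  = \frac{\frac{2}{n_a}\, P(\text{active})}
         {\frac{2}{n_a}\, P(\text{active}) + 0 \cdot P(\text{inactive})}
  = 1
\]
```

Because naive Bayes multiplies the per-attribute likelihoods, that single zero factor would rule out the inactive class for any compound with attribute 3 set, whatever its other attributes say.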

haddock

G'Day,

It is not a major intellectual breakthrough to spot that learning will not be brilliant when training and test sets are completely different. So your real point is?

AndyV

Laplacian-modified naive Bayesian (LMNB) models, as it turns out. I am aware that my point is no intellectual breakthrough, but it relates to a problem that I might well encounter with the data I am using, and the LMNB is designed to deal with it automatically. To use any software, it helps to know how it will deal with particular features of my data, and this is what I was looking to clarify. Below is an extract from a paper that met the same problem, showing how the authors dealt with probabilities of 1 or 0. I'm interested to know whether a similar feature exists in RapidMiner.

"Such a situation might arise, for example, in the case of under-represented bits. Suppose that a given feature occurs only once in a given data set and for a compound in the training set for which the hypothesis is false (e.g., likely to be absorbed in the intestine). The resulting probability that the hypothesis would be true for any test compound having this feature would be 0. (In our trivial example, this would lead to the rather absurd conclusion that no compounds containing the feature will be absorbed in the intestine.) A Laplacian estimator is therefore applied by adding a value of 1 to each Pr[Ei|H] in the numerator and a value of N to the denominator, where N is the total number of pieces of evidence. This gives each E which occurs with a frequency of 0 a small, nonzero value"

haddock

Hi again,

I think that is what the Laplace correction is there for: you add a little to the top and bottom to prevent zero probabilities. The code is in SimpleDistributionModel.java.
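
As a rough Python sketch of "adding a little to the top and bottom": this is generic additive (Laplace) smoothing, not a transcription of SimpleDistributionModel.java and not necessarily the exact estimator in the quoted paper. The function name, its parameters, and the counts are assumptions chosen for illustration.

```python
def smoothed_probability(count, total, alpha=1.0, n_values=2):
    """Estimate P(value | class) with additive (Laplace) smoothing.

    count    -- training rows of the class that show the value
    total    -- all training rows of the class
    alpha    -- pseudo-count added per possible value (1.0 = classic Laplace)
    n_values -- number of values the attribute can take (2 for a bit)
    """
    return (count + alpha) / (total + alpha * n_values)


# Made-up counts: a bit set in 2 of 100 actives and in 0 of 900 inactives.
p_active = smoothed_probability(2, 100)
p_inactive = smoothed_probability(0, 900)

# The inactive estimate is small but no longer zero, so the ratio is finite.
print(p_active, p_inactive, p_active / p_inactive)
```

Setting alpha to 0 recovers the raw frequency, and with it the zero probability for the unseen case.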

AndyV

Thank you. Exactly what I was looking for, and what I had failed to find by searching for "Laplacian" in the documentation.

haddock

Hi Andy,

RM has many virtues, but documentation is not one of them! Luckily there is a forum for the Brave...

land

Yes,
if I didn't spend so much time here in the forum, we would have much better documentation and far fewer users. So there's always a trade-off.

But it's the bare truth: We could use a few additional hands down here...

By the way, now I understand what your problem was...

Greetings,
Sebastian

AndyV

...and thanks Sebastian for your help. I will do better in framing my questions next time!

fabian_preis

Hi,

I'm trying to get into this model, but I can't find a good introduction to the topic. Does anyone know a good tutorial, video or book that explains this model for beginners? In German or English?

I'm analysing speeches to determine the tone of the text. I want to compare the dictionary method with the Naive Bayes model. The dictionary model should work, but I have no idea how to handle the other one or what the main difference is.

MartinLiebig

Dear Fabian,

What exactly are you interested in? The Naive Bayes classifier or text mining? Naive Bayes is a standard technique for classification and is explained in most textbooks.

For text mining in RapidMiner my old friend @MariusHelf recommended this blog post: http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html

~Martin

fabian_preis

Dear Martin,

Thank you for your help. My topic is text mining. I am trying to compare the results of the dictionary approach with those of Naive Bayes text mining. I have a couple of texts, and I am trying to find out how many positive, neutral and negative words are in them. I will have a look at the tutorials.
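
For the dictionary part, the counting I have in mind could be sketched in Python roughly like this. The word list is a made-up mini-lexicon and the function name is hypothetical; a real analysis would use a published sentiment dictionary.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon, for illustration only.
LEXICON = {
    "good": "positive", "great": "positive", "hope": "positive",
    "bad": "negative", "crisis": "negative", "fear": "negative",
}


def tone_counts(text):
    """Count positive, negative and neutral (not in the lexicon) words."""
    # [^\W\d_]+ matches runs of letters, including German umlauts.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(LEXICON.get(word, "neutral") for word in words)


counts = tone_counts("We hope for a good year despite the crisis.")
print(counts)
```

The dictionary method relies on a fixed word list like this one, whereas a Naive Bayes classifier would estimate per-word probabilities from texts that have already been labelled.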

sgenzer

Hi... if you're looking for textbook-type help, you may find this book helpful. It has a whole section on Naive Bayes with screenshots from RapidMiner.

https://www.amazon.com/Predictive-Analytics-Data-Mining-RapidMiner/dp/0128014601/

Scott