Naive Bayes vs Naive Bayes (Kernel)

Thiru
Thiru New Altair Community Member
edited November 5 in Community Q&A
Hi all,
My data set contains numerical values, which are configured with the data type "real". I am able to use both operators, Naive Bayes as well as Naive Bayes (Kernel), with slightly different performance. However, I also see in the RM documentation that only Naive Bayes (Kernel) is to be used for numeric attributes.
Should I consider only the NB (Kernel) result, even though RapidMiner also accepts the normal Naive Bayes operator? Or are both acceptable for numerical attributes?

Regards,
Thiru

Best Answer

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓
    Hi Thiru,

    The difference between stock NB and NB (Kernel) is the way numeric attributes are put into the model. You can easily compare this by looking at the model output charts.

    Naive Bayes (which can be used with numeric attributes) simply assumes that the numerical inputs are normally distributed within each class, estimates the parameters of that normal distribution (mean and standard deviation), and uses it to assign likelihoods to classes. You see two (or more) Gaussian curves in the model.
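
    As a rough illustration of that idea (a minimal sketch in plain Python, not RapidMiner's actual implementation), the snippet below fits one normal distribution per class to a single numeric attribute and turns the resulting likelihoods into class probabilities. The data, the class labels and the helper name `posterior` are made up for the example.

```python
from statistics import NormalDist

# Hypothetical training values of one numeric attribute, grouped by class label.
values_by_class = {
    "yes": [4.8, 5.1, 5.3, 5.0, 4.9],
    "no": [6.9, 7.2, 7.0, 7.4, 6.8],
}

# Fit one normal distribution per class (mean and standard deviation from the samples).
fitted = {label: NormalDist.from_samples(vals) for label, vals in values_by_class.items()}

# Class priors taken from the class frequencies in the training data.
total = sum(len(vals) for vals in values_by_class.values())
priors = {label: len(vals) / total for label, vals in values_by_class.items()}


def posterior(x):
    """Return P(class | x) for a single attribute value x (illustrative helper)."""
    scores = {label: priors[label] * fitted[label].pdf(x) for label in fitted}
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


print(posterior(5.2))
print(posterior(6.5))
```

    With several attributes, the per-attribute likelihoods would be multiplied together for each class, which is the "naive" independence assumption.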

    Naive Bayes (Kernel) instead tries to fit a smoothed curve to the actual values, which is why it has some numeric parameters you can change. If your attribute values don't follow a normal distribution, this can fit them better, so the predictions can improve, at the cost of longer calculation time and more complex models (and with some danger of overfitting under certain conditions).
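
    To make the "smoothed curve" concrete, here is a small sketch of a Gaussian kernel density estimate next to a single fitted normal distribution. It assumes a Gaussian kernel with a hand-picked `bandwidth`; the operator's real estimator and parameter names may differ. The bimodal sample values are invented for the example.

```python
from statistics import NormalDist

# Hypothetical bimodal attribute values for one class: clusters around 0 and around 4.
samples = [-0.4, -0.2, 0.0, 0.2, 0.4, 3.6, 3.8, 4.0, 4.2, 4.4]


def kernel_density(x, data, bandwidth):
    """Gaussian kernel density estimate at x: the average of one small bell curve per sample."""
    kernel = NormalDist(0.0, 1.0)
    return sum(kernel.pdf((x - v) / bandwidth) for v in data) / (len(data) * bandwidth)


# The plain Naive Bayes view: one normal distribution fitted to all values.
single_gaussian = NormalDist.from_samples(samples)

# Compare both density estimates at the two clusters (0 and 4) and in the gap (2).
for x in (0.0, 2.0, 4.0):
    print(x, round(single_gaussian.pdf(x), 3), round(kernel_density(x, samples, 0.5), 3))
```

    The single Gaussian puts its peak in the empty gap between the two clusters, while the kernel estimate is high at the clusters and low in between. A very small bandwidth would follow every individual point, which is where the overfitting risk comes from.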

    If you find a good set of parameters for your use case and cross-validate correctly, both will give you results you can rely on. Depending on your use case, you might want to select the variant giving better results, or the simpler model.
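
    A comparison along these lines could look like the sketch below: a hand-rolled 5-fold cross-validation of a one-attribute classifier, once with a Gaussian density and once with a kernel density. Everything here (the toy data, the fixed bandwidth of 0.5, the function names) is an assumption for illustration; in RapidMiner you would use the cross-validation operator instead.

```python
from statistics import NormalDist

OFFSETS = [-0.4, -0.2, 0.0, 0.2, 0.4]
# Hypothetical data: class "A" clusters around 0 and 4, class "B" around 2 and 6.
DATA = [(c + d, "A") for c in (0.0, 4.0) for d in OFFSETS] + \
       [(c + d, "B") for c in (2.0, 6.0) for d in OFFSETS]


def gaussian_density(train_values):
    """Density function of a single normal distribution fitted to the values."""
    return NormalDist.from_samples(train_values).pdf


def kernel_density(train_values, bandwidth=0.5):
    """Density function of a Gaussian kernel estimate with a fixed, assumed bandwidth."""
    kernel = NormalDist(0.0, 1.0)

    def density(x):
        total = sum(kernel.pdf((x - v) / bandwidth) for v in train_values)
        return total / (len(train_values) * bandwidth)

    return density


def cross_validated_accuracy(data, make_density, folds=5):
    """Hold out every folds-th row in turn, train on the rest, and count correct predictions."""
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [row for i, row in enumerate(data) if i % folds != k]
        labels = sorted({label for _, label in train})
        model = {}
        for label in labels:
            values = [x for x, lab in train if lab == label]
            # Store the class prior and the class-conditional density.
            model[label] = (len(values) / len(train), make_density(values))
        for x, label in test:
            predicted = max(labels, key=lambda lab: model[lab][0] * model[lab][1](x))
            correct += predicted == label
    return correct / len(data)


gaussian_accuracy = cross_validated_accuracy(DATA, gaussian_density)
kernel_accuracy = cross_validated_accuracy(DATA, kernel_density)
print(gaussian_accuracy, kernel_accuracy)
```

    The toy data is deliberately built so that the classes interleave and the kernel variant wins; on data that really is close to normal, the two would be expected to score much more alike, and the simpler model may then be preferable.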

    Regards,
    Balázs

Answers

  • Thiru
    Thiru New Altair Community Member
    @BalazsBarany, thanks for your reply. This clarifies it.