"Naive Bayes for Text Classification"
Hello,
I'm trying to apply Naive Bayes to classify some texts, and I have two questions about how RapidMiner (v5.0.13) implements this classifier:
1.- As far as I know, one of the most frequently used classifiers for text classification is multinomial Naive Bayes. However, the model I obtain from the Naive Bayes operator consists of a set of means and standard deviations for the words of my corpus, which looks like a Gaussian model. So, which kind of Naive Bayes classifier is implemented in RapidMiner (Multinomial, Gaussian, Bernoulli)?
2.- I have seen several examples of text classification with Naive Bayes in RapidMiner. Some of them use the TF-IDF matrix as input both when creating the model and when applying it. I understand that TF-IDF values are used to build the model. However, I suppose that TF-IDF values are not used when applying the model (that would not make sense). In fact, the "process documents" operator receives a Word List as input, and this changes the "apply model" output. So,
a) Does it matter how the texts are vectorized (TF, TF-IDF, term occurrences) when applying the Naive Bayes model?
b) Why does the "process documents" operator receive a Word List, and how is it used when applying the model?
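To make question (b) concrete, here is a minimal Python sketch of what I assume the Word List is for: fixing the vocabulary (and the IDF weights) on the training texts so that new texts are mapped onto the same columns. This is not RapidMiner code. The function names `build_word_list` and `vectorize`, the toy texts, and the particular TF-IDF formula are my own assumptions for illustration, and RapidMiner may weight and normalize differently.

```python
import math
from collections import Counter


def build_word_list(train_docs):
    """Return the training vocabulary and per-word IDF (training texts only)."""
    tokenized = [doc.lower().split() for doc in train_docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many training texts each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocab}
    return vocab, idf


def vectorize(doc, vocab, idf):
    """Map a text onto the fixed training vocabulary.

    Words that are not in the word list are ignored, so the resulting
    vector always has the same columns the model was trained on.
    """
    counts = Counter(doc.lower().split())
    total = sum(counts.values()) or 1  # avoid division by zero for empty texts
    return [counts[word] / total * idf[word] for word in vocab]


# Hypothetical toy data.
train = ["cheap pills online", "meeting agenda online"]
vocab, idf = build_word_list(train)

# "offer" was never seen in training, so it has no column in the vector.
new_vec = vectorize("cheap cheap offer", vocab, idf)
```

If this assumption is right, the new text is weighted with the training IDF values and gets exactly one value per word in the Word List, which would explain why the Word List affects the "apply model" output.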
Thank you in advance.