I have a set of news items (XML format) concerning the following categories (in Dutch): Auto, Economie, Politiek, Sport.
These XML items are read with the Read XML operator, resulting in an example set with Categorie as label attribute and Text and Title as regular attributes.
I apply the Naive Bayes, Cross Validation and Performance operators and get strange performance results.
The imported XML content is classified by humans and should be accurate.
So what is going wrong? It looks as if I am making a systematic error in my approach.
If I replace Bayes by k-NN, it gives the same performance results.
Does anyone have a clue how to resolve this?
I have attached the RM process and the XML data in a zip file.