This post refers to
http://rapid-i.com/rapidforum/index.php/topic,368.0.html and
http://rapid-i.com/rapidforum/index.php/topic,369.0.html. It addresses the problems I experienced when trying to update models.
Suppose I have created a word list and saved it to disk. I can then run StringTextInput several times, each time loading and vectorizing only part of the database texts. I want to pass the word vectors to a learner that learns to classify texts. The learner has to produce an updatable model, so that each batch can update the model built from the previous ones. I tried NaiveBayes.
Problem 5: The UpdateModel operator throws an error saying that the corresponding model (a DistributionModel) is not updatable. Adding the line "public boolean isUpdatable() { return true; }" to DistributionModel.java solved the problem. I did not find any learner/model that worked with UpdateModel without modifying the source code. Did I do something wrong?
Problem 6: NaiveBayes does not work properly on my examples. It assigns every text to the same class. I looked into the source code and I think the problem is that NaiveBayes multiplies all the probabilities (when handling numerical attributes). Since I used about 1600 attributes, the product was probably too small to be represented as a double and underflowed to 0 for every class, so the classes could no longer be told apart.
I decided to use a Bayes operator from Weka instead, which computes the probabilities as sums of logarithms. Again I get the problem with UpdateModel; I haven't yet tried whether the same fix as mentioned above works there too.
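To illustrate the underflow I suspect in Problem 6, here is a minimal standalone sketch. It is not the RapidMiner or Weka code. The class name UnderflowDemo, the method names, and the likelihood values (0.1 and 0.05 for each of 1600 attributes) are made up for the illustration. It only shows that a plain product of many small factors collapses to 0.0 for both classes, while the sum of logarithms still separates them.

```java
import java.util.Arrays;

public class UnderflowDemo {

    // Multiplies the per-attribute likelihoods directly.
    // With many small factors the result underflows to 0.0.
    static double product(double[] likelihoods) {
        double result = 1.0;
        for (double p : likelihoods) {
            result *= p;
        }
        return result;
    }

    // Sums the logarithms of the likelihoods instead.
    // Mathematically this equals log(product), but it stays representable.
    static double logSum(double[] likelihoods) {
        double result = 0.0;
        for (double p : likelihoods) {
            result += Math.log(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values: 1600 attributes, one constant likelihood per class.
        double[] classA = new double[1600];
        double[] classB = new double[1600];
        Arrays.fill(classA, 0.1);
        Arrays.fill(classB, 0.05);

        // Both products underflow, so the two classes tie at 0.0.
        System.out.println("product A = " + product(classA));
        System.out.println("product B = " + product(classB));

        // The log sums are large negative numbers but still comparable.
        System.out.println("log sum A = " + logSum(classA));
        System.out.println("log sum B = " + logSum(classB));
    }
}
```

With a tie at 0.0 for every class, a classifier that picks the maximum will typically return the same class every time, which would match what I see. Comparing the log sums avoids that, because only their order matters for the decision.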