hi,
sorry if this is a very stupid question, but I have a very basic question about how models are learned: how does learning actually take place / what is the algorithm for learning from the dataset? I know it's different for each of the algorithms out there.
For instance-based methods like k-NN it's quite easy, and I think I understand it: a new instance is compared with the instances already present in instance space, and the majority class among its nearest neighbours is predicted. New instances are "learned" by basically just remembering them, so they can be used together with the other stored instances when further instances come in...
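Just to check my understanding of the k-NN case, this is roughly what I picture "learning" and prediction to be (a minimal sketch in plain Python/NumPy that I wrote myself, not taken from any particular tool; the function name is mine):

    import numpy as np
    from collections import Counter

    def knn_predict(train_X, train_y, x_new, k=3):
        # "Training" is just storing train_X and train_y; prediction compares the
        # new instance to every stored instance and votes among the k nearest.
        dists = np.linalg.norm(train_X - x_new, axis=1)
        nearest = np.argsort(dists)[:k]
        return Counter(train_y[nearest]).most_common(1)[0][0]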
But how about Naive Bayes, SVM, or decision trees?
In X-Validation, each training part is used to learn a model from the instances in that training part, and the model is then tested on the test part. But what if the performance on a test part is very bad, say 10% accuracy? How is that test part then "applied", i.e. "incorporated", into the trained model to reach better test performance? I mean, once the model has been trained it is finished and no further changes are made to it, in particular there is no subsequent training that incorporates the test part, or is there? Besides, that would skew the test performance, because the model would then already have seen the test part during training, or am I wrong?
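To make my confusion concrete, this is how I currently picture the procedure (only my own Python sketch of k-fold cross-validation, not how the X-Validation operator is really implemented; train_fn and score_fn are placeholders):

    import numpy as np

    def cross_validate(X, y, train_fn, score_fn, n_folds=10, seed=0):
        # Split the example indices into n_folds disjoint parts.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        scores = []
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            # The model is trained only on the training folds and is never updated
            # with the test fold; the test fold is used purely for scoring.
            model = train_fn(X[train_idx], y[train_idx])
            scores.append(score_fn(model, X[test_idx], y[test_idx]))
        # The per-fold models are discarded; only the averaged score is reported.
        return float(np.mean(scores))

If that sketch is right, then a badly performing fold just lowers the average and nothing is "fed back" into any model, correct?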
My second question is: where can I see which algorithms actually make use of weights? I tried weighting with the "Generate Weight (Stratification)" operator, because I have 3 labels and the classes are imbalanced (roughly 60%, 30% and 10% prevalence), and then used the weighted example set for LIBSVM and k-NN modeling, but both said they make no use of the weights. Why is that? I thought SVM could profit from balanced data?
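For reference, this is what I assume the stratification weighting does conceptually (my own guess, inverse class-frequency weights; I don't know whether it matches the operator exactly):

    import numpy as np

    def stratification_weights(y):
        # Rarer classes get proportionally larger example weights, so that each
        # class contributes roughly the same total weight to the example set.
        classes, counts = np.unique(y, return_counts=True)
        per_class = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
        return np.array([per_class[label] for label in y])

In scikit-learn I would pass something like class_weight='balanced' to an SVM for a similar effect, and standalone LIBSVM has per-class C weights (the -wi parameters), so I expected the weighted example set to do something comparable here.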
I found weighting methods for the testing round, but no good weighting methods for the training round... Balanced sampling is not a good solution, because it would leave me with only a small dataset: my least frequent label has only 100 instances... Any ideas how to do this?
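The only alternative I can come up with myself is to oversample the rare classes in the training data instead of downsampling the frequent ones, roughly like this (again just my own sketch outside RapidMiner):

    import numpy as np

    def upsample_minorities(X, y, seed=0):
        # Draw (with replacement) from each class until every class has as many
        # examples as the largest one, so no training data is thrown away.
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        target = counts.max()
        idx = np.concatenate([
            rng.choice(np.where(y == c)[0], size=target, replace=True)
            for c in classes
        ])
        return X[idx], y[idx]

But duplicating the 100 instances of the rare label that often feels questionable, so I'd be glad about better ideas.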