hi,
sorry if this is a very stupid question, but I have a very basic question about how models are learned: how does learning actually take place / what is the algorithm for learning from the dataset? I know it's different for each of the algorithms out there.
For instance-based methods like k-NN it's quite easy, and I think I understand it: a new instance is compared with the instances already present in instance space, and the majority class among its nearest neighbours is predicted. New instances are "learned" by basically just remembering them, so they can be used together with the other stored instances when further instances come in...
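Just to check my understanding of the k-NN case, this is roughly what I picture "learning" and prediction to be (a minimal sketch in plain Python/NumPy that I wrote myself, not taken from any particular tool; the function name is mine):

    import numpy as np
    from collections import Counter

    def knn_predict(train_X, train_y, x_new, k=3):
        # "Training" is just storing train_X and train_y; prediction compares the
        # new instance to every stored instance and votes among the k nearest.
        dists = np.linalg.norm(train_X - x_new, axis=1)
        nearest = np.argsort(dists)[:k]
        return Counter(train_y[nearest]).most_common(1)[0][0]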
But how about Naive Bayes, SVM, or decision trees?
In X-Validation, each training part is used to learn a model from the instances in that training part, and the model is then tested on the test part. But what if the performance on a test part is very bad, say 10% accuracy? How is that test part then "applied", i.e. "incorporated", into the trained model to reach better test performance? I mean, once the model has been trained it is finished and no further changes are made to it, in particular there is no subsequent training that incorporates the test part, or is there? Besides, that would skew the test performance, because the model would then already have seen the test part during training, or am I wrong?
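To make my confusion concrete, this is how I currently picture the procedure (only my own Python sketch of k-fold cross-validation, not how the X-Validation operator is really implemented; train_fn and score_fn are placeholders):

    import numpy as np

    def cross_validate(X, y, train_fn, score_fn, n_folds=10, seed=0):
        # Split the example indices into n_folds disjoint parts.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        scores = []
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            # The model is trained only on the training folds and is never updated
            # with the test fold; the test fold is used purely for scoring.
            model = train_fn(X[train_idx], y[train_idx])
            scores.append(score_fn(model, X[test_idx], y[test_idx]))
        # The per-fold models are discarded; only the averaged score is reported.
        return float(np.mean(scores))

If that sketch is right, then a badly performing fold just lowers the average and nothing is "fed back" into any model, correct?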
My second question is: where can I see which algorithms actually make use of weights? I tried weighting with the "Generate Weight (Stratification)" operator, because I have 3 labels and the classes are imbalanced (roughly 60%, 30% and 10% prevalence), and then used the weighted example set for LIBSVM and k-NN modeling, but both said they make no use of the weights. Why is that? I thought SVM could profit from balanced data?
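For reference, this is what I assume the stratification weighting does conceptually (my own guess, inverse class-frequency weights; I don't know whether it matches the operator exactly):

    import numpy as np

    def stratification_weights(y):
        # Rarer classes get proportionally larger example weights, so that each
        # class contributes roughly the same total weight to the example set.
        classes, counts = np.unique(y, return_counts=True)
        per_class = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
        return np.array([per_class[label] for label in y])

In scikit-learn I would pass something like class_weight='balanced' to an SVM for a similar effect, and standalone LIBSVM has per-class C weights (the -wi parameters), so I expected the weighted example set to do something comparable here.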
I found weighting methods for the testing round, but no good weighting methods for the training round... Balanced sampling is not a good solution, because it would leave me with only a small dataset: my least frequent label has only 100 instances... Any ideas how to do this?
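The only alternative I can come up with myself is to oversample the rare classes in the training data instead of downsampling the frequent ones, roughly like this (again just my own sketch outside RapidMiner):

    import numpy as np

    def upsample_minorities(X, y, seed=0):
        # Draw (with replacement) from each class until every class has as many
        # examples as the largest one, so no training data is thrown away.
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        target = counts.max()
        idx = np.concatenate([
            rng.choice(np.where(y == c)[0], size=target, replace=True)
            for c in classes
        ])
        return X[idx], y[idx]

But duplicating the 100 instances of the rare label that often feels questionable, so I'd be glad about better ideas.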