
"Some help for training a regression algorithm [SOLVED]"

User: "manwann"
New Altair Community Member
Updated by Jocelyn
Hi dear rapid-i community,

I am testing RapidMiner's modeling tools to build a content-based recommender system. To do that I downloaded the MovieLens 100K dataset, which contains information about movies and the ratings users gave them (http://www.grouplens.org/node/73). The ratings range from 0 to 5, and each movie has genre information (action, comedy, etc.). I am training a classifier on the user with the most ratings (uid = 405; number of reviews = 737). To do that I discretize the rating label (good >= 3.5; bad < 3.5), but because this user has many more reviews labeled bad, the classifier (libSVM) predicts every label as bad.

                true bad    true good    class precision
pred. bad       621         116          84.26%
pred. good      0           0            0%
class recall    100%        0%
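
For readers who want to check the numbers, here is a minimal sketch that recomputes the table's figures from the four cell counts. The function name `confusion_metrics` and its argument names are made up for this illustration and are not part of RapidMiner; an empty row (0/0) is reported as 0, which is an assumption that matches the 0% shown above.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(pb_tb, pb_tg, pg_tb, pg_tg):
    """Compute accuracy, per-class precision and per-class recall.

    pb_tb: predicted bad,  truly bad     pb_tg: predicted bad,  truly good
    pg_tb: predicted good, truly bad     pg_tg: predicted good, truly good
    """
    total = pb_tb + pb_tg + pg_tb + pg_tg
    return {
        "accuracy": safe_div(pb_tb + pg_tg, total),
        "precision_bad": safe_div(pb_tb, pb_tb + pb_tg),
        "precision_good": safe_div(pg_tg, pg_tb + pg_tg),
        "recall_bad": safe_div(pb_tb, pb_tb + pg_tb),
        "recall_good": safe_div(pg_tg, pb_tg + pg_tg),
    }


# Cell counts copied from the table above (all examples predicted as bad).
metrics = confusion_metrics(pb_tb=621, pb_tg=116, pg_tb=0, pg_tg=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The accuracy looks high only because the bad class dominates; the recall for good is zero, so the model never recommends anything.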

So I tried another strategy: I applied stratified sampling (http://rapid-i.com/rapidforum/index.php/topic,2190.0.html) to balance the good and bad labels. I got the following results:

                true bad    true good    class precision
pred. bad       58          80           42.03%
pred. good      57          35           38.04%
class recall    50.43%      30.43%
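
The balancing idea described above (discretize the ratings, then keep equally many good and bad examples) can be sketched outside RapidMiner roughly as follows. This is plain random undersampling of the larger class, which may differ from what the RapidMiner sampling operator does internally. The helper names and the toy ratings are hypothetical; only the class sizes (621 bad, 116 good) come from the first table.

```python
import random


def discretize(rating, threshold=3.5):
    """Map a numeric rating to 'good' (>= threshold) or 'bad' (< threshold)."""
    return "good" if rating >= threshold else "bad"


def undersample(examples, label_of, seed=0):
    """Randomly keep the same number of examples from every class.

    The per-class size is the size of the smallest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for example in examples:
        by_class.setdefault(label_of(example), []).append(example)
    per_class = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, per_class))
    rng.shuffle(balanced)  # avoid all examples of one class coming first
    return balanced


# Hypothetical ratings: 621 low ones and 116 high ones (not real MovieLens data).
ratings = [2.0] * 621 + [4.0] * 116
balanced = undersample(ratings, label_of=discretize)
n_bad = sum(1 for r in balanced if discretize(r) == "bad")
n_good = len(balanced) - n_bad
print(n_bad, n_good)
```

Balancing this way throws away most of the majority class, so the training set shrinks to about twice the size of the minority class.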


But as you can see, the performance is still not good. I would really appreciate any suggestions.

Thanks.

Eduardo

Edit: Sorry for the replicated message

    User: "MariusHelf"
    New Altair Community Member
    Stratified sampling is usually a good idea in cases like this. But now you have only a few training examples left, which is of course bad for performance. Next, the performance of the SVM depends heavily on good choices for the parameters (especially C and, for the RBF kernel, gamma) and on the kernel you use (linear and RBF/radial are often good choices).
    To optimize them, use an Optimize Parameters (Grid) operator. A good range for both C and gamma is roughly 10^-5 to 10^5 on a logarithmic scale.
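
To make the suggested range concrete, here is a small sketch of the candidate grid, assuming one step per power of ten (the step size is an assumption; the advice above only gives the range). The function name `log_grid` is made up for this illustration. Each (C, gamma) pair would then be evaluated with cross-validation by whatever grid-search tool is in use.

```python
from itertools import product


def log_grid(min_exp=-5, max_exp=5):
    """Return powers of ten from 10^min_exp to 10^max_exp, inclusive."""
    return [10.0 ** exponent for exponent in range(min_exp, max_exp + 1)]


c_values = log_grid()
gamma_values = log_grid()

# Every (C, gamma) combination the grid search would try.
grid = list(product(c_values, gamma_values))
print(len(grid), "combinations")
print("first:", grid[0], "last:", grid[-1])
```

With 11 values per parameter the grid has 121 combinations, so a coarser first pass followed by a finer search around the best region can save time.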

    Best, Marius
    User: "manwann"
    New Altair Community Member
    OP
    Marius thanks for your answer!

    At least now it is better to follow the classifier's prediction :) (instead of doing the opposite). The results were:

    accuracy: 59.13% +/- 7.33%

                    true bad    true good    class precision
    pred. bad       86          65           56.95%
    pred. good      29          50           63.29%
    class recall    74.78%      43.48%

    Maybe I have to try the MovieLens 1M dataset.

    Thanks again.