"Some help for training a regression algorithm [SOLVED]"

User: "manwann"
Updated by Jocelyn
Hi, dear Rapid-I community,

I am testing RapidMiner modeling to build a content-based recommender system. For this I downloaded the MovieLens 100K dataset, which contains information about movies and the ratings users gave them (http://www.grouplens.org/node/73). Ratings range from 0 to 5, and each movie has genre information (action, comedy, etc.). I am training a classifier on the user with the most ratings (uid = 405; number of reviews = 737). To do that, I discretized the rating label (good >= 3.5; bad < 3.5), but because this user has many more reviews labeled bad than good, the classifier (libSVM) predicts every example as bad:
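The discretization step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the RapidMiner process itself: the function names `discretize` and `label_counts` and the sample ratings are hypothetical, and only the 3.5 threshold comes from the post.

```python
from collections import Counter

THRESHOLD = 3.5  # cut point from the post: good >= 3.5, bad < 3.5

def discretize(rating: float, threshold: float = THRESHOLD) -> str:
    """Map a numeric rating to the binary label used for classification."""
    return "good" if rating >= threshold else "bad"

def label_counts(ratings):
    """Count how many ratings fall into each discretized class."""
    return Counter(discretize(r) for r in ratings)

# Hypothetical ratings for one user (not the real MovieLens data).
example = [1, 2, 3, 3, 4, 5, 2, 3]
print(label_counts(example))
```

Checking the class counts this way before training shows the skew up front; with a split like 621 bad vs. 116 good, a model that always answers "bad" already looks accurate.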

               true bad    true good    class precision
pred. bad      621         116          84.26%
pred. good     0           0            0.00%
class recall   100.00%     0.00%
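The precision and recall figures in a confusion matrix like the one above can be recomputed with a small helper. This sketch assumes RapidMiner's layout (rows = predicted class, columns = true class); the name `class_metrics` is hypothetical, and reporting 0.0 for a class that is never predicted mirrors the 0% shown in the table.

```python
def class_metrics(matrix, labels):
    """Per-class precision and recall from a confusion matrix.

    matrix[i][j] = number of examples predicted as labels[i] whose true
    class is labels[j] (rows = predictions, columns = true classes).
    """
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(matrix[k][j] for j in range(n))  # row sum
        actual = sum(matrix[i][k] for i in range(n))     # column sum
        # Report 0.0 when the denominator is zero (metric undefined).
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics

first_run = [[621, 116],  # pred. bad
             [0, 0]]      # pred. good
for label, (p, r) in class_metrics(first_run, ["bad", "good"]).items():
    print(f"{label}: precision {p:.2%}, recall {r:.2%}")
```

When a model always predicts the majority class, that class's precision is just its share of the data (621 / 737), so the 84.26% says nothing about the model's ability to find good movies.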

So I tried another strategy: I used stratified sampling (http://rapid-i.com/rapidforum/index.php/topic,2190.0.html) to balance the good and bad labels, and got the following results:

               true bad    true good    class precision
pred. bad      58          80           42.03%
pred. good     57          35           38.04%
class recall   50.43%      30.43%
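The balancing step described above can be sketched as random undersampling: keep every minority-class example and draw an equal number of majority-class examples at random. This is an assumption about the kind of balancing meant; RapidMiner's sampling operators may work differently, and the helper name `undersample` and the `(movie_id, label)` data are hypothetical.

```python
import random
from collections import defaultdict

def undersample(examples, label_of, seed=0):
    """Return a class-balanced subset by random undersampling.

    Every class is reduced to the size of the smallest class, so no
    example is duplicated; majority-class examples are discarded at random.
    """
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducible runs
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced

# Hypothetical (movie_id, label) pairs with the class sizes from the post.
data = [(i, "bad") for i in range(621)] + [(i, "good") for i in range(621, 737)]
balanced = undersample(data, label_of=lambda ex: ex[1])
print(len(balanced))
```

Undersampling never duplicates examples, but the trade-off is that most of the majority-class ratings are thrown away, leaving a much smaller training set.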


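As you can see, the performance is still poor: overall accuracy is only (58 + 35) / 230 ≈ 40.4%, which is worse than the 50% expected from guessing on a balanced set. I would really appreciate any suggestions.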

Thanks.

Eduardo

Edit: Sorry for the duplicate message.
