Clustering in RapidMiner
Hello! I am working on a project in RapidMiner and I have a question. After clustering a group of consumers by their product ratings, how can I find the representative consumer based on demographic data? I would appreciate any help.


Thank you very much! Your help is really important. I have another question: which products should we propose to a "new" customer for whom we know only the rating of one given product? The only data we have are the users' ratings of the products. I think we need some kind of recommendation system. How can I use recommendation systems in RapidMiner, if this is the right approach?



Another solution I have thought of: could we check which cluster the product that the customer has rated belongs to, and recommend the other products in that cluster?
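A minimal sketch of that idea in Python, assuming the products themselves have already been clustered (for example by their rating profiles) and that the result is available as a product-to-cluster mapping. The function name and the toy mapping are hypothetical, not RapidMiner output.

```python
def recommend_from_cluster(product_clusters, rated_product):
    """Return the other products that share a cluster with the rated product."""
    cluster = product_clusters[rated_product]
    return sorted(
        product
        for product, cluster_id in product_clusters.items()
        if cluster_id == cluster and product != rated_product
    )


# Hypothetical mapping: product name -> cluster id.
product_clusters = {"A": 0, "B": 0, "C": 1, "D": 0}
suggestions = recommend_from_cluster(product_clusters, "A")
```

This ignores whether the customer liked the rated product, so in practice it would probably only make sense for products with a good rating.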
One more question: we did a classification and the accuracy is very low, e.g. 30% +/- 15% or 50% +/- 15%. We have used Naive Bayes, Decision Tree and k-NN, but the accuracy is low in every case. What can we do to improve our model's accuracy?
Hello nicka,
Of course you can analyse the cluster memberships. The question is how to find the "important" attributes. If you use the cluster_id as a label, you can use Weight by SVM to find the key attributes.
For the classification problem, there are several typical things you can do to optimize the performance:
0. Feature generation and preprocessing, e.g. converting dates to useful numbers, calculating differences etc.
1. Feature selection
2. Choosing a different algorithm. I would try: SVM (with different kernels), Random Forest, Neural Net, Linear Regression, Boosted Decision Tree, LDA.
3. Optimizing the parameters of the algorithm (C for SVM is very, very important).
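To illustrate point 3 outside of RapidMiner, here is a rough Python sketch that tunes one parameter with cross-validation. It uses k-NN and its k rather than SVM and C, only because k-NN fits in a few lines without extra libraries. The function names and the toy data are made up for the example.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to the query."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5, seed=0):
    """Mean k-NN accuracy over `folds` cross-validation folds."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    scores = []
    for fold in range(folds):
        test = rows[fold::folds]
        train = [row for i, row in enumerate(rows) if i % folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def tune_k(data, candidates=(1, 3, 5, 7)):
    """Try every candidate k and return the best one plus all scores."""
    results = {k: cross_val_accuracy(data, k) for k in candidates}
    best = max(results, key=results.get)
    return best, results


# Toy data: two well separated groups of 30 points each.
rng = random.Random(42)
toy = [((rng.gauss(0, 1), rng.gauss(0, 1)), "low") for _ in range(30)]
toy += [((rng.gauss(6, 1), rng.gauss(6, 1)), "high") for _ in range(30)]
best_k, scores = tune_k(toy)
```

The same loop works for point 2 if you swap the candidate list for a list of different learners. Always compare on held-out folds, never on the training data.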
As described by CRISP-DM (http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining), this is a cycle, so you might have to go back to the data again.
Data science is nothing like "do that and be happy". Good data science is kind of an art.
Can you share the data and/or the processes? Then someone might have a look at it and give more detailed tips.
Best,
Martin
Thank you! We have manually downloaded a number of recent reviews of a particular hotel from TripAdvisor. We entered the data in an Excel file and marked each review as positive, negative or neutral based on the rating given by the user (ratings 1-2 count as negative, 3 as neutral and 4-5 as positive).
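In Python the labelling rule we used would look roughly like this (the function name is just for illustration, and it assumes whole-number ratings from 1 to 5):

```python
def rating_to_label(rating):
    """Map a 1-5 star rating to the three sentiment classes."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected rating: {rating!r}")
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


labels = [rating_to_label(r) for r in (1, 2, 3, 4, 5)]
```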
1. We should apply text processing functions that lead to the largest possible reduction in the number of features (words) in the vectors describing the reviews.
2. We should develop a classification model that can classify new reviews into the three categories (positive, negative, neutral), and evaluate the classification accuracy by trying different algorithms.
Which options would you recommend in order to optimize the performance of the model?
The clustering model contains a centroid table. In this centroid table you can see what the center points of your clusters were. You might want to use them as representatives (in the end, the centroid is the best representative of a cluster).
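One caveat: the centroid table only covers the attributes used for clustering (here the ratings). If you want a representative in terms of demographics, one option is to average the demographic attributes per cluster yourself and pick the real consumer closest to that average. A rough Python sketch, with made-up attribute values (age, income in thousands) and hypothetical function names:

```python
def centroid(rows):
    """Attribute-wise mean of all rows in one cluster."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def closest_member(rows, center):
    """Index of the row with the smallest squared distance to the center."""
    return min(
        range(len(rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], center)),
    )


# Hypothetical demographic rows (age, income in thousands) of one cluster.
cluster_members = [(25, 30), (35, 40), (30, 35), (60, 90)]
center = centroid(cluster_members)
representative_index = closest_member(cluster_members, center)
```

If the attributes have very different scales, normalize them first, otherwise the largest one dominates the distance.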
If you want to answer something like "What is the most important attribute for Cluster X?", you might use the Cluster-ID as the label for a supervised learning algorithm and then do a standard feature selection.
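As a very simple stand-in for a full feature selection, the sketch below ranks each attribute by the share of its variance that is explained by the cluster assignment. This is not what a supervised learner or an SVM-based weighting does internally, it just illustrates "cluster ID as label, then score the attributes". All names and numbers are invented for the example.

```python
from collections import defaultdict


def between_cluster_share(values, cluster_ids):
    """Share of an attribute's variance explained by cluster membership (0 to 1)."""
    overall = sum(values) / len(values)
    total = sum((v - overall) ** 2 for v in values)
    if total == 0:
        return 0.0
    groups = defaultdict(list)
    for value, cluster_id in zip(values, cluster_ids):
        groups[cluster_id].append(value)
    between = sum(
        len(group) * (sum(group) / len(group) - overall) ** 2
        for group in groups.values()
    )
    return between / total


def rank_attributes(table, cluster_ids):
    """Sort attribute names by how well the clusters separate them."""
    scores = {
        name: between_cluster_share(column, cluster_ids)
        for name, column in table.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical demographic table with the cluster ID used as the label.
table = {
    "age": [22, 25, 24, 61, 58, 63],
    "household_size": [1, 4, 2, 2, 4, 1],
}
cluster_ids = [0, 0, 0, 1, 1, 1]
ranking = rank_attributes(table, cluster_ids)
```

In this toy data the clusters differ strongly in age but have the same average household size, so age comes out on top.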
Best,
Martin