Cluster analysis with the wholesale customers dataset
Hello everyone,
We are a group of marketing students taking a course called "Marketing Analytics". Our task is to run a cluster analysis, using several clustering methods, on this dataset:
https://archive.ics.uci.edu/ml/datasets/wholesale+customers
The exact description is the following:
"The data set refers to clients of a wholesale distributor. It includes the annual spending in monetary units (m.u.) on diverse product categories. Goal: Find Clusters of Customers"
For this we are supposed to try different clustering methods (besides k-means, our professor told us to try DBSCAN and hierarchical clustering).
So far we have done the following:
Added operator: Read CSV -> loaded the dataset
Added operator: Select Attributes -> filtered out the nominal attributes Channel & Region
Added operator: K-Means
First, we do not know how to find the optimal "k" in RapidMiner. How can we determine it, and how can we see the intra-cluster distance for each k, i.e. the "elbow" graph, in RapidMiner for this dataset? (I attached a graphic from a presentation I found.)
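To make clear what we mean by the elbow graph, here is a minimal sketch in Python, outside RapidMiner. It is only an illustration under our own assumptions: the data are three made-up blobs (not the wholesale data), the k-means is a small hand-written version with a deterministic farthest-first start, and all function names are ours.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Component-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))

def farthest_first(points, k):
    """Deterministic start: each new center is the point farthest from the chosen ones."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        updated = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Toy data (an assumption, not the real dataset): three well-separated blobs.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (20.0, 20.0), (40.0, 0.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in blob_centers for _ in range(30)]

# One k-means run per k; these values are what the elbow graph plots.
wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 1))
```

Plotting k against the printed values should give the elbow curve. On this toy data the bend sits at k = 3 simply because the data were generated with three blobs; what we are looking for is the way to get the same kind of numbers per k inside RapidMiner.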
Since we have more than two attributes (Fresh, Milk, Grocery, Frozen, Delicassen, etc.), how can we visualize the clusters? And what kind of clusters can we expect to get out of this dataset?
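One thing we could imagine instead of a plot is a table with the average spending per attribute for each cluster. A rough sketch of that idea in Python follows; the spending values and cluster labels are invented for illustration, and `cluster_profiles` is just a name we made up.

```python
from collections import defaultdict

def cluster_profiles(rows, labels):
    """Average every attribute within each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    profiles = {}
    for label, members in grouped.items():
        profiles[label] = {
            name: sum(m[name] for m in members) / len(members)
            for name in members[0]
        }
    return profiles

# Hypothetical customers (made-up numbers, only three of the attributes).
rows = [
    {"Fresh": 12000, "Milk": 2000, "Grocery": 3000},
    {"Fresh": 14000, "Milk": 1000, "Grocery": 2000},
    {"Fresh": 2000, "Milk": 9000, "Grocery": 15000},
    {"Fresh": 4000, "Milk": 11000, "Grocery": 13000},
]
labels = [0, 0, 1, 1]  # hypothetical cluster assignment, e.g. from k-means

profiles = cluster_profiles(rows, labels)
for label, profile in sorted(profiles.items()):
    print(label, profile)
```

With a table like this, one cluster might read as "mostly Fresh" and another as "mostly Milk and Grocery", but we do not know whether this is the usual way to present clusters or how to get such a view in RapidMiner.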
Also, how should we use the DBSCAN clustering operator? If we just connect it to the Select Attributes operator and run it, we get only one cluster.
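Our guess, which we have not been able to confirm, is that the neighborhood radius (epsilon) does not fit the scale of the spending values. The sketch below is a small hand-written DBSCAN in Python on made-up grid data, not the RapidMiner operator, and the parameter names `eps` and `min_pts` are our own. It only shows how strongly the number of clusters can depend on epsilon.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: -1 for noise, 0, 1, ... for clusters."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {-1})

# Made-up data on a "monetary units" scale: two 5x5 grids, spacing 1000.
group_a = [(x * 1000.0, y * 1000.0) for x in range(5) for y in range(5)]
group_b = [(20000.0 + x * 1000.0, 20000.0 + y * 1000.0)
           for x in range(5) for y in range(5)]
points = group_a + group_b

clusters_by_eps = {eps: count_clusters(dbscan(points, eps, min_pts=4))
                   for eps in (1, 1500, 50000)}
print(clusters_by_eps)
```

On this toy data a tiny epsilon marks every point as noise, a very large one merges everything into a single cluster, and only a value close to the spacing of the points separates the two groups. If something similar happens with our data, we would need to know how to choose epsilon, or whether the attributes should be normalized first.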
Our professor also told us to use some kind of loop. Is it also necessary to filter out outliers?
Please help, we are struggling a lot with this task. If someone is able to explain it, he or she can also contact me privately, and I would offer something for the effort.
Thanks a lot!!