Hi guys,
I am new to RapidMiner, and my first little project is clustering some of our customers. The attributes include age, salary, job position and the amount of credit they have. To make the process faster, I have "translated" the nominal data (like employment and channel) into numerical data.
I have run the clustering model, and except for Salary the results look coherent.
Here is my centroid table:
Attributes | cluster_0 | cluster_1 | cluster_2 | cluster_3 | cluster_4 | cluster_5 |
Channel | 1.0 | 0.0 | 1.0 | 1.0 | 3.0 | 0.0 |
Accomodation | 1.0 | 2.0 | 2.0 | 2.0 | 3.0 | 1.0 |
Education | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
Employement | 2.0 | 2.0 | 3.0 | 1.0 | 1.0 | 2.0 |
Credit (pcs) | 4.0 | 1.0 | 2.0 | 1.0 | 2.0 | 2.0 |
Age | 36.0 | 40.0 | 44.0 | 65.0 | 78.0 | 41.0 |
Salary | 3417.0 | 4174.0 | 3100.0 | 79.0 | 226.0 | 8601.0 |
Credit (vol) | 1181014.0 | 0.0 | 8690185.0 | 0.0 | 362750.0 | 3622658.0 |
What surprised me were the values in the Salary row: they should be much higher, since the average salary in my data table for these customers is well above 188 k.
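To make the mismatch concrete, here is a rough sanity check in Python (outside RapidMiner, just arithmetic on the numbers above). It assumes each centroid value is the plain mean of that attribute over the cluster's members, as it would be for k-means, and that "188 k" means 188 000 in the same units as the Salary row. The names `mean_is_reachable` and `raw_mean_lower_bound` are made up for this sketch.

```python
# Salary centroids copied from the centroid table (cluster_0 .. cluster_5).
salary_centroids = [3417.0, 4174.0, 3100.0, 79.0, 226.0, 8601.0]

# Assumption: "well above 188 k" is read as at least 188 000, in the same
# units as the Salary row of the centroid table.
raw_mean_lower_bound = 188_000.0


def mean_is_reachable(centroids, overall_mean):
    """Return True if overall_mean could be a size-weighted average of centroids.

    A weighted average with non-negative weights always lies between the
    smallest and the largest value being averaged, whatever the cluster sizes.
    """
    return min(centroids) <= overall_mean <= max(centroids)


consistent = mean_is_reachable(salary_centroids, raw_mean_lower_bound)
print(f"largest salary centroid: {max(salary_centroids)}")
print(f"raw mean reachable from these centroids: {consistent}")
```

If the assumptions hold, the check comes out False: no combination of cluster sizes can turn centroids that are all below 9 000 into an overall average above 188 000, so the Salary values in the table cannot be plain means of the salary column as I see it in my data table.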
My question is: is there something I am missing, or am I interpreting the data wrong?
Thanks for the answers!
Tibor