Clustering by variable
deuorrior
Hi everyone!
I'm working on a group project with RapidMiner, and my classmates and I are trying to divide our data into clusters. However, we don't know how to choose the variable used for the clustering, since RapidMiner seems to automatically use the attribute in the first column of our dataset.
We wanted to define the clusters by frequency, but the screenshots show the results we actually got.
Can anyone help us figure out how to proceed if, for instance, we want to create these clusters by frequency?
Accepted answers
BalazsBaranyRM
Hi!
RapidMiner uses all attributes with most clustering algorithms, e.g. k-Means.
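RapidMiner's k-Means is a GUI operator, so what follows is only a plain-Python sketch of Lloyd's k-means, not RapidMiner's implementation. It illustrates the point above: the distance is computed over every attribute handed to the algorithm, so clustering "by frequency" means only that attribute should reach the clustering step (in RapidMiner, presumably by removing the other attributes beforehand, e.g. with Select Attributes). The helpers `k_means` and `to_points`, the attribute names `frequency` and `amount`, and the data are all hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance summed over every coordinate the two points share."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's k-means: returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid, measured over ALL coordinates given.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels

def to_points(records, attributes):
    """Keep only the chosen attributes, in a fixed order."""
    return [tuple(record[a] for a in attributes) for record in records]

# Hypothetical rows; "frequency" and "amount" are illustrative attribute names.
rows = [
    {"frequency": 1, "amount": 900},
    {"frequency": 2, "amount": 100},
    {"frequency": 20, "amount": 950},
    {"frequency": 21, "amount": 120},
]

labels_all = k_means(to_points(rows, ["frequency", "amount"]), k=2)
labels_frequency = k_means(to_points(rows, ["frequency"]), k=2)
print(labels_all, labels_frequency)
```

On this toy data, using both attributes groups rows 0 and 2 together (the large `amount` values decide), while passing only `frequency` groups rows 0 and 1 together.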
It's a good idea to exclude the ID attribute from the clustering operator by using Set Role and setting its role to "id". That way it won't be considered in the distances that determine the clustering.
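A minimal sketch of why the ID should be excluded, assuming hypothetical rows with an `id` and two numeric attributes: two customers with identical behaviour end up far apart only because their IDs differ. The `excluded` parameter of the hypothetical `feature_vector` helper stands in for giving the attribute the "id" role with Set Role; this is not RapidMiner code.

```python
import math

def euclidean(a, b):
    """Euclidean distance over every coordinate of the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def feature_vector(row, excluded=("id",)):
    """Vector the distance sees: all attributes except excluded ones, in sorted-name order."""
    return tuple(value for name, value in sorted(row.items()) if name not in excluded)

# Two hypothetical customers that behave identically but have different IDs.
customer_a = {"id": 1, "frequency": 12, "recency": 3}
customer_b = {"id": 500, "frequency": 12, "recency": 3}

# ID treated as a regular attribute: the distance comes entirely from the ID gap.
distance_with_id = euclidean(feature_vector(customer_a, excluded=()),
                             feature_vector(customer_b, excluded=()))
# ID excluded (the analogue of role "id"): the customers are identical.
distance_without_id = euclidean(feature_vector(customer_a), feature_vector(customer_b))

print(distance_with_id)     # 499.0
print(distance_without_id)  # 0.0
```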
For k-Means and other distance-based algorithms, it's a good idea to use Normalize if you have numeric attributes on different scales. Otherwise, the attribute with the largest values will dominate the distance.
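The effect of normalizing can be sketched with a z-score transformation (a standard choice, and one of the methods Normalize offers; this sketch uses the population standard deviation and is not claimed to match RapidMiner's exact formula). The helpers `z_normalize` and `squared_gap_shares` and the `frequency`/`amount` values are hypothetical. The second helper reports what fraction of the squared distance between two rows each attribute contributes.

```python
import statistics

def z_normalize(columns):
    """Rescale each column to mean 0 and population standard deviation 1."""
    normalized = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        # A constant column carries no distance information; map it to zeros.
        normalized[name] = [(v - mean) / std if std > 0 else 0.0 for v in values]
    return normalized

def squared_gap_shares(columns, i, j):
    """Fraction of the squared Euclidean distance between rows i and j contributed by each column."""
    gaps = {name: (values[i] - values[j]) ** 2 for name, values in columns.items()}
    total = sum(gaps.values())
    return {name: gap / total for name, gap in gaps.items()}

# Hypothetical attributes on very different scales.
raw = {"frequency": [1, 5, 9], "amount": [100.0, 5000.0, 900.0]}

raw_shares = squared_gap_shares(raw, 0, 1)
normalized_shares = squared_gap_shares(z_normalize(raw), 0, 1)
print(raw_shares)
print(normalized_shares)
```

On the raw values, `amount` accounts for practically the whole distance between the first two rows; after normalization, `frequency` contributes a noticeable share as well.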
Regards,
Balázs