How to label clusters in output?
mauricenew
New Altair Community Member
Hey,
I am using k-means to cluster a dataset, but I want the "prediction" (the thing the cluster is based on) to be output as the label. Right now the output gives me my different IDs and their word content plus "cluster_02" or "cluster_17", but it would be nice to see "word x" instead of "cluster_02" so I can easily filter in Excel for what I am looking for. To sum up: instead of a name like "cluster_02", I want a name based on the highest probability. For example, if a cluster contains phrases with the word "ananas", the label should be "ananas" and not "cluster_02".
Question 2: Why is "Nominal to Text" deleting all numbers instead of converting them to text?
Answers
For the first item, there is no automated way for RapidMiner to do what you are asking because the cluster identity is not necessarily based around only one specific word. You need to look at the cluster performance metrics like the centroid values carefully to determine what distinguishes one cluster from another. Then you might decide on your own descriptive names, and you can then rename the clusters using the Map or Replace operators.
For the second item, it sounds like you are using the wrong operator. If you are trying to turn numbers into text, you want "Numerical to Nominal" and not "Nominal to Text".
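The manual step described for the first item (read the centroid values, pick a descriptive name, then rename the clusters) can be sketched outside RapidMiner. This is only an illustration in plain Python, not a RapidMiner feature: the centroid table, its weights, and the helper `top_terms` are hypothetical, and taking the single highest-weighted term is a heuristic that will mislabel clusters that are defined by several words.

```python
# Hypothetical centroid table: cluster name -> {term: centroid weight}.
# In practice these values would be read off the cluster model's centroid table.
centroids = {
    "cluster_02": {"ananas": 0.81, "juice": 0.22, "fresh": 0.10},
    "cluster_17": {"banana": 0.64, "ripe": 0.31, "ananas": 0.05},
}

def top_terms(weights, n=1):
    """Return the n terms with the largest centroid weight, joined by '_'."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return "_".join(ranked[:n])

# Mapping from generic cluster names to descriptive labels; this is the
# table one would otherwise type by hand into a Map or Replace step.
cluster_labels = {cid: top_terms(w) for cid, w in centroids.items()}

# Hypothetical clustered rows: (ID, assigned cluster)
rows = [("id_1", "cluster_02"), ("id_2", "cluster_17")]
labeled_rows = [(row_id, cluster_labels[cid]) for row_id, cid in rows]
print(labeled_rows)
```

Calling `top_terms(w, n=2)` instead would give labels such as "ananas_juice", which may be a safer choice when one word does not characterize the cluster.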
I see, makes sense! Thank you! I'll look into it!
Regarding 2): usually numbers also occur inside the text; it is a mix of text and numbers, like "yes we have the modellnumber xyz23 on stock". So far, "23" etc. is always deleted and won't appear in the tokenized items (what is left over is "xyz"). I use "read from excel" as the source:
read from excel -> nominal to text -> process documents from data
"Process Documents from Data" needs text as input, which is why I am doing that, but it removes all numbers.
Update: got it. I had to switch to "linguistic tokens" instead of "non letters" inside the Tokenize operator.
Update 2: but now it takes ages just to run "Process Documents from Data" -> Tokenize. Is there a faster way? "non letters" was done more or less instantly.
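To illustrate why the "non letters" setting drops "23", here is a small Python sketch that imitates the two behaviours with regular expressions. The patterns are assumptions chosen for illustration, not the exact rules RapidMiner uses internally. If your version of the Tokenize operator offers a regular expression mode, a pattern along these lines may be worth trying as a lighter alternative to "linguistic tokens", but whether it is actually faster is untested here.

```python
import re

text = "yes we have the modellnumber xyz23 on stock"

# Splitting on every run of non-letter characters treats digits as
# separators, so "xyz23" loses its numeric part.
non_letter_tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]

# Splitting on everything that is neither a letter nor a digit keeps
# mixed tokens such as "xyz23" intact.
alnum_tokens = [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

print(non_letter_tokens)
print(alnum_tokens)
```

The first list contains "xyz" and no "23", which matches the behaviour described above; the second list keeps "xyz23" as a single token.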