"Help With Cluster Output"
hgwelec
New Altair Community Member
Hello to Rapid-I Team,
One quick question:
I have a dataset which consists of Age, Number of children, Income etc. I am running K-means on the dataset and everything works OK. However, I would like to get output in the following format:
Cluster 0 : Age : 22.5, Income : 1225, Children : 0.25
Cluster 1 : Age : 34.2, Income : 2300, Children : 2
Can RM output such information, or can it provide only centroid distances?
Thanks!
Answers
Hi,
hgwelec wrote:
I have a dataset which consists of Age, Number of children, Income etc. I am running K-means on the dataset and everything works OK. However, I would like to get output in the following format:
Cluster 0 : Age : 22.5, Income : 1225, Children : 0.25
Cluster 1 : Age : 34.2, Income : 2300, Children : 2
Can RM output such information, or can it provide only centroid distances?
Unfortunately, I do not really understand what the problem is here. When I run [tt]KMeans[/tt], I get a lot of information, including a cluster centroid table like the one you want to see, but I do not see any information about distances. Information like the above is contained in the Centroid Table view of the [tt]ClusterModel[/tt].
Kind regards,
Tobias
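For readers wondering what such a centroid table holds: for K-means, each centroid is the per-attribute mean of the examples assigned to that cluster. The following is a minimal sketch of that idea in plain Python, not RapidMiner's implementation. The function names `centroid_table` and `format_table`, the sample rows, and the cluster assignments are all made up for illustration.

```python
from collections import defaultdict


def centroid_table(rows, assignments):
    """Average every attribute over the rows assigned to each cluster."""
    grouped = defaultdict(list)
    for row, cluster in zip(rows, assignments):
        grouped[cluster].append(row)

    table = {}
    for cluster, members in sorted(grouped.items()):
        names = members[0].keys()
        table[cluster] = {
            name: sum(m[name] for m in members) / len(members) for name in names
        }
    return table


def format_table(table):
    """Render one line per cluster in the format asked for in the question."""
    lines = []
    for cluster, means in table.items():
        cells = ", ".join(f"{name} : {value:g}" for name, value in means.items())
        lines.append(f"Cluster {cluster} : {cells}")
    return lines


# Hypothetical data: three examples and the cluster each one was assigned to.
rows = [
    {"Age": 22, "Income": 1200, "Children": 0},
    {"Age": 23, "Income": 1250, "Children": 1},
    {"Age": 34, "Income": 2300, "Children": 2},
]
assignments = [0, 0, 1]

table = centroid_table(rows, assignments)
for line in format_table(table):
    print(line)
```

The sketch only reproduces the averaging step. It assumes the cluster assignments already exist, so it says nothing about how K-means arrives at them.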
Tobias,
Thanks. I just can't output the ClusterCentroidModel because it gets "consumed" somewhere. Here is my XML setup:
<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="D:\MyDocuments\Analyzer\data-numeric-.csv"/>
        <parameter key="label_name" value="class"/>
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="KMeans" class="KMeans">
            <parameter key="k" value="3"/>
            <parameter key="max_runs" value="5"/>
        </operator>
        <operator name="ClusterModelWriter" class="ClusterModelWriter">
            <parameter key="cluster_model_file" value="D:\Programs\Rapid-I\rm_workspace\cluster.clm"/>
        </operator>
        <operator name="ClusterCentroidEvaluator" class="ClusterCentroidEvaluator">
            <parameter key="keep_example_set" value="true"/>
        </operator>
    </operator>
    <operator name="ClusterModelReader" class="ClusterModelReader">
        <parameter key="cluster_model_file" value="D:\Programs\Rapid-I\rm_workspace\cluster.clm"/>
    </operator>
</operator>
Is this the way to do it?
Many thanks
Hi,
hgwelec wrote:
Thanks. I just can't output the ClusterCentroidModel because it gets "consumed" somewhere. Here is my XML setup
You can check this yourself by clicking on the operators in the operator tree and then pressing F1. The operator help dialog lists each operator's inputs and outputs. Another way is to set breakpoints in the process and inspect the intermediate results.
In principle, your process setup is right. However, you can alternatively use the [tt]IOMultiplier[/tt], which lets you generate a copy of an object before one of the copies is consumed. Another way would be to use the [tt]IOStorer[/tt] - [tt]IORetriever[/tt] mechanism, which does not require the object to be saved to disk.
Kind regards,
Tobias
That is great.
Thank you very much, Tobias