"How to evaluate and compare two clustering methods, including k-means"
Hi
I want to apply two clustering methods, including k-means, to my data and then compare them. Is there a way to compare clusterings in RapidMiner?
Answers
Hi Soehill,
Yes, that's quite easy to do. You just need a Multiply operator after your data set, and then connect the different clustering algorithms to it. Make sure to connect the output ports of every clustering operator to the process results. Of course, you can also use a Write CSV operator to write out the results.
Something like this perhaps?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="7.1.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34"/>
<operator activated="true" class="multiply" compatibility="7.1.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
<operator activated="true" class="k_means" compatibility="7.1.001" expanded="true" height="82" name="Clustering" width="90" x="380" y="34"/>
<operator activated="true" class="x_means" compatibility="7.1.001" expanded="true" height="82" name="X-Means" width="90" x="380" y="136"/>
<connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Clustering" to_port="example set"/>
<connect from_op="Multiply" from_port="output 2" to_op="X-Means" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
<connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
<connect from_op="X-Means" from_port="cluster model" to_port="result 3"/>
<connect from_op="X-Means" from_port="clustered set" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>
Thanks, but I had no problem applying the algorithms. I applied k-Means, k-Medoids and DBSCAN and saved the results. Now I want to compare these results with each other, and I don't know which operator to use.
I found "Cluster Distance Performance", "Cluster Density Performance" and "Item Distribution Performance". Which one is suitable for comparing k-Means, k-Medoids and DBSCAN?
Can I use Davies-Bouldin?
Two performance measures are supported by 'Cluster Distance Performance':
Average within-cluster distance and
Davies-Bouldin index.
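In case it helps to see what these two measures capture, here is a rough sketch in plain Python, outside RapidMiner. It assumes Euclidean distance and the textbook definitions; the function names and the toy data are made up for illustration, and RapidMiner's own implementation may differ in details such as sign or the use of squared distances. For the Davies-Bouldin index, lower values indicate more compact, better separated clusters.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def avg_within_cluster_distance(clusters):
    """Mean Euclidean distance from each point to its own cluster centroid."""
    total, count = 0.0, 0
    for points in clusters.values():
        c = centroid(points)
        total += sum(math.dist(p, c) for p in points)
        count += len(points)
    return total / count

def davies_bouldin(clusters):
    """Textbook Davies-Bouldin index; needs at least two clusters with distinct centroids."""
    cents = {k: centroid(pts) for k, pts in clusters.items()}
    # Scatter of a cluster: average distance of its points to its centroid.
    scatter = {k: sum(math.dist(p, cents[k]) for p in pts) / len(pts)
               for k, pts in clusters.items()}
    worst = []
    for i in clusters:
        # For each cluster, keep the ratio to its most similar other cluster.
        worst.append(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                         for j in clusters if j != i))
    return sum(worst) / len(worst)

# Toy data (hypothetical): the same four points, grouped in two different ways.
tight = {"a": [(0, 0), (0, 2)], "b": [(10, 0), (10, 2)]}
loose = {"a": [(0, 0), (10, 0)], "b": [(0, 2), (10, 2)]}

for name, clustering in (("tight", tight), ("loose", loose)):
    print(name, avg_within_cluster_distance(clustering), davies_bouldin(clustering))
```

On the toy data, the grouping that keeps nearby points together scores lower on both measures than the one that splits them.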
And a quick tip: you can use the Performance to Data operator to turn your performance vector into an example set. After that, it's easy to compare the values with standard ETL tools.
~Martin
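Once the performance values are in table form, the comparison itself is simple. As a sketch only, in plain Python rather than RapidMiner operators: the row layout (algorithm, criterion, value), the criterion names, and all numbers below are placeholders invented for illustration, not real results and not the actual column layout of any RapidMiner output.

```python
def pivot(rows):
    """Turn (algorithm, criterion, value) rows into {algorithm: {criterion: value}}."""
    table = {}
    for algorithm, criterion, value in rows:
        table.setdefault(algorithm, {})[criterion] = value
    return table

def best_by(table, criterion, lower_is_better=True):
    """Return the algorithm with the best value for one criterion."""
    pick = min if lower_is_better else max
    return pick(table, key=lambda algorithm: table[algorithm][criterion])

# Placeholder values, NOT measured results.
rows = [
    ("k-Means", "davies_bouldin", 0.9),
    ("k-Means", "avg_within_distance", 1.4),
    ("k-Medoids", "davies_bouldin", 1.1),
    ("k-Medoids", "avg_within_distance", 1.6),
]

table = pivot(rows)
print(best_by(table, "davies_bouldin"))
```

Whether lower or higher is better depends on the criterion and on how the tool reports it, so `lower_is_better` is left as an explicit switch.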