Hi
I want to apply two clustering methods, including k-means, to my data and then compare them. Is there any way to compare clusterings in RapidMiner?
Hi Soehill,
Yes, that's quite easy to do. Add a Multiply operator after your data set and connect each clustering algorithm to one of its outputs. Then make sure to connect the output ports of every clustering operator to the process results. Of course, you can also use a Write CSV operator to write the results out.
Something like this perhaps?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="7.1.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34"/>
      <operator activated="true" class="multiply" compatibility="7.1.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
      <operator activated="true" class="k_means" compatibility="7.1.001" expanded="true" height="82" name="Clustering" width="90" x="380" y="34"/>
      <operator activated="true" class="x_means" compatibility="7.1.001" expanded="true" height="82" name="X-Means" width="90" x="380" y="136"/>
      <connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Clustering" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="X-Means" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
      <connect from_op="X-Means" from_port="cluster model" to_port="result 3"/>
      <connect from_op="X-Means" from_port="clustered set" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>
Thanks, but I didn't have any problem applying the algorithms. I have already applied K-Means, K-Medoids and DBSCAN and saved the results. Now I want to compare these results with each other, and I don't know which operator I should use!
I found "Cluster Distance Performance", "Cluster Density Performance" and "Item Distribution Performance". Which one is suitable for comparing K-Means, K-Medoids and DBSCAN?
Can I use Davies-Bouldin?
Two performance measures are supported by 'Cluster Distance Performance': average within cluster distance and the Davies-Bouldin index.
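To make the two measures concrete, here is a minimal Python sketch of how they can be computed from a list of points and the cluster label assigned to each point. It is not RapidMiner's implementation: the function names, the use of plain Euclidean distance, and the toy data are assumptions for illustration, and the operator's own distance measure or sign convention may differ. For both measures as computed here, lower values mean tighter, better separated clusters.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Collect the points belonging to each cluster label."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(members[0])
    return tuple(sum(p[d] for p in members) / len(members) for d in range(dims))


def avg_within_cluster_distance(points, labels):
    """Mean Euclidean distance from each point to its own cluster centroid."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        center = centroid(members)
        total += sum(math.dist(p, center) for p in members)
    return total / len(points)


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distances (assumed convention)."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    centers = {k: centroid(v) for k, v in clusters.items()}
    # Scatter: average distance of a cluster's points to its centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (made up for illustration): two obvious groups on the x axis.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_labels = [0, 0, 1, 1]  # follows the two groups
bad_labels = [0, 1, 0, 1]   # splits each group across clusters

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, avg_within_cluster_distance(points, labels), davies_bouldin(points, labels))
```

Running the same two functions on the labels produced by each algorithm gives one pair of numbers per algorithm that can be compared directly. Note that this sketch treats every distinct label as a regular cluster, so if DBSCAN noise points carry their own label you may want to filter them out first.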
And a quick tip: you can use the Performance to Data operator to turn your performance vector into an example set. After that, it's easy to compare the values with standard ETL tools.
~Martin