"How to evaluate and compare two clustering methods, including k-means"

soheil008 · July 2016

Hi

I want to apply two clustering methods, including k-means, to my data and then compare them. Is there any way in RapidMiner to compare clusterings?

Thomas_Ott · July 2016

Hi Soheil,

Yes, that's quite easy to do. You just need a Multiply operator after your data set, then connect each clustering operator to one of its outputs. Make sure to connect all of the clustering operators' output ports to the process results. You can, of course, also use a Write CSV operator to write out the results.

Something like this perhaps?

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="7.1.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34"/>
      <operator activated="true" class="multiply" compatibility="7.1.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
      <operator activated="true" class="k_means" compatibility="7.1.001" expanded="true" height="82" name="Clustering" width="90" x="380" y="34"/>
      <operator activated="true" class="x_means" compatibility="7.1.001" expanded="true" height="82" name="X-Means" width="90" x="380" y="136"/>
      <connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Clustering" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="X-Means" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
      <connect from_op="X-Means" from_port="cluster model" to_port="result 3"/>
      <connect from_op="X-Means" from_port="clustered set" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

soheil008 · July 2016

Thanks, but I didn't have any problem applying the algorithms. I applied k-Means, k-Medoids and DBSCAN and saved the results. Now I want to compare these results with each other, and I don't know which operator I should use.

I found the "Cluster Distance Performance", "Cluster Density Performance" and "Item Distribution Performance" operators. Which one is suitable for comparing k-Means, k-Medoids and DBSCAN?

Can I use the Davies-Bouldin index?

YYH · July 2016

'Cluster Distance Performance' supports two performance measures: the average within-cluster distance and the Davies-Bouldin index.
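For reference, the usual textbook definitions of these two measures are below, for $k$ clusters $C_1,\dots,C_k$ with centroids $c_i$, a distance $d$, and $n$ clustered points in total. RapidMiner's operator may normalize or sign the values differently, so treat this as a sketch of the idea rather than the operator's exact formula. Both rely on a centroid per cluster, which centroid-based methods such as k-Means provide; DBSCAN does not produce centroids, so for its results the cluster means would have to be computed separately (an assumption about how you would adapt the measures).

```latex
\begin{aligned}
\bar{d}_{\mathrm{within}} &= \frac{1}{n}\sum_{i=1}^{k}\sum_{x \in C_i} d(x, c_i) \\
s_i &= \frac{1}{|C_i|}\sum_{x \in C_i} d(x, c_i) \\
\mathrm{DB} &= \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\frac{s_i + s_j}{d(c_i, c_j)}
\end{aligned}
```

Lower is better for both. The average within-cluster distance tends to shrink as the number of clusters grows, so on its own it favors whichever method produced more clusters; the Davies-Bouldin index also penalizes clusters whose centroids lie close together.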

MartinLiebig · July 2016

A quick tip: you can use the Performance to Data operator to turn your performance vector into an example set. After that, it's easy to compare the values with standard ETL operators.
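Once each method's measures sit in one table, comparing them is a matter of sorting. The Python sketch below reproduces that idea outside RapidMiner under stated assumptions: it recomputes both measures with plain Euclidean distance from point coordinates and cluster labels; the toy data and the label vectors `clustering_a` and `clustering_b` are hypothetical stand-ins for saved results; and noise points (as DBSCAN produces) are assumed to carry a label you pass as `noise_label`. It is not RapidMiner's implementation, and its values need not match the operator's output.

```python
import math
from collections import defaultdict


def _group(points, labels, noise_label=None):
    """Group points by cluster label, optionally dropping noise points.
    The label used for noise (e.g. -1 for DBSCAN) is an assumption."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        if noise_label is not None and lab == noise_label:
            continue
        groups[lab].append(p)
    return groups


def _centroid(cluster):
    """Coordinate-wise mean of a list of equal-length point tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def avg_within_centroid_distance(points, labels, noise_label=None):
    """Mean Euclidean distance from each non-noise point to its cluster centroid.
    RapidMiner may use another distance or sign convention (assumption)."""
    groups = _group(points, labels, noise_label)
    total, count = 0.0, 0
    for cluster in groups.values():
        c = _centroid(cluster)
        total += sum(math.dist(p, c) for p in cluster)
        count += len(cluster)
    return total / count


def davies_bouldin(points, labels, noise_label=None):
    """Davies-Bouldin index: lower means compact, well-separated clusters.
    Fails with ZeroDivisionError if two centroids coincide."""
    groups = _group(points, labels, noise_label)
    if len(groups) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    cents = {lab: _centroid(cl) for lab, cl in groups.items()}
    # s_i: average distance of cluster i's points to its own centroid
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in cl) / len(cl)
               for lab, cl in groups.items()}
    total = 0.0
    for i in groups:
        # worst-case (largest) similarity of cluster i to any other cluster j
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in groups if j != i)
    return total / len(groups)


# Hypothetical toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
# Hypothetical label vectors standing in for two saved clustering results.
results = {
    "clustering_a": [0, 0, 0, 0, 1, 1, 1, 1],  # matches the two groups
    "clustering_b": [0, 1, 0, 1, 0, 1, 0, 1],  # mixes the two groups
}

# One row per method, similar to an example set built from performance vectors.
rows = [
    {"method": name,
     "avg_within_distance": avg_within_centroid_distance(points, labels),
     "davies_bouldin": davies_bouldin(points, labels)}
    for name, labels in results.items()
]
rows.sort(key=lambda r: r["davies_bouldin"])  # lowest Davies-Bouldin first
for r in rows:
    print(f'{r["method"]}: avg within = {r["avg_within_distance"]:.3f}, '
          f'DB = {r["davies_bouldin"]:.3f}')
```

Compare methods on the same data and the same distance measure. Dropping DBSCAN noise points changes which points are scored, so it is worth stating when you report the comparison.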

~Martin
