
"How to evaluate and compare two clustering methods, including k-means"

soheil008
New Altair Community Member
Updated by Jocelyn

Hi

I want to apply two clustering methods, including k-means, to my data and then compare them. Is there a way to compare clusterings in RapidMiner?

    Hi Soheil,

     

    Yes, that's quite easy to do. You just need a Multiply operator after your data set, with each clustering algorithm connected to one of its outputs. Then connect every output port of the clustering operators to the process result ports. Of course, you can also use a Write CSV operator to write out the results.

     

    Something like this perhaps?

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="7.1.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34"/>
    <operator activated="true" class="multiply" compatibility="7.1.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
    <operator activated="true" class="k_means" compatibility="7.1.001" expanded="true" height="82" name="Clustering" width="90" x="380" y="34"/>
    <operator activated="true" class="x_means" compatibility="7.1.001" expanded="true" height="82" name="X-Means" width="90" x="380" y="136"/>
    <connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Clustering" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 2" to_op="X-Means" to_port="example set"/>
    <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
    <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
    <connect from_op="X-Means" from_port="cluster model" to_port="result 3"/>
    <connect from_op="X-Means" from_port="clustered set" to_port="result 4"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>

    Thanks, but I had no problem applying the algorithms. I actually applied K-Means, K-Medoids and DBSCAN and saved the results. Now I want to compare these results with each other, and I don't know which operator I should use.

     

    I found "Cluster Distance Performance", "Cluster Density Performance" and "Item Distribution Performance". Which one is suitable for comparing K-Means, K-Medoids and DBSCAN?

    Can I use Davies-Bouldin?

    Two performance measures are supported by 'Cluster Distance Performance':

    Average within cluster distance and

    Davies-Bouldin index
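
To make the two measures concrete, here is a minimal Python sketch of what they compute, written outside RapidMiner. It assumes plain Euclidean distance and at least two clusters with distinct centroids; the function names are made up for illustration, and RapidMiner's own operator may differ in details such as the sign or scaling of the reported values. With these textbook definitions, lower values mean tighter, better separated clusters.

```python
import math

def group_by_cluster(points, labels):
    """Collect the points that share a cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters

def centroid(members):
    """Coordinate-wise mean of a list of equally sized tuples."""
    n = len(members)
    return tuple(sum(p[d] for p in members) / n for d in range(len(members[0])))

def avg_within_cluster_distance(points, labels):
    """Mean Euclidean distance from each point to its own cluster centroid."""
    total = 0.0
    for members in group_by_cluster(points, labels).values():
        c = centroid(members)
        total += sum(math.dist(p, c) for p in members)
    return total / len(points)

def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = group_by_cluster(points, labels)
    centroids = {k: centroid(v) for k, v in clusters.items()}
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)

# Toy data: two clearly separated groups, labelled well and labelled badly.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

print(davies_bouldin(points, good_labels), davies_bouldin(points, bad_labels))
```

Because both functions only need the points and their cluster labels, the same code can score the saved output of any of the three algorithms, as long as noise points are handled the way you intend.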

    [Image: davies-bouldin.PNG]

    A quick tip: you can use the Performance to Data operator to turn your performance vector into an example set. After that, it's easy to compare the values with standard ETL tools.
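
The comparison step can be pictured roughly like this in Python. The sketch assumes each performance vector ends up as one row per criterion and that lower is better for both criteria; the helper names are hypothetical, and the numbers are placeholders, not results from any real run.

```python
def performance_to_rows(algorithm, performance):
    """Flatten one performance vector into rows of algorithm, criterion, value."""
    return [
        {"algorithm": algorithm, "criterion": name, "value": value}
        for name, value in performance.items()
    ]

def best_per_criterion(rows):
    """Return the algorithm with the lowest value for each criterion."""
    best = {}
    for row in rows:
        current = best.get(row["criterion"])
        if current is None or row["value"] < current["value"]:
            best[row["criterion"]] = row
    return {criterion: row["algorithm"] for criterion, row in best.items()}

# Placeholder values only, to show the shape of the table.
rows = (
    performance_to_rows("k-Means", {"avg_within_distance": 1.8, "davies_bouldin": 0.9})
    + performance_to_rows("k-Medoids", {"avg_within_distance": 2.1, "davies_bouldin": 0.7})
)

for row in rows:
    print(row["algorithm"], row["criterion"], row["value"])
print(best_per_criterion(rows))
```

If the tool reports a criterion with a flipped sign, swap the comparison for that criterion before picking a winner.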

     

     

    ~Martin