"De-normalizing K-means Centroids"

DancingSheep
edited November 5 in Community Q&A
Hello,

I'm using k-means and have run into a problem.
I need to cluster my data after normalizing it, but I would then like to see the centroids expressed in the original, de-normalized units.

I've already looked at this topic http://rapid-i.com/rapidforum/index.php/topic,3613.msg13557.html#msg13557, but the approach there doesn't work for me: the example set gets de-normalized, but the centroids stay the same.
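A short derivation of why de-normalizing the centroids gives a meaningful result. It assumes the Normalize operator used its Z-transformation method, with mean μ_j and standard deviation σ_j for attribute j; the symbols (x_ij for the value of attribute j in example i, C_k for the set of examples in cluster k) are introduced here only for illustration. A k-means centroid is the plain average of its cluster members, and the Z-transformation is affine, so undoing the transformation on a centroid yields exactly the average of that cluster's members in the original units:

```latex
\begin{align*}
z_{ij} &= \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
c^{(z)}_{kj} = \frac{1}{|C_k|} \sum_{i \in C_k} z_{ij}, \\
\mu_j + \sigma_j\, c^{(z)}_{kj}
  &= \mu_j + \frac{1}{|C_k|} \sum_{i \in C_k} \left(x_{ij} - \mu_j\right)
   = \frac{1}{|C_k|} \sum_{i \in C_k} x_{ij}.
\end{align*}
```

The same argument holds for any per-attribute affine scaling, such as a range transformation. It also explains the behaviour in the process below: the centroids are stored in the cluster model, which was built on normalized values, so applying the De-Normalize model to the clustered example set leaves them untouched. The inverse transformation has to be applied to the centroids themselves.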

Here is my code so far:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="521" width="681">
      <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="/Users/GiO/Desktop/csv/AlmostFull.csv"/>
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.006" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
        <parameter key="name" value="favgame"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.006" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="polynominal"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.006" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="30">
        <parameter key="attribute_filter_type" value="no_missing_values"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="default" value="zero"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.1.006" expanded="true" height="94" name="Normalize" width="90" x="45" y="165"/>
      <operator activated="true" class="k_means" compatibility="5.1.006" expanded="true" height="76" name="Clustering" width="90" x="179" y="165">
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="denormalize" compatibility="5.1.006" expanded="true" height="76" name="De-Normalize" width="90" x="45" y="300"/>
      <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Normalize" from_port="preprocessing model" to_op="De-Normalize" to_port="model input"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="De-Normalize" from_port="model output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="252"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
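A minimal sketch of the same pipeline outside RapidMiner, in Python with NumPy, extended with the step the process above is missing: after clustering on normalized data, take the centroids as a small table and apply the inverse transformation to that table. The helper names (`z_normalize`, `kmeans`, `denormalize`) and the synthetic data are hypothetical; `kmeans` is a plain Lloyd's-algorithm stand-in for RapidMiner's k-Means operator, and NumPy's population standard deviation is assumed for the Z-transformation (the inverse is exact as long as the same σ is used in both directions).

```python
import numpy as np


def z_normalize(X):
    """Z-transform each column; return the data plus the (mean, std) needed to undo it.

    Assumes no constant column (std > 0).
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of Z; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return centroids, labels


def denormalize(Z, mu, sigma):
    """Inverse of the Z-transformation, applied row by row."""
    return Z * sigma + mu


rng = np.random.default_rng(42)
# Hypothetical data: two attributes on very different scales (e.g. age and income).
X = np.vstack([
    rng.normal([25, 20_000], [3, 2_000], size=(50, 2)),
    rng.normal([45, 60_000], [3, 2_000], size=(50, 2)),
    rng.normal([65, 35_000], [3, 2_000], size=(50, 2)),
])

Z, mu, sigma = z_normalize(X)                    # like "Normalize"
centroids_z, labels = kmeans(Z, k=3)             # like "Clustering"
# centroids_z plays the role of the "Extract Cluster Prototypes" output table.
centroids = denormalize(centroids_z, mu, sigma)  # like "De-Normalize" + "Apply Model"
print(np.round(centroids, 1))
```

Without the last step, `centroids_z` holds values centred around 0 with unit spread, which corresponds to the normalized centroids shown by the cluster model in the question. After it, each row equals the mean of that cluster's members in the original units.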

Answers

  • Andrew2
    Hello

    Extract the cluster centroids (for example with the Extract Cluster Prototypes operator) and then apply the de-normalising model to that output with Apply Model.

    regards

    Andrew
  • DancingSheep
    Sorry, I don't think I understood what you mean.
    Could you post the process XML with the new connections?

    Thanks
  • Andrew2
    Here you go...

    Andrew
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="386" width="882">
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.1.006" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
          <operator activated="true" class="denormalize" compatibility="5.1.006" expanded="true" height="76" name="De-Normalize" width="90" x="313" y="210"/>
          <operator activated="true" class="k_means" compatibility="5.1.006" expanded="true" height="76" name="Clustering" width="90" x="313" y="30">
            <parameter key="k" value="3"/>
          </operator>
          <operator activated="true" class="extract_prototypes" compatibility="5.1.006" expanded="true" height="76" name="Extract Cluster Prototypes" width="90" x="514" y="30"/>
          <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Denormalised original data with clusters and labels" width="90" x="514" y="210">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Denormalised cluster prototypes" width="90" x="715" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Normalize" from_port="preprocessing model" to_op="De-Normalize" to_port="model input"/>
          <connect from_op="De-Normalize" from_port="model output" to_op="Denormalised original data with clusters and labels" to_port="model"/>
          <connect from_op="Clustering" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Denormalised original data with clusters and labels" to_port="unlabelled data"/>
          <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Denormalised cluster prototypes" to_port="unlabelled data"/>
          <connect from_op="Denormalised original data with clusters and labels" from_port="labelled data" to_port="result 2"/>
          <connect from_op="Denormalised original data with clusters and labels" from_port="model" to_op="Denormalised cluster prototypes" to_port="model"/>
          <connect from_op="Denormalised cluster prototypes" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • DancingSheep
    This is exactly what I was looking for!

    Thanks for your help!
  • Rizwan
    Hi @awchisholm,
    I have a similar question. I used the Normalize operator and got decent regression performance. But when I de-normalized the results and checked the performance, I found it is the same as for a model trained without the Normalize operator. I expected normalization to give a better result than training without it.
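One plausible explanation for the regression question, sketched under the assumption of an unregularized least-squares linear regression with an intercept (whether that matches the operator and settings used in that process is not stated in the post). Rescaling each attribute as (x - mean) / std is an affine change of variables, so the fitted model absorbs it into its coefficients and intercept, and the predictions, and therefore the performance, come out identical. Normalization is expected to change results mainly for scale-sensitive methods such as k-means, k-NN or regularized models. The helper names and data below are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Unregularized least squares with an intercept column; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


rng = np.random.default_rng(0)
# Hypothetical attributes on very different scales, plus a noisy linear target.
X = rng.normal([10, 5_000], [2, 800], size=(200, 2))
y = 3.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 1, size=200)

# Z-transformation of the attributes (the target is left as-is).
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma

coef_raw = fit_ols(X, y)
coef_norm = fit_ols(Z, y)
pred_raw = predict(coef_raw, X)
pred_norm = predict(coef_norm, Z)

# Equal up to floating-point error.
print(rmse(y, pred_raw), rmse(y, pred_norm))
```

The coefficients themselves differ between the two fits (each normalized coefficient equals the raw one multiplied by σ_j), but the predictions do not, so any performance measure computed from them is the same.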