"How to get the standard deviation from clustered data?"

stever1k
stever1k New Altair Community Member
edited November 5 in Community Q&A
Hi,

After clustering, my data has the following format:

id A B C Cluster
a x y z  0
.. .... .... 1
.. .... .... 1
.. .... .... 2
.. .... .... 0
.. .... ....
.. .... .... N

The clustering algorithm found several clusters and added a new column holding the cluster attribute. I now want to calculate the standard deviation of attributes A, B and C for cluster 0, and likewise for clusters 1 through N. Any ideas how to do this?

cordially,
Stever

Answers

  • land
    land New Altair Community Member
    Hi Stever,
    This is a typical situation for using the aggregation operator. You can group the examples by cluster and then calculate an aggregation function over each attribute. I have done this in the following process:
    <operator name="Root" class="Process" expanded="yes">
        <!-- Generate example data (gaussian mixture clusters) to demonstrate the approach -->
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="gaussian mixture clusters"/>
        </operator>
        <!-- Cluster the examples into k = 3 clusters -->
        <operator name="KMeans" class="KMeans">
            <parameter key="k" value="3"/>
        </operator>
        <!-- Change the role of the "cluster" attribute; no target role is set here, so the operator's default applies -->
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="cluster"/>
        </operator>
        <!-- Group by cluster and compute the standard deviation of each attribute -->
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="att1" value="standard_deviation"/>
              <parameter key="att2" value="standard_deviation"/>
              <parameter key="att3" value="standard_deviation"/>
              <parameter key="att4" value="standard_deviation"/>
              <parameter key="att5" value="standard_deviation"/>
            </list>
            <parameter key="group_by_attributes" value="cluster"/>
        </operator>
    </operator>
    It should be easy to adapt to your needs.

    Greetings,
      Sebastian
  • stever1k
    stever1k New Altair Community Member
    Thanks a lot, Sebastian, that is EXACTLY what I'm looking for. My problem was that I was searching for a suitable operator in the preprocessing->attributes tree instead of the OLAP one!

    best regards,
    Stever
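
For readers who want to check the numbers outside the tool, the group-then-aggregate idea from the answer above can be sketched in plain Python. This is only an illustration, not part of the original process: the rows, the attribute names A, B, C and the helper `std_by_cluster` are all made up for the example. It uses `statistics.stdev`, which computes the sample standard deviation (dividing by n-1); the thread does not say which convention the aggregation operator uses, so small differences are possible.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical rows in the layout from the question: id, A, B, C, cluster.
rows = [
    {"id": "a", "A": 1.0, "B": 2.0, "C": 3.0, "cluster": 0},
    {"id": "b", "A": 3.0, "B": 2.0, "C": 7.0, "cluster": 0},
    {"id": "c", "A": 10.0, "B": 4.0, "C": 1.0, "cluster": 1},
    {"id": "d", "A": 14.0, "B": 8.0, "C": 1.0, "cluster": 1},
    {"id": "e", "A": 12.0, "B": 6.0, "C": 1.0, "cluster": 1},
]


def std_by_cluster(rows, attributes, group_key="cluster"):
    """Return {cluster: {attribute: sample standard deviation}}."""
    # Step 1: group the rows by their cluster label.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Step 2: aggregate each attribute within each group.
    result = {}
    for cluster, members in groups.items():
        result[cluster] = {
            # stdev needs at least two values; use None for singleton clusters.
            att: stdev(m[att] for m in members) if len(members) > 1 else None
            for att in attributes
        }
    return result


result = std_by_cluster(rows, ["A", "B", "C"])
for cluster in sorted(result):
    print(cluster, result[cluster])
```

The output has one entry per cluster and one standard deviation per attribute, which is the same shape as the result of grouping by the cluster attribute in the process above.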