calculating statistics

TheBear
TheBear New Altair Community Member
edited November 5 in Community Q&A
Hello,
I am new to RapidMiner. So far I am not quite sure whether I have understood the concept
or the syntax correctly.

My data set consists of some instances (by instances I mean the rows of my spreadsheet)
and several attributes (columns). What I need to do is condense the data,
e.g. calculate the mean and deviation of attributes 3-7 for each instance.
(For example: let's say I have a set of process parameters X describing my process and
I measure some output characteristic several times: O1, O2, O3, O4.
Now I want to investigate O further, which is characterised by the mean of O1-O4.)

I found the FeatureGeneration Operator which might be used for that purpose but the syntax is
not really easy to use (e.g. no function for mean or deviation).

Is there any other operator or operator chain that is better suited to computing statistics within instances?

Answers

  • steffen
    steffen New Altair Community Member
    Hello and welcome to RapidMiner

    I suggest using the operator "Aggregation".
    Example: calculating the average of attribute "a1" of the iris data set (available with RapidMiner), grouped by each value of the class label.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="iris.aml"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <parameter key="aggregation_attribute" value="a1"/>
            <parameter key="group_by_attribute" value="label"/>
            <parameter key="keep_example_set" value="false"/>
        </operator>
    </operator>
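For readers unsure what this grouped average computes, here is a minimal Python sketch of the same idea. It is not RapidMiner code; the function name `grouped_average` and the sample rows are made up for illustration and are not the real iris values.

```python
from collections import defaultdict


def grouped_average(rows, value_key, group_key):
    """Average rows[value_key] separately for each distinct value of rows[group_key]."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical rows: one value attribute "a1" and one class label per example.
rows = [
    {"a1": 5.0, "label": "A"},
    {"a1": 5.5, "label": "A"},
    {"a1": 6.0, "label": "B"},
    {"a1": 7.0, "label": "B"},
]

averages = grouped_average(rows, "a1", "label")
print(averages)
```

The result has one entry per label value, which is why this kind of aggregation returns fewer rows than the input data set.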
    In combination with the operator "ParameterIteration" (please use the CVS version) and the "ExampleSetJoin" operator, you can calculate the average for all attributes of a data set.

    hope this was helpful

    Steffen

    PS: I will add an Example for the second suggestion as soon as my cvs-update is complete  ;)



    ...which is not possible  :(
    @RapidMiner-Team:

    I got a
    java.lang.ClassCastException: com.rapidminer.parameter.ParameterTypeStringCategory cannot be cast to com.rapidminer.parameter.ParameterTypeCategory
    Here is the slightly changed setup. The error occurred while moving "aggregation_attribute" from "Parameters" to "SelectedParameters".
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="iris.aml"/>
        </operator>
        <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
            <list key="parameters">
            </list>
            <operator name="Aggregation" class="Aggregation">
                <parameter key="aggregation_attribute" value="a1"/>
                <parameter key="group_by_attribute" value="label"/>
                <parameter key="keep_example_set" value="false"/>
            </operator>
        </operator>
    </operator>
    I downloaded the CVS version 40 minutes ago and ran the Ant build script with default settings before starting the GUI via RapidMinerGUI.bat.
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    yes, the "Aggregation" operator (possibly in combination with ExampleSetJoin) should be the solution. We have just improved Aggregation so that it can handle multiple groups and also multiple value attributes, even with different aggregation functions. Here is an example on the iris data set, calculating the average for the four attributes:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="sample/data/iris.aml"/>
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="a1" value="average"/>
              <parameter key="a2" value="average"/>
              <parameter key="a3" value="average"/>
              <parameter key="a4" value="average"/>
            </list>
            <parameter key="keep_example_set" value="false"/>
        </operator>
    </operator>
    You can now join the resulting example set with your original set if desired. By the way: we have just released version 4.2, so you no longer need to get it via CVS. We will add the link to the new release on our website within the next few hours.


    @Steffen:

    I just tested it myself but did not get the class cast exception. Maybe there was some inconsistency in the CVS during the delay between developer and anonymous CVS. On the other hand, the error may have been caused by the changed parameters of this operator (see above). Could you please try again in a few hours and check whether this still happens?

    Thanks and cheers,
    Ingo
  • steffen
    steffen New Altair Community Member
    Hello

    ;D Yeah Release Time  ;D

    Well, with RapidMiner 4.2 it is not possible to add aggregation_attributes as a parameter for ParameterIteration because it is a list of parameters. But this is OK.
    It would be nice, though, to remove from the ParameterIteration configuration dialog all parameters that are not available for ParameterIteration (or to mark them as such), just to avoid confusion.

    greetings

    Steffen
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    good idea. I will add it to our Todo list.

    Cheers,
    Ingo
  • TheBear
    TheBear New Altair Community Member
    Hi,

    I am not quite sure whether I misunderstood the function of the Aggregation operator or whether
    my description was unclear. (Sorry, I am not a native speaker...)

    What I want to do is generate a new attribute (Average). RapidMiner should
    compute the values of that attribute by calculating the mean of O1 to O3 for each instance.
    Label          O1    O2    O3    Average
    Instance 1      1     2     3          2
    Instance 2      1     3     2          2
    Instance 3      1     4     7          4
    Aggregation     1     3     4
    As I understand it, Aggregation averages the values of one attribute across instances, not the attributes of a single instance.
    Please correct me if I am wrong.
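To make the distinction concrete, here is a minimal Python sketch using the numbers from the table above. It is plain Python, not RapidMiner; the function names `row_averages` and `column_averages` are made up for illustration.

```python
# Values of O1, O2, O3 for each instance, copied from the table above.
table = {
    "Instance 1": [1, 2, 3],
    "Instance 2": [1, 3, 2],
    "Instance 3": [1, 4, 7],
}


def row_averages(table):
    """Average within each instance (the new 'Average' attribute)."""
    return {name: sum(values) / len(values) for name, values in table.items()}


def column_averages(table):
    """Average each attribute across all instances (the 'Aggregation' row)."""
    columns = zip(*table.values())
    return [sum(column) / len(column) for column in columns]


print(row_averages(table))
print(column_averages(table))
```

The row averages reproduce the "Average" column and the column averages reproduce the "Aggregation" row, so the two operations really do answer different questions.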
  • TobiasMalbrecht
    TobiasMalbrecht New Altair Community Member
    Hi,

    you are right. Aggregation combines the values of an attribute across examples. Hence, a normal aggregation is not suitable for your needs, at least not without a complicated process structure. As far as I know there is no operator which lets you directly average the values of several attributes. Nevertheless, you can use the FeatureGeneration operator and write the expression manually. Suppose you want to average the three attributes att1, att2 and att3. Then the corresponding XML code for averaging is

        <operator name="FeatureGeneration" class="FeatureGeneration">
            <list key="functions">
              <parameter key="average" value="/(+(att1,+(att2,att3)),const[3]())"/>
            </list>
            <parameter key="keep_all" value="true"/>
        </operator>
    I think we are already planning a more sophisticated and easier-to-use feature generation. Maybe we will even be able to make it part of the next release.
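For longer attribute lists, typing the nested expression by hand gets error-prone. The following Python sketch builds the same kind of prefix expression as the example above for any number of attributes. It only assumes the syntax shown in that example (nested two-argument `+`, `/`, and `const[n]()`); the helper names `nested_sum` and `average_expression` are made up for illustration.

```python
def nested_sum(names):
    """Build a right-nested sum such as +(att1,+(att2,att3))."""
    expression = names[-1]
    # Wrap the remaining names around the expression from right to left.
    for name in reversed(names[:-1]):
        expression = "+({},{})".format(name, expression)
    return expression


def average_expression(names):
    """Build /(<sum of names>,const[n]()) for the given attribute names."""
    if not names:
        raise ValueError("at least one attribute name is required")
    return "/({},const[{}]())".format(nested_sum(names), len(names))


print(average_expression(["att1", "att2", "att3"]))
# prints: /(+(att1,+(att2,att3)),const[3]())
```

The printed string can be pasted into the `value` of the `average` parameter. The sum is built in a loop rather than recursively so that several hundred attribute names do not run into Python's recursion limit.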

    Regards,
    Tobias
  • TheBear
    TheBear New Altair Community Member
    Thanks Tobias.
    All right, I'll wait for the next release :).

    I have already used FeatureGeneration, but to be honest it is a bit of a pain to get the syntax right (especially for the deviation when ten or more attributes are to be condensed).
    I have up to several hundred attributes and I need the average and deviation of certain groups of these attributes.
    (Not a big deal, I'll precalculate these values in my spreadsheet.)
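For anyone who would rather precalculate outside a spreadsheet, here is a minimal Python sketch of the per-instance mean and deviation for named groups of attributes. It assumes "deviation" means the sample standard deviation; the function name `condense`, the group name "O" and the example row are made up for illustration.

```python
import statistics


def condense(row, groups):
    """Return the mean and sample standard deviation of each attribute group in one row."""
    summary = {}
    for group_name, columns in groups.items():
        values = [row[column] for column in columns]
        summary[group_name + "_mean"] = statistics.mean(values)
        # statistics.stdev needs at least two values per group.
        summary[group_name + "_stdev"] = statistics.stdev(values)
    return summary


# Hypothetical instance: one process parameter X and three measurements of O.
row = {"X": 10.0, "O1": 1.0, "O2": 4.0, "O3": 7.0}
groups = {"O": ["O1", "O2", "O3"]}

summary = condense(row, groups)
print(summary)
```

Calling `condense` once per row and appending the result to that row yields the condensed attributes, which can then be loaded into RapidMiner alongside the process parameters.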

    Keep up the good work!