calculating statistics
TheBear
New Altair Community Member
Hello
I am new to RapidMiner, and so far I am not quite sure whether I have understood the concept
or the syntax correctly.
My data set consists of some instances (by instances I mean the rows of my spreadsheet)
and several attributes (columns). What I need to do is condense the data,
e.g. calculate the mean and deviation of attributes 3-7 for each instance.
(For example: let's say I have a set of process parameters X describing my process, and
I measure some output characteristic several times: O1, O2, O3, O4.
Now I want to investigate O further, which is characterised by the mean of O1-O4.)
I found the FeatureGeneration operator, which might be used for that purpose, but its syntax is
not really easy to use (e.g. there is no function for mean or deviation).
Is there any other operator or operator chain that is better suited to computing statistics within instances?
Answers
Hello and welcome to RapidMiner
I suggest using the operator "Aggregation".
Example: calculating the average of attribute "a1" of the iris data set (available with RapidMiner), grouped by each value of the class label.
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="iris.aml"/>
    </operator>
    <operator name="Aggregation" class="Aggregation">
        <parameter key="aggregation_attribute" value="a1"/>
        <parameter key="group_by_attribute" value="label"/>
        <parameter key="keep_example_set" value="false"/>
    </operator>
</operator>
In combination with the operator "ParameterIteration" (use the CVS version please) and the "ExampleSetJoin" operator you can calculate the average for all attributes of a data set.
Hope this was helpful
Steffen
PS: I will add an example for the second suggestion as soon as my CVS update is complete
...which is not possible
@RapidMiner-Team:
I got an error while moving "aggregation_attribute" from "Parameters" to "SelectedParameters":
java.lang.ClassCastException: com.rapidminer.parameter.ParameterTypeStringCategory cannot be cast to com.rapidminer.parameter.ParameterTypeCategory
I downloaded the CVS version 40 minutes ago and ran the Ant build script with default settings before starting the GUI via RapidMinerGUI.bat.
Here is the slightly changed setup:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="iris.aml"/>
    </operator>
    <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
        <list key="parameters">
        </list>
        <operator name="Aggregation" class="Aggregation">
            <parameter key="aggregation_attribute" value="a1"/>
            <parameter key="group_by_attribute" value="label"/>
            <parameter key="keep_example_set" value="false"/>
        </operator>
    </operator>
</operator>
Hi,
yes, the "Aggregation" operator (possibly in combination with ExampleSetJoin) should be the solution. We just improved Aggregation so that it can handle multiple groups and also multiple value attributes, even with different aggregation functions. Here is an example on the iris data set calculating the average for the four attributes:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="sample/data/iris.aml"/>
    </operator>
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <parameter key="name" value="label"/>
    </operator>
    <operator name="Aggregation" class="Aggregation">
        <list key="aggregation_attributes">
            <parameter key="a1" value="average"/>
            <parameter key="a2" value="average"/>
            <parameter key="a3" value="average"/>
            <parameter key="a4" value="average"/>
        </list>
        <parameter key="keep_example_set" value="false"/>
    </operator>
</operator>
You can now join the resulting example set with your original set if desired. By the way: we just made release 4.2, so you no longer need to get it via CVS. We will add the link to the new release on our website during the next few hours.
@Steffen:
I just tested it myself but I did not get the class cast exception. Maybe there was some inconsistency in the CVS during the delay between developer and anonymous CVS. On the other hand, maybe the error was caused by the changed parameters of this operator (see above). Could you please try again in a few hours and check whether this still happens?
Thanks and cheers,
Ingo
Hello
;D Yeah Release Time ;D
Well, with RapidMiner 4.2 it is not possible to add aggregation_attributes as a parameter for ParameterIteration because it is a list of parameters. But this is OK.
Still, it would be nice to remove from the ParameterIteration configuration dialog all parameters that are not available for ParameterIteration (or to mark them as such), just to avoid confusion.
Greetings
Steffen
Hi,
good idea. I will add it to our Todo list.
Cheers,
Ingo
Hi,
I am not quite sure whether I just didn't understand the function of the Aggregation operator or whether
I was not clear in my description. (Sorry, I am not a native speaker...)
What I want to do is generate a new attribute (Average). RapidMiner should
compute the values of that attribute by calculating the mean of O1 to O3 for each instance.
In my opinion, Aggregation averages over one attribute and not within instances.

Label        O1  O2  O3  Average
Instance 1   1   2   3   2
Instance 2   1   3   2   2
Instance 3   1   4   7   4
Aggregation  1   3   4

Please correct me if I am wrong.
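To make the difference concrete, here is a small Python sketch that works outside RapidMiner and only reuses the numbers from the table above. It computes the per-instance average (the new "Average" attribute that is wanted) and the per-attribute average (what the "Aggregation" row shows). The variable names are made up for this illustration.

```python
from statistics import mean

# Values copied from the table above: one entry per instance, columns O1..O3.
rows = {
    "Instance 1": [1, 2, 3],
    "Instance 2": [1, 3, 2],
    "Instance 3": [1, 4, 7],
}

# Row-wise: one average per instance (the desired "Average" attribute).
row_means = {name: mean(values) for name, values in rows.items()}

# Column-wise: one average per attribute (the "Aggregation" row).
column_means = [mean(column) for column in zip(*rows.values())]

print(row_means)
print(column_means)
```

The row-wise result has one value per instance, while the column-wise result has one value per attribute, which is why a plain aggregation cannot produce the new attribute directly.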
Hi,
you are right. Aggregation combines the values of one attribute across examples, not the values of several attributes within one example. Hence, a normal aggregation is not suitable for your need, at least not without a complicated process structure. As far as I know there is no operator that lets you directly average the values of some attributes. Nevertheless, you can use the FeatureGeneration operator and write the expression manually. Suppose you want to average the three attributes att1, att2 and att3. Then the corresponding XML code for averaging is
<operator name="FeatureGeneration" class="FeatureGeneration">
    <list key="functions">
        <parameter key="average" value="/(+(att1,+(att2,att3)),const[3]())"/>
    </list>
    <parameter key="keep_all" value="true"/>
</operator>
I think we already plan a more sophisticated and easier-to-use feature generation. Maybe we will even be able to make this part of the next release.
Regards,
Tobias
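Writing such an expression by hand gets tedious for many attributes, so a small helper script can build it. The Python sketch below is not part of RapidMiner; it only assembles a string in the same prefix notation as the att1/att2/att3 example above, and it assumes that this nesting pattern carries over unchanged to any number of attributes, which has not been checked against RapidMiner itself. The function names `nested_sum` and `mean_expression` are made up for this illustration.

```python
def nested_sum(names):
    """Build a right-nested binary sum in prefix notation, e.g. +(a,+(b,c))."""
    expr = names[-1]
    # Wrap from the innermost pair outwards so the first name ends up outermost.
    for name in reversed(names[:-1]):
        expr = f"+({name},{expr})"
    return expr


def mean_expression(names):
    """Divide the nested sum by the attribute count, following the example above."""
    if not names:
        raise ValueError("at least one attribute name is required")
    return f"/({nested_sum(names)},const[{len(names)}]())"


expression = mean_expression(["att1", "att2", "att3"])
print(expression)
```

For the three attributes from the example this reproduces the string used in the `value` of the "average" parameter. A deviation expression is left out on purpose, because the thread does not show which functions FeatureGeneration offers for it.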
Thanks Tobias.
All right, I'll wait for the next release.
I already used FeatureGeneration, but to be honest it is a bit of a pain to get the syntax right (especially for the deviation, with ten or more attributes to be condensed).
I have up to several hundred attributes and I need the average and deviation of certain groups of these attributes.
(Not a big deal, I'll precalculate these values in my spreadsheet.)
Keep up the good work!
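As an alternative to a spreadsheet, the precalculation could also be scripted before the data is loaded into RapidMiner. The following Python sketch is only one possible way to do it, under a few assumptions: the column names, the group definition and the sample values are invented, and "deviation" is taken to mean the sample standard deviation (`statistics.stdev`); `statistics.pstdev` would give the population version instead.

```python
from statistics import mean, stdev


def condense(row, groups):
    """Return mean and sample standard deviation for each group of columns."""
    result = {}
    for group_name, columns in groups.items():
        values = [row[column] for column in columns]
        result[f"{group_name}_mean"] = mean(values)
        # stdev needs at least two values per group.
        result[f"{group_name}_stdev"] = stdev(values)
    return result


# Hypothetical layout: X is a process parameter, O1..O4 are repeated measurements.
groups = {"O": ["O1", "O2", "O3", "O4"]}
rows = [
    {"X": 0.5, "O1": 2.0, "O2": 4.0, "O3": 4.0, "O4": 6.0},
    {"X": 0.7, "O1": 1.0, "O2": 1.0, "O3": 3.0, "O4": 3.0},
]

# Keep the original attributes and append the condensed ones.
condensed = [{**row, **condense(row, groups)} for row in rows]
for row in condensed:
    print(row)
```

Each group adds two new columns per instance, so several hundred attributes only require a longer `groups` dictionary rather than longer expressions.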