🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

"Calculating average with missing values"

iesnaolaUser: "iesnaola"
New Altair Community Member
Updated by Jocelyn

Hi everyone,

 

Having a Dataset like the following one:

 

row - Att1 - Att2 - Att3

(1)     5        7         5     

(2)     6        8         9     

(3)               7         6

(4)               7         

 

I want to calculate the average so that result would be the following one:

row - Att1 - Att2 - Att3 - Average

(1)     5        7         5         5.6

(2)     6        8         9         7.6

(3)               7         6         6.5

(4)               7                    7

 

I am using operator Generate Attributes, with the following function:

avg(Att1, Att2, Att3)

However, when any of the rows has a missing value, I simply get a Missing Value as Average.

 

I would like to know how to solve this problem.

 

Thanks in advance

         

Find more posts tagged with

Sort by:
1 - 3 of 31

    Hi,

     

    the generate aggregation operator is able to do it.

     

    Best,

    Martin

    HI @iesnaola,

     

    You have to use the Generate Aggregation operator and check the parameter ignore missing

    Here the process : 

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Average_missing\Average_missing.xlsx"/>
    <parameter key="imported_cell_range" value="A1:C5"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Att1.true.integer.attribute"/>
    <parameter key="1" value="Att2.true.integer.attribute"/>
    <parameter key="2" value="Att3.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="8.0.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="447" y="34">
    <parameter key="attribute_name" value="average"/>
    <parameter key="aggregation_function" value="average"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Regards, 

     

    Lionel

     

    Thanks! It worked :)