Create own table statistic

swas
swas New Altair Community Member
edited November 2024 in Community Q&A

Hello,

I'm just getting started with RapidMiner and I'd like to ask a probably stupid question, but I'd like to ask it so that I get a better understanding.

I want to achieve something really simple:

I have a database with a table that I'm retrieving. Afterwards I select an attribute and want to check whether its values are missing or not. From this I'd like to create a new result with the number of missing values, the number of non-missing values, and the total number.

This should be a rather simple task in RapidMiner, but sadly I don't know how to achieve it. Or is it something I shouldn't do with RapidMiner?

I'd appreciate some thoughts.

Best Answers

  • Telcontar120
    Telcontar120 New Altair Community Member
    Answer ✓

    Yes, this is very easy to do in RapidMiner. First, if you simply want to see this information, you can get it from the "Statistics" view after you have imported your data. That shows summary information for each attribute, including the number of missing values, like so:

    [Screenshot: stats view.PNG]

    But if you want to generate a table with this information, you can do so easily by using "Generate Attributes" to flag the missing values and then "Aggregate" to summarize them for any attribute, like so:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.6.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/data/Titanic"/>
          </operator>
          <!-- Flag each example: Missing_Age is true when Age is missing -->
          <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
            <list key="function_descriptions">
              <parameter key="Missing_Age" value="missing(Age)"/>
            </list>
          </operator>
          <!-- Group by the flag and count the examples in each group, also as a percentage -->
          <operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="85">
            <list key="aggregation_attributes">
              <parameter key="Name" value="count"/>
              <parameter key="Missing_Age" value="count (percentage)"/>
            </list>
            <parameter key="group_by_attributes" value="Missing_Age"/>
          </operator>
          <connect from_op="Retrieve Titanic" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    And you could further use a loop to do this automatically for any number of attributes that you like.
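
For readers who want to see the logic outside of RapidMiner, here is a minimal sketch in plain Python of what such a loop would compute: for each attribute, the number of missing values, the number of non-missing values, and the total. This is not a RapidMiner process. The function name `missing_value_summary` and the sample rows are made up for illustration, and a value is assumed to count as missing when it is `None`.

```python
def missing_value_summary(rows, attributes):
    """Return one summary row per attribute: missing, non-missing and total counts."""
    total = len(rows)
    summary = []
    for attribute in attributes:
        # Assumption: a value is "missing" when it is None or the key is absent.
        missing = sum(1 for row in rows if row.get(attribute) is None)
        summary.append({
            "attribute": attribute,
            "missing": missing,
            "non_missing": total - missing,
            "total": total,
        })
    return summary


# Hypothetical sample rows, loosely modelled on the Titanic sample data.
rows = [
    {"Name": "A", "Age": 29.0, "Cabin": "B5"},
    {"Name": "B", "Age": None, "Cabin": None},
    {"Name": "C", "Age": 2.0, "Cabin": None},
    {"Name": "D", "Age": None, "Cabin": None},
]

for line in missing_value_summary(rows, ["Age", "Cabin"]):
    print(line)
```

The non-missing count is derived as the total minus the missing count, so only one pass over the data is needed per attribute.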

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓

    Hi,

    Another way to do this is to use the "Extract Statistics" operator, which is included in the Operator Toolbox extension.

    Cheers,

    Martin

Answers

  • Telcontar120
    Telcontar120 New Altair Community Member
    That's a great operator, but unfortunately it gives neither the total number of examples nor the number of non-missing values, so it won't deliver exactly what the OP asked for. But that might be a nice enhancement for a future version of the "Extract Statistics" operator :-)