Create own table statistic

User: "swas"
New Altair Community Member
Updated by Jocelyn

Hello,

I'm just getting started with RapidMiner and I'd like to ask a probably stupid question, but I'd like to ask it so that I get a better understanding.

I want to achieve something really simple:

I have a database with a table that I'm retrieving. Afterwards I select an attribute and want to check whether its values are missing or not. From this I'd like to create a new result with the number of missing values, the number of non-missing values and the total number.

This seems like a rather simple task to do in RapidMiner, but sadly I don't know how to achieve it. Or is it something I shouldn't do with RapidMiner?

I'd appreciate some thoughts.

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer

    Yes, this is very easy to do in RapidMiner. First, if you simply want to see this information, you can get it from the "Statistics" view after you have imported your data. It shows summary information for each attribute, including the number of missing values, like so:

    [Screenshot: Statistics view]

    If you want to generate a table with this information instead, you can use "Generate Attributes" to flag the missing values of an attribute and then "Aggregate" to count them, like so:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.6.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/data/Titanic"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
            <list key="function_descriptions">
              <parameter key="Missing_Age" value="missing(Age)"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="85">
            <list key="aggregation_attributes">
              <parameter key="Name" value="count"/>
              <parameter key="Missing_Age" value="count (percentage)"/>
            </list>
            <parameter key="group_by_attributes" value="Missing_Age"/>
          </operator>
          <connect from_op="Retrieve Titanic" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
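To make the logic of the process above easier to follow, here is a rough plain-Python sketch of the same idea. It is not RapidMiner code: it assumes the table has already been loaded as a list of dictionaries with missing values stored as `None`, and the sample rows and the function name `missing_summary` are hypothetical. The per-row flag plays the role of `Missing_Age`, and counting rows per flag value mirrors the Aggregate step, with the total added as the question asks.

```python
from collections import Counter

# Hypothetical toy rows standing in for the retrieved table; None marks a missing value.
rows = [
    {"Name": "A", "Age": 29.0},
    {"Name": "B", "Age": None},
    {"Name": "C", "Age": 2.0},
    {"Name": "D", "Age": None},
    {"Name": "E", "Age": 30.0},
]


def missing_summary(rows, attribute):
    """Flag missing values of one attribute, then count rows per flag value."""
    # Like Generate Attributes with missing(Age): one True/False flag per row.
    flags = [row.get(attribute) is None for row in rows]
    # Like Aggregate grouped by the flag: number of rows in each group.
    counts = Counter(flags)
    total = len(flags)
    missing = counts[True]
    return {
        "missing": missing,
        "non_missing": counts[False],
        "total": total,
        # Share of missing rows in percent; guard against an empty table.
        "missing_pct": 100.0 * missing / total if total else 0.0,
    }


print(missing_summary(rows, "Age"))
```

Keeping the flag as a separate step, rather than counting directly, matches the two-operator structure of the process, which makes it easy to compare the two.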

    You could also put this process inside a loop to repeat it automatically for as many attributes as you like.
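The looped version can be sketched the same way, again in plain Python rather than as a RapidMiner process. The function below builds one summary row per attribute with the missing, non-missing and total counts. Treating both `None` and float NaN as missing is an assumption about how the data is encoded, and `is_missing`, `missing_table` and the sample rows are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption about the data encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_table(rows):
    """Build one summary row per attribute: missing, non-missing and total counts."""
    # Collect every attribute name that appears in any row, keeping first-seen order.
    attributes = []
    for row in rows:
        for name in row:
            if name not in attributes:
                attributes.append(name)

    total = len(rows)
    table = []
    for name in attributes:
        # A row that lacks the key entirely is also counted as missing.
        missing = sum(1 for row in rows if is_missing(row.get(name)))
        table.append({"attribute": name, "missing": missing,
                      "non_missing": total - missing, "total": total})
    return table


# Hypothetical sample rows with a None and a NaN in "Age".
rows = [
    {"Name": "A", "Age": 29.0, "Cabin": None},
    {"Name": "B", "Age": float("nan"), "Cabin": "C85"},
    {"Name": "C", "Age": None, "Cabin": None},
]

for line in missing_table(rows):
    print(line)
```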

    User: "MartinLiebig"
    Altair Employee
    Accepted Answer

    Hi,

    Another way to do this is to use the "Extract Statistics" operator, which is included in the Operator Toolbox extension.

    Cheers,

    Martin