How to compare data before and after missing values handling?

EdisonLee
EdisonLee New Altair Community Member
edited November 2024 in Community Q&A

Dear everyone, 

 

I'm learning RapidMiner using an NBA dataset from data.world. I noticed that there are missing values in the 3P% column. I isolated these 11 rows by clicking missing_attributes in the top right. 

Screenshot 2018-02-24 14.29.50.png

 

So I used Replace Missing Values to set the missing values to 0. The process ran successfully, but what I want to know is: how can I show only these 11 rows after replacing the missing values with 0? After the replacement, I can no longer filter the data by selecting missing_attributes. 

 

Can anyone help me with this? I've been stuck for several days. Do I need to change my process, or are there other solutions? 

 

My process: 

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
<parameter key="repository_entry" value="//PredictNBARookie/Data/nba_logreg"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="replace_missing_values" compatibility="8.0.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="3P%"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="zero"/>
<list key="columns"/>
</operator>
</process>

Thanks in advance!

Best, Lee

Best Answer

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓

    Hi @EdisonLee,

     

    I used the Generate Attributes operator to create a copy of your 3P% attribute, named 3P%_back_up, from the original data (before the missing values were replaced).

    Then I used the Join operator to join this new attribute back onto your dataset.

    Here are the results after filtering:

    NBA_missing_value.png

    You can find the process here : 

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\NBA_missing_values\nba_logreg.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Name.true.polynominal.attribute"/>
    <parameter key="1" value="GP.true.integer.attribute"/>
    <parameter key="2" value="MIN.true.real.attribute"/>
    <parameter key="3" value="PTS.true.real.attribute"/>
    <parameter key="4" value="FGM.true.real.attribute"/>
    <parameter key="5" value="FGA.true.real.attribute"/>
    <parameter key="6" value="FG%.true.real.attribute"/>
    <parameter key="7" value="3P Made.true.real.attribute"/>
    <parameter key="8" value="3PA.true.real.attribute"/>
    <parameter key="9" value="3P%.true.real.attribute"/>
    <parameter key="10" value="FTM.true.real.attribute"/>
    <parameter key="11" value="FTA.true.real.attribute"/>
    <parameter key="12" value="FT%.true.real.attribute"/>
    <parameter key="13" value="OREB.true.real.attribute"/>
    <parameter key="14" value="DREB.true.real.attribute"/>
    <parameter key="15" value="REB.true.real.attribute"/>
    <parameter key="16" value="AST.true.real.attribute"/>
    <parameter key="17" value="STL.true.real.attribute"/>
    <parameter key="18" value="BLK.true.real.attribute"/>
    <parameter key="19" value="TOV.true.real.attribute"/>
    <parameter key="20" value="TARGET_5Yrs.true.real.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="8.1.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="246" y="34">
    <list key="columns">
    <parameter key="3P%" value="zero"/>
    </list>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="187">
    <list key="function_descriptions">
    <parameter key="3P%_back_up" value="[3P%]"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="187">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="3P%_back_up"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="715" y="187"/>
    <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="849" y="34">
    <list key="key_attributes"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
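
    As a possible alternative to the copy-and-join approach above, here is a minimal sketch that flags the rows while 3P% is still missing and filters on that flag afterwards. It is untested and rests on assumptions: the attribute name 3P%_was_missing is made up for illustration, the repository path is the one from the question, and the expression relies on the missing() and if() functions of Generate Attributes.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <!-- Assumption: the data sits at the repository path used in the question -->
    <operator activated="true" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//PredictNBARookie/Data/nba_logreg"/>
    </operator>
    <!-- Flag each row BEFORE replacing: 1 if 3P% is missing, 0 otherwise (hypothetical attribute name) -->
    <operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="3P%_was_missing" value="if(missing([3P%]), 1, 0)"/>
    </list>
    </operator>
    <!-- Same replacement as in the process above: missing 3P% values become 0 -->
    <operator activated="true" class="replace_missing_values" compatibility="8.1.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="34">
    <list key="columns">
    <parameter key="3P%" value="zero"/>
    </list>
    </operator>
    <!-- Keep only the rows that were flagged as missing before the replacement -->
    <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="3P%_was_missing.eq.1"/>
    </list>
    </operator>
    <connect from_op="Retrieve" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    ```

    This variant keeps everything in one branch, so no ID generation or join is needed. The trade-off is that it does not keep the original 3P% values next to the replaced ones, which the 3P%_back_up column does.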

    Does this process meet your needs?

     

    Regards,

     

    Lionel

Answers

  • EdisonLee
    EdisonLee New Altair Community Member

    Hi @lionelderkrikor

     

    Thank you for helping me. This is a very nice way to achieve my goal, and I can easily understand how you did it. But I can't get your process to run on my computer. How should I connect the operators? 

    Screenshot 2018-02-24 19.37.03.png

     

    Thanks, 

    Lee

     

     

  • lionelderkrikor
    lionelderkrikor New Altair Community Member

    Hi @EdisonLee,

     

    It's weird; it seems that the Join operator is considered "deprecated" by RapidMiner.

    Try the following steps:

     - Delete this Join operator.

     - Search for the Join operator in the operator search box.

     - Drag and drop the Join operator into the process window.

     - Manually connect the Join operator to the two Generate ID operators.
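
    One possible explanation, offered as an assumption rather than a confirmed diagnosis: the process I posted was saved with version 8.1.000 and declares Join as class="concurrency:join", while the process in the question was saved with version 8.0.001. If that installation only provides the older Join class, the imported operator would not resolve, and re-adding Join from the search box replaces it with the local one. A rough sketch of how the re-added operator might then look in the XML, assuming the older class name join and its join_type and use_id_attribute_as_key parameters:

    ```xml
    <!-- Hypothetical: the Join operator as an installation older than 8.1 might save it -->
    <operator activated="true" class="join" compatibility="8.0.001" expanded="true" height="82" name="Join" width="90" x="849" y="34">
    <!-- Join on the id attribute created by the two Generate ID operators -->
    <parameter key="join_type" value="inner"/>
    <parameter key="use_id_attribute_as_key" value="true"/>
    <list key="key_attributes"/>
    </operator>
    ```

    The connections to Generate ID (left) and Generate ID (2) (right) stay the same as in the original process.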

     

    I hope it helps,

     

    Best regards,

     

    Lionel

     

     

  • EdisonLee
    EdisonLee New Altair Community Member

    Dear @lionelderkrikor

     

    The process worked after I followed your instructions. Your solution really answers my question. Thanks again for showing me a different way to approach data processing in RapidMiner.

     

    Best Regards, 

    Lee
