"Strange behaviour of impute missing values component [Solved]"

User: "ammargh"
New Altair Community Member
Updated by Jocelyn
Using RapidMiner 5.3.015

I am trying to process missing values.
After retrieving the data I used a Multiply component. One of the Multiply component's outputs is used as the input to the Impute Missing Values component, and a second output is connected directly to the process's result port.

After running the process, the missing values were replaced in both results: the one that passed through Impute Missing Values and the one that came straight from Multiply!

This is strange, because the original data should not be changed!

(Edited: same results with RM Studio 6.0.3)

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
       <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
     </operator>
     <operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="94" name="Multiply" width="90" x="179" y="165"/>
     <operator activated="true" class="impute_missing_values" compatibility="5.3.015" expanded="true" height="60" name="Impute Missing Values" width="90" x="447" y="255">
       <parameter key="attribute" value="class"/>
       <process expanded="true">
         <operator activated="true" class="k_nn" compatibility="5.3.015" expanded="true" height="76" name="k-NN" width="90" x="601" y="30">
           <parameter key="k" value="5"/>
         </operator>
         <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
         <connect from_op="k-NN" from_port="model" to_port="model sink"/>
         <portSpacing port="source_example set source" spacing="0"/>
         <portSpacing port="sink_model sink" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
     <connect from_op="Multiply" from_port="output 1" to_op="Impute Missing Values" to_port="example set in"/>
     <connect from_op="Multiply" from_port="output 2" to_port="result 2"/>
     <connect from_op="Impute Missing Values" from_port="example set out" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

    User: "Marco_Boeck"
    New Altair Community Member
    Hi,

    While most operators work on a view of the data, i.e. they do not modify the underlying data, some do modify it. This is a mixture of internal restrictions and a bug. You can work around it by adding a "Materialize Data" operator after "Multiply" on the connection that should return the original example set. See the following example process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.0.006" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="6.0.006" expanded="true" height="60" name="Retrieve Labor-Negotiations" width="90" x="45" y="75">
           <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
         </operator>
         <operator activated="true" class="multiply" compatibility="6.0.006" expanded="true" height="94" name="Multiply" width="90" x="246" y="75"/>
         <operator activated="true" class="materialize_data" compatibility="6.0.006" expanded="true" height="76" name="Materialize Data" width="90" x="380" y="30"/>
         <operator activated="true" class="impute_missing_values" compatibility="6.0.006" expanded="true" height="60" name="Impute Missing Values" width="90" x="379" y="120">
           <process expanded="true">
             <operator activated="true" class="k_nn" compatibility="6.0.006" expanded="true" height="76" name="k-NN" width="90" x="112" y="30"/>
             <connect from_port="example set source" to_op="k-NN" to_port="training set"/>
             <connect from_op="k-NN" from_port="model" to_port="model sink"/>
             <portSpacing port="source_example set source" spacing="0"/>
             <portSpacing port="sink_model sink" spacing="0"/>
           </process>
         </operator>
         <connect from_op="Retrieve Labor-Negotiations" from_port="output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Materialize Data" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Impute Missing Values" to_port="example set in"/>
         <connect from_op="Materialize Data" from_port="example set output" to_port="result 1"/>
         <connect from_op="Impute Missing Values" from_port="example set out" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
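
To illustrate why the copy helps, here is a minimal Python sketch of the same idea. It is only an analogy and does not reflect RapidMiner's actual internals: the functions `multiply`, `impute_in_place` and `materialize` and the `wage` column are hypothetical stand-ins. It assumes that both branches share one underlying table unless an explicit copy is taken before the in-place change happens.

```python
import copy

def multiply(table):
    """Return two handles that share one underlying table (no copy is made)."""
    return table, table

def impute_in_place(table, column, fill):
    """Overwrite missing values (None) directly in the given table."""
    values = table[column]
    for i, value in enumerate(values):
        if value is None:
            values[i] = fill

def materialize(table):
    """Return an independent deep copy of the table."""
    return copy.deepcopy(table)

# Case 1: both branches share the data, so imputing one changes the other.
shared_a, shared_b = multiply({"wage": [4.5, None, 3.0]})
impute_in_place(shared_a, "wage", 0.0)
untouched_without_copy = shared_b["wage"][1] is None

# Case 2: copy the "original" branch first, then impute the other branch.
branch_a, branch_b = multiply({"wage": [4.5, None, 3.0]})
original = materialize(branch_a)
impute_in_place(branch_b, "wage", 0.0)
untouched_with_copy = original["wage"][1] is None

print(untouched_without_copy, untouched_with_copy)
```

In this sketch the copy has to be taken before the in-place imputation runs; a copy made afterwards would simply duplicate the already modified values.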
    Regards,
    Marco
    User: "ammargh"
    New Altair Community Member
    OP
    Thank you very much.