Join example sets with a loop

User: "felix_w"
New Altair Community Member
Updated by Jocelyn

Dear RapidMiner Community,

 

I have already looked through several forum posts about loops and joins, but I haven't found what I am looking for.

 

What I am trying to do: 

I have 7 example sets that can be joined with the "Join" operator, using an outer join with "date" as the join attribute. Instead of building a process in which I join all the example sets manually, I would like to do this with a loop. The loop should simply go through the example sets and join them into one big file. Is this possible? I have gone through the available loop operators but haven't found a solution: if I put the "Join" operator inside a loop, it still needs two inputs to join.

How do I handle this? Is there an operator which can do this? 

 

Best regards 

Felix

    User: "BalazsBaranyRM"
    New Altair Community Member
    Accepted Answer

    Hi,

     

    Please try this example process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="34">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="remember" compatibility="8.1.000" expanded="true" height="68" name="Remember" width="90" x="246" y="34">
    <parameter key="name" value="Current set"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Retrieve Iris (2)" width="90" x="112" y="136">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Retrieve Iris (3)" width="90" x="112" y="238">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Retrieve Iris (4)" width="90" x="112" y="340">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Retrieve Iris (5)" width="90" x="112" y="442">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="collect" compatibility="8.1.000" expanded="true" height="145" name="Collect" width="90" x="380" y="136"/>
    <operator activated="true" class="loop_collection" compatibility="8.1.000" expanded="true" height="68" name="Loop Collection" width="90" x="514" y="136">
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="8.1.000" expanded="true" height="68" name="Recall" width="90" x="112" y="34">
    <parameter key="name" value="Current set"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="313" y="85">
    <parameter key="remove_double_attributes" value="false"/>
    <list key="key_attributes">
    <parameter key="id" value="id"/>
    </list>
    </operator>
    <operator activated="true" class="remember" compatibility="8.1.000" expanded="true" height="68" name="Remember (2)" width="90" x="514" y="85">
    <parameter key="name" value="Current set"/>
    </operator>
    <connect from_port="single" to_op="Join" to_port="right"/>
    <connect from_op="Recall" from_port="result" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_op="Remember (2)" to_port="store"/>
    <portSpacing port="source_single" spacing="84"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="recall" compatibility="8.1.000" expanded="true" height="68" name="Final set" width="90" x="648" y="136">
    <parameter key="name" value="Current set"/>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Remember" to_port="store"/>
    <connect from_op="Retrieve Iris (2)" from_port="output" to_op="Collect" to_port="input 1"/>
    <connect from_op="Retrieve Iris (3)" from_port="output" to_op="Collect" to_port="input 2"/>
    <connect from_op="Retrieve Iris (4)" from_port="output" to_op="Collect" to_port="input 3"/>
    <connect from_op="Retrieve Iris (5)" from_port="output" to_op="Collect" to_port="input 4"/>
    <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
    <connect from_op="Final set" from_port="result" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

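    Like the process in the other thread, this one uses Remember and Recall. You "Remember" the first example set and group the others into a collection. Then, inside Loop Collection, you recall the current state of the joined example set, join it with the next example set from the collection, and remember the new state. At the end, you recall the final set. The example uses five copies of the Iris sample data with "id" as the key attribute, so you will need to adapt the Retrieve and Join operators to your own data.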
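
    For the case in the question (an outer join on "date"), the Join operator inside Loop Collection could be adapted roughly as in the sketch below. This is an assumption-laden fragment, not a tested process: it assumes every example set has a regular attribute called "date", and the parameter keys "join_type" and "use_id_attribute_as_key" are written as I recall them from the Join operator, so please check them against the Parameters panel in your RapidMiner version.

    ```xml
    <!-- Hypothetical replacement for the Join operator inside Loop Collection.
         Assumes each example set has a regular attribute named "date". -->
    <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="313" y="85">
    <parameter key="remove_double_attributes" value="false"/>
    <!-- outer join: keep dates that occur in only one of the two inputs -->
    <parameter key="join_type" value="outer"/>
    <!-- join on the listed key attributes instead of the attribute with the ID role -->
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <!-- left attribute name = right attribute name -->
    <parameter key="date" value="date"/>
    </list>
    </operator>
    ```

    With seven example sets, you would "Remember" the first one and connect the remaining six to Collect; the loop then performs six joins, one per element of the collection.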

     

    Regards,

    Balázs