How can I after I've split my dataset join them together again in the same order as befo

Prentice · April 2019

Hi people,

How can I after I've split my dataset join them together again in the same order as before?
Below is an example, I just want the ID to go from 1 to 150 again in a chronological order. But just sorting the ID doesn't work.
Also, how can I change the order of attributes in the results?

<pre class="CodeBlock"><code><?xml version="1.0" encoding="UTF-8"?><process version="9.2.001"><br>&nbsp; <context><br>&nbsp;&nbsp;&nbsp; <input/><br>&nbsp;&nbsp;&nbsp; <output/><br>&nbsp;&nbsp;&nbsp; <macros/><br>&nbsp; </context><br>&nbsp; <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process"><br>&nbsp;&nbsp;&nbsp; <parameter key="logverbosity" value="init"/><br>&nbsp;&nbsp;&nbsp; <parameter key="random_seed" value="2001"/><br>&nbsp;&nbsp;&nbsp; <parameter key="send_mail" value="never"/><br>&nbsp;&nbsp;&nbsp; <parameter key="notification_email" value=""/><br>&nbsp;&nbsp;&nbsp; <parameter key="process_duration_for_mail" value="30"/><br>&nbsp;&nbsp;&nbsp; <parameter key="encoding" value="SYSTEM"/><br>&nbsp;&nbsp;&nbsp; <process expanded="true"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="repository_entry" value="//Samples/data/Iris"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </operator><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <operator activated="true" class="split_data" compatibility="9.2.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="34"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <enumeration key="partitions"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="ratio" value="0.66"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="ratio" value="0.34"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </enumeration><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="sampling_type" value="automatic"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="use_local_random_seed" value="false"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <parameter key="local_random_seed" value="1992"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </operator><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <operator activated="true" class="union" compatibility="9.2.001" expanded="true" height="82" name="Union" width="90" x="313" y="34"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Split Data" from_port="partition 1" to_op="Union" to_port="example set 1"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Split Data" from_port="partition 2" to_op="Union" to_port="example set 2"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <connect from_op="Union" from_port="union" to_port="result 1"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <portSpacing port="source_input 1" spacing="0"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <portSpacing port="sink_result 1" spacing="0"/><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <portSpacing port="sink_result 2" spacing="0"/><br>&nbsp;&nbsp;&nbsp; </process><br>&nbsp; </operator><br></process>

Thanks
-Prentice

Jeff_Mergler · April 2019

Hi Prentice,

Many ways you could do this. Depending on the situation you might use linear sampling to keep them in order in the first place, or use multiply to keep a copy of the original. Another approach could be to add something to sort by before or after the union. Sorting by ID doesn't work because it's not numeric and doesn't have leading zeros. So you could do a text transformation either to add in leading zeros or remove the prefix and change it to a numeric. Here's one example

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.2.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="34">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.66"/>
          <parameter key="ratio" value="0.34"/>
        </enumeration>
        <parameter key="sampling_type" value="automatic"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="union" compatibility="9.2.001" expanded="true" height="82" name="Union" width="90" x="313" y="34"/>
      <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
        <list key="function_descriptions">
          <parameter key="ordering" value="parse(replaceAll(id, &quot;id_&quot;, &quot;&quot;))"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <operator activated="true" class="sort" compatibility="9.2.001" expanded="true" height="82" name="Sort" width="90" x="581" y="34">
        <parameter key="attribute_name" value="ordering"/>
        <parameter key="sorting_direction" value="increasing"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Union" to_port="example set 1"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Union" to_port="example set 2"/>
      <connect from_op="Union" from_port="union" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Jeff_Mergler · April 2019

Hi Prentice,

Many ways you could do this. Depending on the situation you might use linear sampling to keep them in order in the first place, or use multiply to keep a copy of the original. Another approach could be to add something to sort by before or after the union. Sorting by ID doesn't work because it's not numeric and doesn't have leading zeros. So you could do a text transformation either to add in leading zeros or remove the prefix and change it to a numeric. Here's one example

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.2.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="34">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.66"/>
          <parameter key="ratio" value="0.34"/>
        </enumeration>
        <parameter key="sampling_type" value="automatic"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="union" compatibility="9.2.001" expanded="true" height="82" name="Union" width="90" x="313" y="34"/>
      <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
        <list key="function_descriptions">
          <parameter key="ordering" value="parse(replaceAll(id, &quot;id_&quot;, &quot;&quot;))"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <operator activated="true" class="sort" compatibility="9.2.001" expanded="true" height="82" name="Sort" width="90" x="581" y="34">
        <parameter key="attribute_name" value="ordering"/>
        <parameter key="sorting_direction" value="increasing"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Union" to_port="example set 1"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Union" to_port="example set 2"/>
      <connect from_op="Union" from_port="union" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Prentice · April 2019

@jmergler,

Yes this works, thanks!

Do you perhaps also know how to change the order of attributes. It looks like the special attributes are always first, but is it also possible to put a regular attribute in front?

Jeff_Mergler · April 2019

Rather than re-order across special and regular attributes, I would tend to think that you first need to question which attributes should special and which should be regular. Remember that with the Set Role operator you can set special attributes to regular or make up your own special roles if needed.

Telcontar120 · April 2019

You cannot put regular attributes in front of special attributes, but you can make a version of your dataset in which there are no special roles assigned, in which case you can put them in whatever order you want using Reorder Attributes.

How can I after I've split my dataset join them together again in the same order as befo

Best Answer

Answers

Categories