How to invert the example order?
Newbie
New Altair Community Member
Hey everyone!
I'm new to RapidMiner and currently trying to use it on my first data set. I am desperately looking for a way to invert the order of my examples, i.e. put the first row last, the second row second-to-last, and so on. The Sort operator refuses to work on the row number (which kind of makes sense, since this isn't a real attribute). It's quite a big data set, and my current workarounds take way too much time. Any ideas?
For context: I actually want to do this to remove certain duplicates. The Remove Duplicates operator seems to keep the first example and delete every duplicate after it. I would like to keep the last example and remove all duplicates before it (I'm filtering on a subset of attributes for the Remove Duplicates operator). So my idea was to invert the order of the examples to achieve this.
Thank you for your help!
Best Answer
-
Hello @Newbie
You can use the Generate ID operator, which generates an ID for every example in your data set. Then sort on the ID column in decreasing order, which inverts the examples. Sample XML code is below. To run it, open a blank process and go to View --> Show Panel --> XML. Paste the code into the XML window and click the green tick mark to load the process into the process window. Run it to see how the sample is inverted.
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="179" y="34">
        <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID" width="90" x="380" y="34">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <operator activated="true" class="sort" compatibility="9.2.000" expanded="true" height="82" name="Sort" width="90" x="581" y="34">
        <parameter key="attribute_name" value="id"/>
        <parameter key="sorting_direction" value="decreasing"/>
      </operator>
      <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
There might be other solutions as well. Hope this helps.
PS: Once the examples are inverted, you can use the Select Attributes operator to remove the ID column.
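For the "keep the last duplicate" goal from the question, the same idea can probably be extended as in the sketch below: invert the order, remove duplicates, sort back on the ID, then drop the ID column. This is only a sketch under a few assumptions. It relies on Remove Duplicates keeping the first occurrence (as observed in the question), the attribute subset "Passenger Class|Sex" is just a placeholder for your own columns, and the parameter keys shown for Remove Duplicates and Select Attributes are the ones I would expect in 9.2 and may differ in other versions.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
  <!-- Sketch only: invert, deduplicate, restore order, drop the ID. -->
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <!-- Invert the example order so the last occurrence comes first. -->
      <operator activated="true" class="sort" compatibility="9.2.000" expanded="true" height="82" name="Sort" width="90" x="313" y="34">
        <parameter key="attribute_name" value="id"/>
        <parameter key="sorting_direction" value="decreasing"/>
      </operator>
      <!-- Assumption: the first example of each duplicate group is kept.
           The attribute subset is a placeholder; replace it with your own. -->
      <operator activated="true" class="remove_duplicates" compatibility="9.2.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Passenger Class|Sex"/>
      </operator>
      <!-- Optional: sort back into the original order. -->
      <operator activated="true" class="sort" compatibility="9.2.000" expanded="true" height="82" name="Sort (2)" width="90" x="581" y="34">
        <parameter key="attribute_name" value="id"/>
        <parameter key="sorting_direction" value="increasing"/>
      </operator>
      <!-- Drop the helper ID. It has a special role, so special attributes
           have to be included for the filter to see it. -->
      <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="id"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_op="Remove Duplicates" to_port="example set input"/>
      <connect from_op="Remove Duplicates" from_port="example set output" to_op="Sort (2)" to_port="example set input"/>
      <connect from_op="Sort (2)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
    </process>
  </operator>
</process>
```

If the original order does not matter to you, the second Sort can be left out. It is worth checking the result on a small sample first to confirm which duplicate actually survives in your version.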
Answers
-
Thank you very much for the suggestion, it worked perfectly!