"[SOLVED] How to swap attributes/column order for Excel output?"

scepxko
scepxko New Altair Community Member
edited November 5 in Community Q&A
hello,

after a few intensive days deep into Rapidminer i managed to solve almost any complicated problem i had (xPath, crawl, filter, date format, regex, etc. thanks to this forum) but this simple one!

I am reading Excel files where the order of columns is : teaser / URL / title / time
I simply want to change this order into: time / title / teaser / URL

In the "Read Excel" operator, i tried to modify the "column index" values for the different attributes (ie. 0 for time, 3 for URL). No effect

I tried to change "attribute" to "label", "id", "weight"...No effect too.

I surely missed something somewhere...
Any help is appreciated!
Alex
Tagged:

Best Answer

  • scepxko
    scepxko New Altair Community Member
    Answer ✓
    Hello Dietrich,

    Thank you for your help. It worked perfectly.
    Below is the process that works for me.
    Thank you again!
    Alex
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.001" expanded="true" name="Process">
        <process expanded="true" height="556" width="815">
          <operator activated="true" class="read_excel" compatibility="5.2.001" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="165">
            <parameter key="excel_file" value="E:\Rapidminer\links_summary.xls"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="work_on_subset" compatibility="5.2.001" expanded="true" height="76" name="Work on Subset (3)" width="90" x="179" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="teaser"/>
            <process expanded="true">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="work_on_subset" compatibility="5.2.001" expanded="true" height="76" name="Work on Subset" width="90" x="313" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="title"/>
            <process expanded="true" height="738" width="914">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="work_on_subset" compatibility="5.2.001" expanded="true" height="76" name="Work on Subset (2)" width="90" x="447" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="time"/>
            <process expanded="true" height="738" width="914">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel (2)" from_port="output" to_op="Work on Subset (3)" to_port="example set"/>
          <connect from_op="Work on Subset (3)" from_port="example set" to_op="Work on Subset" to_port="example set"/>
          <connect from_op="Work on Subset" from_port="example set" to_op="Work on Subset (2)" to_port="example set"/>
          <connect from_op="Work on Subset (2)" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

Answers

  • niederlintner
    niederlintner New Altair Community Member
    Hello Alex,

    unfortunately at this time there is no special operator to put the columns/attributes in a specific order. There already is a feature request for this (#985 in Bugzilla). Let's see how that goes eventually.

    As a workaround you could you use two "Work on Subset" operators to get this done. You can use it to pick a single attribute which is then made the leftmost column. In your case you would first pick "title" and then "time".

    1. Drop the operator "Work on Subset" on your process
    2. Double-click to open.
    3. Connect the two example ports and move up to exit the operator
    4. Set parameters: attribute filter type = single, attribute = title
    5. Drop another "Work on Subset" behind the first one, attribute = time

    Let me know if that gets it done.

    Dietrich
  • scepxko
    scepxko New Altair Community Member
    Answer ✓
    Hello Dietrich,

    Thank you for your help. It worked perfectly.
    Below is the process that works for me.
    Thank you again!
    Alex
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.001" expanded="true" name="Process">
        <process expanded="true" height="556" width="815">
          <operator activated="true" class="read_excel" compatibility="5.2.001" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="165">
            <parameter key="excel_file" value="E:\Rapidminer\links_summary.xls"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="work_on_subset" compatibility="5.2.001" expanded="true" height="76" name="Work on Subset (3)" width="90" x="179" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="teaser"/>
            <process expanded="true">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="work_on_subset" compatibility="5.2.001" expanded="true" height="76" name="Work on Subset" width="90" x="313" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="title"/>
            <process expanded="true" height="738" width="914">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="work_on_subset" compatibility="5.2.001" expanded="true" height="76" name="Work on Subset (2)" width="90" x="447" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="time"/>
            <process expanded="true" height="738" width="914">
              <connect from_port="exampleSet" to_port="example set"/>
              <portSpacing port="source_exampleSet" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel (2)" from_port="output" to_op="Work on Subset (3)" to_port="example set"/>
          <connect from_op="Work on Subset (3)" from_port="example set" to_op="Work on Subset" to_port="example set"/>
          <connect from_op="Work on Subset" from_port="example set" to_op="Work on Subset (2)" to_port="example set"/>
          <connect from_op="Work on Subset (2)" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>