"normalizing error (works backwards in workflow??)"

michaelhecht New Altair Community Member
edited November 5 in Community Q&A
Hello,

I have a workflow which starts with a

1. Excel file reader
2. then selects attributes
3. then sends the original data to a CSV writer
4. and the selected attributes to a normalizer for further processing.

If I now run the workflow and look at the written CSV file, all "real" columns are normalized, even though the normalizer is applied after the data is sent to the CSV writer. This is really strange.

So - what can I do to store the original data?


Answers

  • Marco_Boeck New Altair Community Member
    Hi,

    the normalization "branch" of your process is executed before your Write CSV operator starts working. I suggest the following quick fix:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.014">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
        <process expanded="true" height="235" width="681">
          <operator activated="true" class="read_excel" compatibility="5.1.014" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="30">
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="179" y="30">
            <parameter key="csv_file" value="C:\Users\boeck\Desktop\Test.csv"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.014" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Test"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.1.014" expanded="true" height="94" name="Normalize" width="90" x="447" y="30"/>
          <connect from_op="Read Excel (2)" from_port="output" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    That way, your CSV gets written before anything else, and only then is your data modified.

    Regards,
    Marco
  • michaelhecht New Altair Community Member
    Thank you, this might work, but it isn't a solution to my original problem.
    Nevertheless, I'm glad (not really) that it is a bug and not my own incompetence ;)

    I've got data with several id columns that I want to pass through the workflow.
    If I don't intervene, RapidMiner selects one column to be the only id, and the selected
    column unfortunately isn't unique. Therefore I removed all unnecessary (non-unique) id columns
    before the actual workflow, but want to add them again at the end, before writing everything
    to the CSV file. To be able to understand the result of the workflow I also wanted to write
    the non-normalized columns, which didn't work. That's why I need to write the CSV at the end
    of the workflow.

    Meanwhile I found that I can join all data with the "ori" (original) output port of the
    normalizer. This seems to be a better workaround.
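
    Outside RapidMiner, the same join-back idea can be sketched in Python with pandas (the column names here are invented for illustration): normalize only the selected numeric columns, then join the untouched original columns back by row index, analogous to joining with the normalizer's "ori" output.

    ```python
    import pandas as pd

    # Hypothetical example set with a non-unique id and one numeric column.
    df = pd.DataFrame({
        "id_a": [1, 1, 2],           # non-unique id, kept aside untouched
        "value": [10.0, 20.0, 30.0]  # column to be normalized
    })

    # Z-score normalize only the selected numeric column.
    normalized = df[["value"]].copy()
    normalized["value"] = (
        normalized["value"] - normalized["value"].mean()
    ) / normalized["value"].std()

    # Join the untouched original columns back by row index.
    result = df[["id_a"]].join(normalized)
    print(result)
    ```

    Because pandas builds a new frame for the normalized columns, the original `df` is left intact, which is exactly the property the CSV writer needs.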

    By the way: I wonder why there is no de-normalizer operator to improve the readability of
    the output.
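
    Conceptually, such a de-normalizer is just the inverse of the z-transformation: if the mean and standard deviation used by the normalizer are kept, the original values can be recovered via x = z * stdev + mean. A minimal sketch in plain Python (not a RapidMiner operator):

    ```python
    import statistics

    def z_normalize(values):
        """Z-score normalize; return transformed values plus the parameters."""
        mean = statistics.mean(values)
        stdev = statistics.stdev(values)
        return [(v - mean) / stdev for v in values], mean, stdev

    def de_normalize(z_values, mean, stdev):
        """Invert the z-transformation: x = z * stdev + mean."""
        return [z * stdev + mean for z in z_values]

    data = [10.0, 20.0, 30.0]
    z, mean, stdev = z_normalize(data)
    restored = de_normalize(z, mean, stdev)
    print(restored)  # the original values, up to floating-point rounding
    ```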
  • Marco_Boeck New Altair Community Member
    Hi,

    Just a hint: you can assign the ID role to the real ID column by using the Exchange Roles operator. That way, you don't need to remove any columns in your process.

    Regards,
    Marco