"[Solved] How to work on Correlation Matrix results"

TomTom New Altair Community Member
edited November 5 in Community Q&A
Hello,

I have built a correlation matrix within my process and would like to use the matrix results later in the same process. Unfortunately, the only operators I have found that accept a "mat" object as input are the ones in the Reporting add-on, so I can write an Excel file, for example. I couldn't find a way to use standard operators like "Filter Examples" or "Select" with the matrix as input. Is there a way to filter the matrix for certain values without writing it to a file and reading that file back in within the process? (That wouldn't be fast enough, because I have very large data sets.)

I have also tried another way: I used "Remove Correlated Attributes" instead of "Correlation Matrix" and set the filter relation to "less" to get the attribute pairs that are correlated with each other, but the results confuse me:

Sometimes the result of "Remove Correlated Attributes" is a result set with just one column. Say I have a data set with attributes A and B plus some other attributes, and column A is highly correlated with another column. Why does "Remove Correlated Attributes" return only one of the two columns? I would expect it to return both, because correlation is a symmetric relationship.

It would be great if anyone could help with this issue.
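
For illustration, the kind of in-memory filtering asked about here could look like the sketch below. It is plain Python, not a RapidMiner process, and it does not use any RapidMiner API; the function names `pearson` and `correlated_pairs`, the toy data, and the threshold of 0.9 are all assumptions made for the example. It computes the pairwise correlations, flattens them into a list of (attribute, attribute, correlation) rows, and keeps only the rows above the threshold, without writing anything to disk.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `columns` maps a column name to its list of values (hypothetical layout).
    """
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:  # NaN compares False, so undefined pairs are skipped
            pairs.append((a, b, r))
    return pairs


# Hypothetical toy data: B is a linear function of A, C is unrelated.
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "C": [3.0, 1.0, 4.0, 1.0, 5.0],
}
result = correlated_pairs(data, threshold=0.9)
for name_a, name_b, r in result:
    print(name_a, name_b, round(r, 3))
```

Each pair appears only once in the output because the correlation of (A, B) equals that of (B, A), so the flat list is half the size of the full matrix.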

Answers

  • TomTom
    TomTom New Altair Community Member
    As I haven't found a clean solution, I have chosen to write the correlation matrix to a file and reload it. It's very dirty, but it works. Please let me know if there's a cleaner solution.

    Here is an example process of what I have done:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="656" width="681">
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="20"/>
          </operator>
          <operator activated="true" class="correlation_matrix" compatibility="5.3.000" expanded="true" height="94" name="Correlation Matrix" width="90" x="179" y="30"/>
          <operator activated="true" class="write_as_text" compatibility="5.3.000" expanded="true" height="76" name="Write as Text" width="90" x="313" y="30">
            <parameter key="result_file" value="C:\TEMP\test.csv"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.3.000" expanded="true" height="60" name="Read CSV" width="90" x="179" y="165">
            <parameter key="csv_file" value="C:\TEMP\test.csv"/>
            <parameter key="column_separators" value="\s"/>
            <parameter key="comment_characters" value=""/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.polynominal.attribute"/>
              <parameter key="1" value="att2.true.polynominal.attribute"/>
              <parameter key="2" value="att3.true.polynominal.attribute"/>
              <parameter key="3" value="att4.true.polynominal.attribute"/>
              <parameter key="4" value="att5.true.polynominal.attribute"/>
              <parameter key="5" value="att6.true.polynominal.attribute"/>
              <parameter key="6" value="att7.true.polynominal.attribute"/>
              <parameter key="7" value="att8.true.polynominal.attribute"/>
              <parameter key="8" value="att9.true.polynominal.attribute"/>
              <parameter key="9" value="att10.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.3.000" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="165">
            <parameter key="first_example" value="3"/>
            <parameter key="last_example" value="12"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
          <connect from_op="Correlation Matrix" from_port="matrix" to_op="Write as Text" to_port="input 1"/>
          <connect from_op="Write as Text" from_port="input 1" to_port="result 1"/>
          <connect from_op="Read CSV" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi,

    There is currently no good way to automatically process the output of the Correlation Matrix operator. We already have an internal feature request to be able to convert the matrix object to an example set.

    Concerning Remove Correlated Attributes: if you have a set of correlated attributes, this operator *should* remove all but one of them (not all of them, because then the information they carry would be lost entirely). Does that explain your observations, or did I misunderstand something in your description?

    Best regards,
    Marius
  • TomTom
    TomTom New Altair Community Member
    Hi Marius,

    Thanks for your response. Yes, that explains my observations. As I need some other columns from the matrix, I have chosen to write the matrix results to the hard drive and reload them as CSV. That works quite well for me now.

    Best regards to Dortmund
  • qwertz
    qwertz New Altair Community Member

    Please be aware that the "Write as Text" operator writes only the first 20 attributes! This is odd, as I could not find any hint about it in the documentation. However, you can use a similar workaround with the Report operator, which is also explained in the forum (see http://rapid-i.com/rapidforum/index.php?topic=2081.0).

    The following process shows that not all attributes are written:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="6.0.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="30"/>
          </operator>
          <operator activated="true" class="correlation_matrix" compatibility="6.0.003" expanded="true" height="94" name="Correlation Matrix" width="90" x="179" y="30"/>
          <operator activated="true" class="write_as_text" compatibility="6.0.003" expanded="true" height="76" name="Write as Text" width="90" x="313" y="30">
            <parameter key="result_file" value="C:\test.txt"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
          <connect from_op="Correlation Matrix" from_port="matrix" to_op="Write as Text" to_port="input 1"/>
          <connect from_op="Write as Text" from_port="input 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Cheers
    Sachs
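
The behaviour Marius describes for Remove Correlated Attributes (drop all but one attribute from each correlated group) can be illustrated with a small greedy sketch. This is plain Python written for this thread, not the operator's actual implementation; the function names, the toy data, and the 0.9 threshold are assumptions. An attribute is kept only if it is not strongly correlated with any attribute that has already been kept, which is why only one column of a correlated pair survives.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def remove_correlated(columns, threshold):
    """Greedily keep attributes that are not correlated with any kept one.

    `columns` maps a column name to its list of values (hypothetical layout).
    Attributes are visited in insertion order, so the first member of a
    correlated group is the one that survives.
    """
    kept = []
    for name in columns:
        too_similar = any(
            abs(pearson(columns[name], columns[other])) >= threshold
            for other in kept
        )
        if not too_similar:
            kept.append(name)
    return kept


# Hypothetical toy data: B is a linear function of A, C is unrelated.
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "C": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = remove_correlated(data, threshold=0.9)
print(kept)
```

In this sketch the surviving column depends on the order in which attributes are visited: with B listed before A, B would be kept and A dropped.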