Using the Correlation Matrix Operator Matrix Output

BrianT
New Altair Community Member
edited November 5 in Community Q&A
In my project, I'm trying to reduce the amount of correlation in my dataset. Our standard approach is to look at all pair-wise correlations between attributes, isolate the pairs whose absolute correlation is above 0.95, and, from each such pair, remove the attribute that has the lower correlation with the target (dependent) variable.
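The procedure described above can be sketched in plain Python to make the selection rule concrete. This is a minimal sketch, not RapidMiner code: it assumes the data is a dict mapping attribute names to equal-length numeric lists, uses Pearson correlation, and the helper names `pearson` and `drop_correlated` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_correlated(attributes, target, threshold=0.95):
    """Return the names of attributes to remove.

    For each pair with |r| > threshold, drop the attribute whose |r| with the
    target is lower. Pairs involving an already-dropped attribute are skipped.
    """
    target_corr = {name: abs(pearson(vals, target)) for name, vals in attributes.items()}
    to_drop = set()
    for a, b in combinations(sorted(attributes), 2):
        if a in to_drop or b in to_drop:
            continue  # this pair was already resolved by an earlier removal
        if abs(pearson(attributes[a], attributes[b])) > threshold:
            # On a tie, `a` (alphabetically first) is kept and `b` is dropped.
            to_drop.add(a if target_corr[a] < target_corr[b] else b)
    return to_drop


# Toy data, for illustration only.
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [1.1, 2.0, 3.1, 4.0, 5.1],  # nearly a copy of x1
    "x3": [5, 3, 4, 1, 2],
}
label = [1, 2, 3, 4, 5]
print(drop_correlated(data, label))  # prints {'x2'}
```

Skipping pairs that involve an already-dropped attribute is a design choice: without it, a chain such as x1~x2 and x2~x3 could also remove x3 even though x3 is not strongly correlated with the x1 that was kept.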

The Correlation Matrix operator provides exactly this pair-wise table. However, I haven't been able to figure out how to connect its green output port to another operator that can work with it. I'd appreciate any help with accessing that data within a process, not just in the results view.

Best Answer

  • BalazsBarany
    New Altair Community Member
    Answer ✓
    Hi,

    If you just want to remove correlated attributes, the Remove Correlated Attributes operator does exactly that.

    If you want to do it manually, check out the "Matrix to ExampleSet" operator in the Converters extension. It converts your matrix to a table according to the options you set. You could then, for example, filter the list and use Data to Weights and Select by Weights to remove the unwanted attributes.

    Regards,
    Balázs
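The manual route described in this answer (matrix to table, filter, then weight-based selection) can be mirrored in Python to show the data flow. This is a hedged sketch: the correlation matrix is assumed to be a nested dict, the numbers are toy values, and `matrix_to_pairs` and `select_by_weights` are hypothetical helpers that only imitate the idea behind Matrix to ExampleSet, Data to Weights and Select by Weights, not their actual behavior or parameters.

```python
def matrix_to_pairs(matrix):
    """Flatten a symmetric correlation matrix into (first, second, r) rows,
    keeping each unordered pair once and skipping the diagonal."""
    names = list(matrix)
    rows = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rows.append((a, b, matrix[a][b]))
    return rows


def select_by_weights(names, weights):
    """Keep attributes whose weight is non-zero (missing weight counts as 1)."""
    return [n for n in names if weights.get(n, 1.0) != 0.0]


# Toy 3x3 correlation matrix and |r| with the label, for illustration only.
matrix = {
    "a": {"a": 1.0, "b": 0.97, "c": 0.10},
    "b": {"a": 0.97, "b": 1.0, "c": -0.20},
    "c": {"a": 0.10, "b": -0.20, "c": 1.0},
}
target_corr = {"a": 0.40, "b": 0.65, "c": 0.30}

pairs = matrix_to_pairs(matrix)                            # matrix -> table
high = [(x, y, r) for x, y, r in pairs if abs(r) > 0.95]   # filter the list
weights = {}
for x, y, _ in high:
    loser = x if target_corr[x] < target_corr[y] else y
    weights[loser] = 0.0                                   # weight 0 = remove
kept = select_by_weights(list(matrix), weights)            # select by weights
print(kept)  # prints ['b', 'c']
```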

Answers

  • Marco_Barradas
    Altair Employee
    Hi @BrianT

    You could use the Remove Correlated Attributes operator.

    The image I shared is taken from the example process available from the help menu.


  • BrianT
    New Altair Community Member
    Hi @MarcoBarradas, @BalazsBarany,

    I haven't wanted to use the Remove Correlated Attributes operator because I couldn't figure out how it chooses which of the two correlated attributes to remove. In the tutorial, it seems to drop the one with the higher correlation weight. I want the opposite, but there isn't an option that allows for that, hence the need for a workaround.

    Unfortunately, I can't access the Converters extension. It's not your problem, but I'm baffled that it isn't possible to extract information from the correlation matrix without a third-party extension.
  • BalazsBarany
    New Altair Community Member
    Hi @BrianT,

    It's not a third-party extension; it is developed by RapidMiner staff. It serves as something of a testbed for deciding which operators should go into the core.

    What's the problem with accessing the extension? If your Studio can't reach the Marketplace, you can download the extension from the web and put it into your RapidMiner Studio installation folder under lib/plugins.
    https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_converters

    Both attributes in a pair share the same correlation coefficient, so why does the order of removal matter to you? You can try different settings of the "attribute order" parameter of Remove Correlated Attributes and check whether one of them does what you need.

    Regards,
    Balázs
  • BrianT
    New Altair Community Member
    Hi @BalazsBarany,

    I'm circling back to answer your question now that my problem has been resolved. This is mostly for anyone else who runs into the same issue, specific though it was. What I really wanted was to look at two correlated attributes and remove the one with the lower correlation with the target variable. As you suggested, I eventually figured out what the "attribute order" parameter does. So, to get the result I wanted, I created a subprocess that sorted my attributes by their correlation with my target variable.

    Thanks,
    Brian
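Brian's workaround works because the removal depends on the order of the attributes. The sketch below illustrates that idea in Python under one explicit assumption: when two attributes exceed the threshold, the one that appears earlier in the order is kept and the later one is removed. Whether Remove Correlated Attributes follows exactly this rule for each "attribute order" setting is not confirmed here. The helper names and toy numbers are hypothetical.

```python
def sort_by_target_corr(target_corr):
    """Order attribute names by descending absolute correlation with the target."""
    return sorted(target_corr, key=lambda name: abs(target_corr[name]), reverse=True)


def order_dependent_removal(order, corr, threshold=0.95):
    """Walk attributes in `order`; keep one only if it is not highly correlated
    with an attribute already kept. This keep-the-earlier rule is an ASSUMPTION
    used to show why the order matters, not the operator's documented logic."""
    kept = []
    for name in order:
        if all(abs(corr[name][k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy correlation matrix and |r| with the target, for illustration only.
corr = {
    "a": {"a": 1.0, "b": 0.97, "c": 0.10},
    "b": {"a": 0.97, "b": 1.0, "c": -0.20},
    "c": {"a": 0.10, "b": -0.20, "c": 1.0},
}
target_corr = {"a": 0.40, "b": 0.65, "c": 0.30}

print(order_dependent_removal(["a", "b", "c"], corr))                   # ['a', 'c']
print(order_dependent_removal(sort_by_target_corr(target_corr), corr))  # ['b', 'c']
```

With the original order, "b" is dropped even though it tracks the target more closely; after sorting by target correlation, "b" comes first and "a" is the one removed.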