Using the Correlation Matrix Operator Matrix Output
BrianT
New Altair Community Member
In my project, I'm trying to reduce the amount of correlation in my dataset. The standard way we do this is to look at all the pairwise correlations of the attributes, isolate those pairs above 0.95 (in absolute value), and remove the attribute from the pair that has the lower correlation with the dependent (target) variable.
The Correlation Matrix operator provides this pair-wise table that I could use. However, I haven't been able to figure out how to wire the green output node into another operator that I could use. I'd appreciate any help in accessing that data in a process and not just in the results.
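For readers who want to see the rule outside RapidMiner, here is a minimal pandas sketch of the selection described above. It is only an illustration, not what any RapidMiner operator does internally; the function name `drop_weaker_of_correlated_pairs`, the column names, and the toy data are all made up for the example.

```python
import numpy as np
import pandas as pd


def drop_weaker_of_correlated_pairs(df, target, threshold=0.95):
    """For each attribute pair with |r| > threshold, drop the attribute
    that is less correlated (in absolute value) with the target."""
    features = df.drop(columns=[target])
    pair_corr = features.corr().abs()
    target_corr = features.corrwith(df[target]).abs()

    dropped = set()
    cols = list(features.columns)
    for i, first in enumerate(cols):
        for second in cols[i + 1:]:
            # Skip pairs that have already lost a member.
            if first in dropped or second in dropped:
                continue
            if pair_corr.loc[first, second] > threshold:
                weaker = first if target_corr[first] < target_corr[second] else second
                dropped.add(weaker)
    return df.drop(columns=sorted(dropped))


# Toy data (hypothetical): "b" is nearly a copy of "a"; "y" is the target.
x = np.arange(10, dtype=float)
wiggle = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
data = pd.DataFrame({"a": x, "b": x + 0.1 * wiggle, "c": wiggle, "y": 2 * x})

reduced = drop_weaker_of_correlated_pairs(data, target="y")
print(list(reduced.columns))
```

In this toy set, "a" and "b" form the only pair above the threshold, and "b" is the member less correlated with "y", so it is the one removed.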
Best Answer
-
Hi,
if you just want to remove the correlated attributes, the Remove Correlated Attributes operator is ready for you.
If you want to do it manually, check out the "Matrix to ExampleSet" operator in the Converters extension. It converts your matrix to a table according to the options you set. Then you could for example filter the list, use Data to Weights and Select by Weights to eliminate the unwanted attributes.
Regards,
Balázs
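As a rough analogy for the manual route (turn the matrix into a table, then filter the list), the same idea can be sketched in pandas. This does not use the "Matrix to ExampleSet" operator or reproduce its output format; the column names `first`, `second`, and `correlation` and the toy data are assumptions for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Toy attributes (hypothetical): "b" is nearly a copy of "a".
x = np.arange(10, dtype=float)
wiggle = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
attributes = pd.DataFrame({"a": x, "b": x + 0.1 * wiggle, "c": wiggle})

corr = attributes.corr()

# One row per unordered attribute pair, skipping the diagonal.
pairs = pd.DataFrame(
    [(first, second, corr.loc[first, second])
     for first, second in combinations(corr.columns, 2)],
    columns=["first", "second", "correlation"],
)

# Keep only the pairs whose absolute correlation exceeds the cutoff.
high = pairs[pairs["correlation"].abs() > 0.95]
print(high)
```

Once the pairs are in a long table like this, they can be filtered, joined, or sorted like any other data set.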
Answers
-
Hi @BrianT
You could use the Remove Correlated Attributes operator.
The image I show is from the example you can get from the help menu.
0 -
Hi @MarcoBarradas, @BalazsBarany,
The reason I haven't wanted to use the Remove Correlated Attributes operator is that I haven't been able to figure out how it chooses which of the two correlated attributes to remove. In the tutorial, it seems to drop the one with the higher correlation weight. I want to do the opposite but there isn't an option that allows for that. Hence, the need for a workaround.
Unfortunately, I can't access the Converters extension. It's not your problem, but I am baffled that it's not possible to extract information from the correlation matrix without a third-party extension.
Hi @BrianT,
it's not a third-party extension; it is developed by RapidMiner people. It's a bit of a testbed for deciding which operators should go into core.
What's your problem with accessing the extension? If your Studio fails to access the Marketplace, you can download it from the web and put it into your RapidMiner Studio installation folder in lib/plugins.
https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_converters
The two attributes in a pair share the same correlation, so what's your problem with the order of removal? You can try different settings of the "attribute order" parameter of Remove Correlated Attributes and check if they do what you need.
Regards,
Balázs
Hi @BalazsBarany,
I'm circling back to answer your question since my problem has been resolved. This is more for the case where someone else has the same issue I had, specific though it was. What I really wanted to do was look at two correlated attributes and remove the one that had a lower correlation with the target variable. I did eventually figure out what the "attribute order" parameter does, like you mentioned. So to get the result I wanted, I created a subprocess that sorted my attributes in order of their correlation with my target variable.
Thanks,
Brian
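For anyone who wants to see the idea outside RapidMiner, here is a small pandas sketch of "sort by correlation with the target, then filter in that order". It assumes an order-dependent filter that keeps an attribute only if it is not highly correlated with one already kept; whether that matches how Remove Correlated Attributes behaves for a given "attribute order" setting is an assumption to verify. The function name `order_then_filter` and the toy data are made up for the example.

```python
import numpy as np
import pandas as pd


def order_then_filter(df, target, threshold=0.95):
    """Rank attributes by |correlation with target|, then keep each one
    only if it is not highly correlated with an attribute already kept."""
    features = df.drop(columns=[target])
    # Strongest relationship with the target comes first.
    ranked = features.corrwith(df[target]).abs().sort_values(ascending=False).index
    pair_corr = features.corr().abs()

    kept = []
    for col in ranked:
        if all(pair_corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return kept


# Toy data (hypothetical): "b" is nearly a copy of "a"; "y" is the target.
x = np.arange(10, dtype=float)
wiggle = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
data = pd.DataFrame({"a": x, "b": x + 0.1 * wiggle, "c": wiggle, "y": 2 * x})

kept_attributes = order_then_filter(data, target="y")
print(kept_attributes)
```

Because the attribute with the stronger target correlation is visited first, the weaker member of each highly correlated pair is the one left out.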