Correlation Matrix
maccten
New Altair Community Member
Hi,
I have a large data set with many attributes
I would like to see how closely the attributes are correlated, but because of the sheer number of them I'm only interested in attributes that are correlated above 40%
Is there a way to do this, for example using a filter of some description? I know you can remove correlated attributes and select by weights, but those are not what I need, as I'm interested in the high correlations
Thank you for your time
Answers
-
Hello
There are options like "top k" and "top p%" in the Select by Weights operator that might help.
regards
Andrew
-
Hi Andrew
Thanks for the quick reply. I ran it this morning, but I don't think this is what I'm looking for
What I need is the pairwise table, so I can say specifically that there is a 50% correlation between Attribute A and B but a negative correlation between A and C
Do you know if you can filter the actual matrix?
Thanks
-
Hi All
Is there perhaps a method to export the pairwise table into a CSV file or generate a report based on it?
Has anyone tried it before?
If it were in a database, it would be a simple case of selecting the rows where the correlation is above a certain amount
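To make the database idea concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. It assumes the pairwise table has already been exported with one row per attribute pair; the table name `pairwise`, the column names, the helper `strong_pairs` and the sample values are all made up for illustration.

```python
import sqlite3

# Hypothetical pairwise rows: (first attribute, second attribute, correlation).
# In practice these would come from an exported pairwise table.
rows = [
    ("A", "B", 0.50),
    ("A", "C", -0.72),
    ("B", "C", 0.10),
]


def strong_pairs(rows, threshold):
    """Return the pairs whose absolute correlation is at least `threshold`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE pairwise (attr_a TEXT, attr_b TEXT, correlation REAL)"
        )
        conn.executemany("INSERT INTO pairwise VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT attr_a, attr_b, correlation FROM pairwise "
            "WHERE ABS(correlation) >= ? "
            "ORDER BY ABS(correlation) DESC",
            (threshold,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for attr_a, attr_b, corr in strong_pairs(rows, 0.4):
        print(attr_a, attr_b, corr)
```

Filtering on `ABS(correlation)` keeps strong negative correlations as well as strong positive ones.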
Thanks
-
Hello
A Groovy script would be able to do it. I could probably do that in return for beer or money ;D
Alternatively, I'm thinking about the possibility of calculating the correlation in a process without using the built-in operators. That would let you build an example set that could be filtered as you like.
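As a rough illustration of that second idea outside RapidMiner, the sketch below computes Pearson correlations pair by pair in plain Python and keeps only the pairs above a threshold, so the result is an ordinary row-per-pair table. The function names `pearson` and `correlated_pairs`, the dictionary-of-columns input and the sample data are assumptions made for this example, not part of any RapidMiner API.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold):
    """Yield (name_a, name_b, r) for every pair with |r| >= threshold.

    `columns` maps an attribute name to its list of values. Pairs are
    generated lazily, so the full pairwise table is never held in memory.
    """
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:  # NaN compares False, so it is skipped
            yield name_a, name_b, r


if __name__ == "__main__":
    # Made-up data for illustration only.
    data = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],
        "C": [4.0, 3.0, 2.0, 1.0],
        "D": [1.0, -1.0, -1.0, 1.0],
    }
    for name_a, name_b, r in correlated_pairs(data, 0.5):
        print(f"{name_a}\t{name_b}\t{r:.3f}")
```

With n attributes there are n(n-1)/2 distinct pairs, so a loop like this is only a sketch; for thousands of attributes a vectorised implementation would be far faster.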
regards
Andrew
-
I thought this link provided the answer http://www.myexperiment.org/workflows/1279.html
Unfortunately, it doesn't provide a pairwise table, and the matrix in question covers 5000 attributes, so exporting it to Excel means cutting off a good portion of it
I'll keep the beer money in mind, of course, as soon as the next paycheck comes around
-
Have a look at the configuration of the Report operator: you should be able to configure Pairwise Table as output format.
Have a look at the process below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
<operator activated="true" class="correlation_matrix" compatibility="5.3.008" expanded="true" height="94" name="Correlation Matrix" width="90" x="179" y="30"/>
<operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="76" name="Generate Report" width="90" x="313" y="30">
<parameter key="report_name" value="test"/>
<parameter key="format" value="Excel"/>
<parameter key="excel_output_file" value="C:\Users\jdoe\Desktop\test.xls"/>
</operator>
<operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="60" name="Report" width="90" x="447" y="30">
<parameter key="report_name" value="test"/>
<parameter key="specified" value="true"/>
<parameter key="reportable_type" value="Numerical Matrix"/>
<parameter key="renderer_name" value="Pairwise Table"/>
<list key="parameters">
<parameter key="min_row" value="1"/>
<parameter key="max_row" value="2147483647"/>
<parameter key="min_column" value="1"/>
<parameter key="max_column" value="2147483647"/>
</list>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_op="Generate Report" to_port="through 1"/>
<connect from_op="Generate Report" from_port="through 1" to_op="Report" to_port="reportable in"/>
<connect from_op="Report" from_port="reportable out" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Hi Marius
This works
However, I have one last problem in relation to this
My pairwise table is going to generate roughly 25 million rows, which is not exportable using a report
Is there any way to filter the matrix/pairwise table so that only attributes with a certain correlation are exported, for example only attributes with 50% or more correlation?
Thanks
-
Unfortunately, this is not possible. To solve the problem once and for all, we have an internal ticket requesting a way to convert the matrix into a normal example set, but we don't have a schedule for it yet.
-
Thanks very much for the feedback, Marius