Correlation - need help getting started
RCL1
New Altair Community Member
Folks, I've just started using RapidMiner and am trying to calculate a correlation coefficient as a first test.
My data set includes three columns. The first is the key, so its role is set to "id". The second column is my dependent variable, so I set its role to "label". Finally, I set the third column to "regular".
In the Designer, I piped a "Retrieve" of the data to a "Correlation Matrix" operator.
When I run the process, the Results perspective does not show a correlation in the Meta Data View. Under the Statistics column, it only shows avg = 1235 +/- 123, as well as a Range column.
Can anyone tell me how to get it to calculate and display a Pearson correlation coefficient?
Thanks,
RCL1
Answers
Still trying to square what I'm reading in the manual with my test example. Re-imported the data:
Four columns: IP ID (Integer, ID); Income (integer, weight); Expenditure (integer, attribute); WC QT (integer, attribute)
Now when I pipe this data retrieve to the correlation matrix, I only get average and range for statistics under Meta Data View.
Where do I calculate a correlation coefficient?
Thanks,
RL
Hi, did you connect the mat output of Correlation Matrix to the process output? Please have a look at the process below. If that does not solve your problem, please attach your process setup as described in this thread: http://rapid-i.com/rapidforum/index.php/topic,4782.0.html
Best,
Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.002" expanded="true" name="Process">
<process expanded="true" height="638" width="716">
<operator activated="true" class="generate_data" compatibility="5.2.002" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
<parameter key="target_function" value="random classification"/>
</operator>
<operator activated="true" class="correlation_matrix" compatibility="5.2.002" expanded="true" height="94" name="Correlation Matrix" width="90" x="313" y="30"/>
<connect from_op="Generate Data" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Ok, I took a look. Other than some style differences I'm not seeing any glaring problems.
This is my process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
<context>
<input>
<location>//RapdMinerNewLocalRepository/CorrTest</location>
</input>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
<process expanded="true" height="435" width="413">
<operator activated="true" class="retrieve" compatibility="5.2.000" expanded="true" height="60" name="Retrieve" width="90" x="87" y="237">
<parameter key="repository_entry" value="corrTest2"/>
</operator>
<operator activated="true" class="correlation_matrix" compatibility="5.2.000" expanded="true" height="94" name="Correlation Matrix" width="90" x="338" y="221">
<parameter key="squared_correlation" value="true"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="54"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I changed port spacing from 54 to 0 on result 2.
<portSpacing port="source_input 1" spacing="0"/>
I'm getting a pairwise table now...
The port spacing should not influence the results at all; it only affects how the process is drawn in the editor.
RCL1 wrote:
I'm getting a pairwise table now...
Is that what you want?
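For anyone checking the numbers in such a pairwise table by hand, here is a minimal Python sketch of what a Pearson correlation table contains. It is not RapidMiner's implementation. The column names are borrowed from the re-imported data set described above, the values are made up for illustration, and the `squared` flag is a hypothetical stand-in for what the `squared_correlation` parameter name suggests (reporting r squared instead of r).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns, squared=False):
    """Pairwise table keyed by (name_a, name_b); values are r, or r*r if squared."""
    table = {}
    for a in columns:
        for b in columns:
            r = pearson(columns[a], columns[b])
            table[(a, b)] = r * r if squared else r
    return table


# Made-up values, for illustration only.
data = {
    "Income": [10, 20, 30, 40, 50],
    "Expenditure": [8, 15, 27, 33, 45],
    "WC QT": [5, 3, 4, 1, 2],
}

table = correlation_table(data)
squared_table = correlation_table(data, squared=True)

for (a, b), value in table.items():
    print(f"{a} vs {b}: {value:.2f}")
```

With these made-up values, Income vs WC QT comes out at -0.80, and the squared variant reports 0.64 for the same pair. A squared table loses the sign, so a strong negative relationship looks the same as a strong positive one.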