"Group by Correlation"
ratheesan
Hi,
I have 4 attributes, of which 1 is nominal and the other 3 are numerical. My objective is to calculate the pairwise correlation coefficients between the 3 numerical attributes, grouped by the nominal attribute. That is, if the nominal attribute contains 2 distinct values, city1 and city2, then I need the correlation coefficients between the other attributes for city1 and city2 separately. I tried it with some operators but am not getting a grouped correlation. This is my process:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\dummy.xls.xls"/>
<parameter key="first_row_as_names" value="true"/>
</operator>
<operator name="GroupBy" class="GroupBy">
<parameter key="attribute_name" value="aa"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="is_numerical"/>
</operator>
<operator name="CorrelationMatrix" class="CorrelationMatrix">
</operator>
</operator>
Thanks,
Ratheesan
Answers
Hi,
did I understand you correctly that you want to calculate the correlation on the subset of the example set containing either city1 or city2?
Then you could use a ValueIterator in combination with a nested ExampleFilter.
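To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of what the ValueIterator plus nested ExampleFilter combination amounts to: split the rows by each distinct value of the nominal attribute, then compute the correlation on each subset. The function names `pearson` and `correlation_by_group` and the sample rows are made up for illustration; only the column names `dept`, `tfq_score` and `tenure` are borrowed from this thread.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    # A constant column has zero variance, so the coefficient is undefined.
    return cov / denom if denom else float("nan")


def correlation_by_group(rows, group_key, col_a, col_b):
    """Return {group value: correlation of col_a and col_b within that group}."""
    groups = defaultdict(list)
    for row in rows:
        # Same role as the filter "dept=<current value>" inside the iterator.
        groups[row[group_key]].append(row)
    return {
        value: pearson([r[col_a] for r in subset], [r[col_b] for r in subset])
        for value, subset in groups.items()
    }


# Hypothetical sample data, not taken from the original spreadsheet.
rows = [
    {"dept": "city1", "tfq_score": 1.0, "tenure": 2.0},
    {"dept": "city1", "tfq_score": 2.0, "tenure": 4.0},
    {"dept": "city1", "tfq_score": 3.0, "tenure": 6.0},
    {"dept": "city2", "tfq_score": 1.0, "tenure": 3.0},
    {"dept": "city2", "tfq_score": 2.0, "tenure": 2.0},
    {"dept": "city2", "tfq_score": 3.0, "tenure": 1.0},
]
result = correlation_by_group(rows, "dept", "tfq_score", "tenure")
```

With this sample data, `result` holds one coefficient per distinct `dept` value, which is the per-group output the question asks for.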
By the way: ever thought of becoming an enterprise customer? You have quite a bunch of questions, and I would be able to answer in much more detail during consulting. I could then simply post an example process here...
Greetings,
Sebastian
Hi Sebastian,
I am using RapidMiner Enterprise Edition only. When I use an Excel sheet as input, I get a separate correlation for each class. But when I read the same data from SQL Server, I get the error message "Cannot instantiate 'attribute_value_filter': com.rapidminer.example.set.AttributeValueFilter: cannot invoke condition (Parameter string must have the form 'attribute {=|<|>|<=|>=|!=} value')". I am attaching the process for both.
Excel Input
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\tfq.xls"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="label_column" value="14"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="dept"/>
<parameter key="iteration_macro" value="mmm"/>
<operator name="ExampleFilter" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dept=%{mmm}"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="tfq_score||tenure"/>
</operator>
<operator name="CorrelationMatrix" class="CorrelationMatrix">
</operator>
</operator>
</operator>
SQL Input
<operator name="Root" class="Process" expanded="yes">
<operator name="DatabaseExampleSource" class="DatabaseExampleSource">
<parameter key="database_system" value="Microsoft SQL Server (Microsoft)"/>
<parameter key="database_url" value="jdbc:sqlserver://COMPUTER-647;databaseName=DataMart"/>
<parameter key="username" value="sa"/>
<parameter key="password" value="VNfe8QITNRw19hgf6f6UpA=="/>
<parameter key="query" value="select * from F_TFQ"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="DEPT"/>
<parameter key="iteration_macro" value="mmm"/>
<operator name="ExampleFilter" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="DEPT=%{mmm}"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="TFQ_SCORE||TENURE"/>
</operator>
<operator name="CorrelationMatrix" class="CorrelationMatrix">
</operator>
</operator>
</operator>
Thanks
Ratheesan
Hi,
sorry, but usually enterprise customers use their account on our online support ticket system for asking questions...
This is strange, but I cannot reproduce it because I don't have your database. Did you check whether the DEPT attribute is of the desired type?
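One way to look at the column before it reaches the filter is to scan its distinct values for anything that might not survive being substituted into `DEPT=%{mmm}`. The sketch below is only a guess at what could go wrong (missing values, empty strings, padded values from a fixed-width column, or operator characters inside a value). It is not RapidMiner's actual parsing logic, and the function name `suspicious_values` and the sample values are hypothetical.

```python
def suspicious_values(values):
    """Return (value, reason) pairs for values that might break 'DEPT=<value>'.

    The checks are assumptions about possible causes, not confirmed behaviour
    of the attribute_value_filter condition.
    """
    problems = []
    for v in values:
        if v is None:
            problems.append((v, "missing (NULL)"))
            continue
        text = str(v)
        if text == "":
            problems.append((v, "empty string"))
        elif text != text.strip():
            problems.append((v, "leading or trailing whitespace"))
        elif any(ch in text for ch in "=<>!"):
            problems.append((v, "contains an operator character"))
    return problems


# Hypothetical distinct values of DEPT, e.g. exported from the database.
dept_values = ["HR", "IT   ", None, "", "R&D", "A=B"]
report = suspicious_values(dept_values)
for value, reason in report:
    print(repr(value), "->", reason)
```

If the report comes back empty for the real DEPT values, the cause is more likely the attribute type than the values themselves.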
Greetings,
Sebastian