Is there a way to process data that is in ratio form?
I have a university ranking dataset, and one of the columns is the gender ratio. Is there a way to analyze it to answer my research question: "Does gender distribution affect the ranking of a university?"
Answers
-
Hi @apiphuh
First, I used the Write Excel operator to convert your .csv file into an Excel file. Then, in Excel, I ran a macro as a preprocessing step on your female_male_ratio attribute to create a new numerical attribute, female_male_ratio_2 (for example, 33:67 => 0.49, i.e. 33 divided by 67).
The new Excel file is in the attached zip file.
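If you would rather not use an Excel macro, the same conversion can be sketched in a few lines of Python. This is only an illustration of the 33:67 => 0.49 step described above, not the macro that was actually used; the function name `ratio_to_number` is made up for this example, and the handling of missing or malformed values (returning None) is an assumption.

```python
def ratio_to_number(text):
    """Convert a 'female:male' string such as '33:67' to female/male as a float.

    Returns None when the value is missing, malformed, or the male part is zero
    (assumed behaviour for this sketch).
    """
    if text is None:
        return None
    parts = text.replace(" ", "").split(":")
    if len(parts) != 2:
        return None
    try:
        female, male = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    if male == 0:
        return None
    return female / male


print(round(ratio_to_number("33:67"), 2))  # 0.49
```

An alternative would be to keep only the female share (33 / (33 + 67) = 0.33), which stays between 0 and 1 and may be easier to interpret.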
1. After visual analysis, there seems to be no obvious relationship between "world_rank" and "female_male_ratio_2". See the following screenshot:
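2. To confirm this observation, I used the Correlation Matrix operator: the correlation coefficient between "world_rank" and "female_male_ratio_2" is 0.138.
A coefficient this close to 0 means there is at most a very weak linear relationship between these two attributes (r² ≈ 0.019, so roughly 2% of the variance is shared). It does not rule out a non-linear relationship.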
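For reference, the kind of coefficient reported by a correlation matrix is the Pearson correlation, which can be sketched in plain Python as below. The two short lists are made-up toy values used only to show the call; they are not taken from the dataset, and the function name `pearson` is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only (NOT from the university dataset)
toy_rank = [1, 2, 3, 4, 5]
toy_ratio = [0.49, 1.00, 0.80, 1.20, 0.60]
print(pearson(toy_rank, toy_ratio))
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one or none.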
You can go further by applying some algorithms.
Here is the process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\timesData_Excel.xlsx"/>
        <parameter key="imported_cell_range" value="A1:O2604"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="world_rank.true.integer.attribute"/>
          <parameter key="1" value="university_name.true.polynominal.attribute"/>
          <parameter key="2" value="country.true.polynominal.attribute"/>
          <parameter key="3" value="teaching.true.numeric.attribute"/>
          <parameter key="4" value="international.true.polynominal.attribute"/>
          <parameter key="5" value="research.true.numeric.attribute"/>
          <parameter key="6" value="citations.true.numeric.attribute"/>
          <parameter key="7" value="income.true.polynominal.attribute"/>
          <parameter key="8" value="total_score.true.numeric.attribute"/>
          <parameter key="9" value="num_students.true.polynominal.attribute"/>
          <parameter key="10" value="student_staff_ratio.true.numeric.attribute"/>
          <parameter key="11" value="international_students.true.polynominal.attribute"/>
          <parameter key="12" value="female_male_ratio.true.polynominal.attribute"/>
          <parameter key="13" value="female_male_ratio_2.true.numeric.attribute"/>
          <parameter key="14" value="year.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="correlation_matrix" compatibility="8.0.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="34"/>
      <connect from_op="Read Excel" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
      <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
      <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
I hope these first elements of a response will be helpful.
Regards,
Lionel
-
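You could use a statistical test to answer the question, for example a chi-squared test of independence. Note that this test works on counts of categories, so both attributes would first need to be binned (for example, top-ranked vs. lower-ranked, and majority-female vs. majority-male).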
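As a rough sketch of what such a test computes, here is Pearson's chi-squared statistic for a 2x2 table in plain Python. Everything specific here is an assumption for illustration: the function name `chi_squared_2x2`, the choice of binning, and above all the counts, which are invented and not taken from the university dataset. No continuity correction is applied, and the p-value shortcut used is only valid for a 2x2 table (1 degree of freedom).

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    table = [[a, b], [c, d]]; returns (statistic, p_value).
    No Yates continuity correction is applied.
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, for illustration only:
# rows = top-ranked / lower-ranked, columns = majority-female / majority-male
invented_counts = [[20, 10], [10, 20]]
stat, p = chi_squared_2x2(invented_counts)
print(stat, p)
```

A small p-value would suggest that rank group and gender majority are not independent, but it would not show that one causes the other, and the result depends on how the bins are chosen.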