Is there a way to process data that is in ratio form?

apiphuh
edited November 5 in Community Q&A

I have a university ranking dataset, and one of its columns is a gender ratio. Is there a way to analyze it to answer my research question, "Does gender distribution affect the ranking of a university?"

Answers

  • lionelderkrikor

    Hi @apiphuh

    First, I used the Write Excel operator to convert your .csv file into an Excel file. Then, in Excel, I ran a macro on your female_male_ratio attribute to create a new numerical attribute, female_male_ratio_2 (for example, 33:67 => 33/67 ≈ 0.49).
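    The following is not the original Excel macro. It is a minimal Python sketch of the same conversion, assuming the raw values look like "33 : 67" (female : male, with or without spaces) and that missing or malformed entries such as "-" should become NaN. The function name parse_ratio is hypothetical.

```python
import math
from typing import Optional


def parse_ratio(text: Optional[str]) -> float:
    """Turn a 'female : male' string such as '33 : 67' into female / male (about 0.49).

    Missing or malformed values become NaN so they can be filtered out later.
    """
    if text is None:
        return math.nan
    parts = text.split(":")
    if len(parts) != 2:
        return math.nan  # e.g. '-' or an empty cell
    try:
        female = float(parts[0].strip())
        male = float(parts[1].strip())
    except ValueError:
        return math.nan
    if male == 0:
        return math.nan  # an all-female institution: the quotient is undefined
    return female / male


print(round(parse_ratio("33 : 67"), 2))
print(round(parse_ratio("33:67"), 2))
print(parse_ratio("-"))
```

    Dividing female by male reproduces the 33:67 => 0.49 encoding. Another option is the female share, female / (female + male). It always lies between 0 and 1 and never divides by zero.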

    The new Excel file is in the attached zip file.

    1. A visual analysis shows no obvious relationship between "world_rank" and "female_male_ratio_2". See the following screenshot:

    [Screenshot: rank_ratio-male-female.png]

    2. To confirm this observation, I used the Correlation Matrix operator. The correlation coefficient between "world_rank" and "female_male_ratio_2" is 0.138.

    This value indicates at most a weak linear relationship between these two attributes.
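    A quick calculation shows why this counts as weak. Squaring the coefficient gives r² = 0.138² ≈ 0.019, so female_male_ratio_2 linearly accounts for less than 2% of the variance in world_rank. Below is a minimal Python sketch of the Pearson coefficient, the usual statistic a correlation matrix reports for numeric attributes. It assumes both columns are already numeric and drops rows where either value is missing. The helper names are hypothetical, and the toy numbers are made up rather than taken from the dataset.

```python
import math
from typing import List, Sequence, Tuple


def drop_missing(xs: Sequence[float], ys: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Keep only the rows where both values are present (NaN marks a missing value)."""
    kept = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    return [x for x, _ in kept], [y for _, y in kept]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squares; the 1/(n-1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up toy values, not the real dataset.
ranks = [1.0, 2.0, 3.0, 4.0, 5.0]
ratios = [0.9, math.nan, 1.1, 0.8, 1.0]
r = pearson(*drop_missing(ranks, ratios))
print(round(r, 3), round(r * r, 3))  # r, and r squared (share of variance explained linearly)
```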

    You can go further by applying other algorithms to these attributes.

    Here is the process:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
            <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\timesData_Excel.xlsx"/>
            <parameter key="imported_cell_range" value="A1:O2604"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="world_rank.true.integer.attribute"/>
              <parameter key="1" value="university_name.true.polynominal.attribute"/>
              <parameter key="2" value="country.true.polynominal.attribute"/>
              <parameter key="3" value="teaching.true.numeric.attribute"/>
              <parameter key="4" value="international.true.polynominal.attribute"/>
              <parameter key="5" value="research.true.numeric.attribute"/>
              <parameter key="6" value="citations.true.numeric.attribute"/>
              <parameter key="7" value="income.true.polynominal.attribute"/>
              <parameter key="8" value="total_score.true.numeric.attribute"/>
              <parameter key="9" value="num_students.true.polynominal.attribute"/>
              <parameter key="10" value="student_staff_ratio.true.numeric.attribute"/>
              <parameter key="11" value="international_students.true.polynominal.attribute"/>
              <parameter key="12" value="female_male_ratio.true.polynominal.attribute"/>
              <parameter key="13" value="female_male_ratio_2.true.numeric.attribute"/>
              <parameter key="14" value="year.true.integer.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="correlation_matrix" compatibility="8.0.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="246" y="34"/>
          <connect from_op="Read Excel" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
          <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
          <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    I hope these first elements of an answer will be helpful.

    Regards,

    Lionel

  • SGolbert

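    You could use a statistical test to answer the question, for example a chi-squared test of independence. Because this test works on counts, you first need to bin both the rank and the ratio into categories.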
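    Here is a minimal Python sketch of that idea. It bins the two attributes into a contingency table and computes Pearson's chi-squared statistic and its degrees of freedom. The bin edges and function names are hypothetical choices for illustration. The 5% critical values are the standard table values for 1 to 6 degrees of freedom. To get an exact p-value, SciPy's chi2_contingency does the same computation.

```python
from typing import List, Sequence, Tuple

# Upper-tail 5% critical values of the chi-squared distribution (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def bin_index(value: float, edges: Sequence[float]) -> int:
    """Bin i holds values <= edges[i]; the last bin holds everything above the last edge.

    Filter out NaN first: it compares False against every edge and would land in the last bin.
    """
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def contingency_table(row_values: Sequence[float], col_values: Sequence[float],
                      row_edges: Sequence[float], col_edges: Sequence[float]) -> List[List[int]]:
    """Count how many (row, col) pairs fall into each combination of bins."""
    table = [[0] * (len(col_edges) + 1) for _ in range(len(row_edges) + 1)]
    for r, c in zip(row_values, col_values):
        table[bin_index(r, row_edges)][bin_index(c, col_edges)] += 1
    return table


def chi2_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    if 0 in row_tot or 0 in col_tot:
        raise ValueError("empty row or column: merge bins before testing")
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected if rank and ratio are independent
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up values, not the real dataset. Rank bands: 1-100 / 101-200 / 201+;
# female_male_ratio_2 bands: <= 0.8 / (0.8, 1.2] / > 1.2.
ranks = [12, 45, 88, 130, 150, 190, 220, 260, 300, 75, 170, 240]
ratios = [0.6, 0.9, 1.3, 0.7, 1.0, 1.4, 0.5, 1.1, 1.5, 0.95, 0.85, 1.25]
table = contingency_table(ranks, ratios, row_edges=[100, 200], col_edges=[0.8, 1.2])
stat, dof = chi2_independence(table)
print(table, round(stat, 3), dof)
if dof in CRITICAL_05:
    print("reject independence at 5%" if stat > CRITICAL_05[dof] else "no evidence against independence at 5%")
```

    A common rule of thumb is that every expected count should be at least about 5. The toy table above is too small for that, so on the real data choose the bands (or merge sparse ones) with this in mind.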