Decision tree, random forest and classification of data set

g_pawar
g_pawar New Altair Community Member
edited November 2024 in Community Q&A

Hi All,

I am new to RapidMiner. Could someone please help me create a decision tree and a random forest? I have 1 target attribute and 12 parameters influencing it. I also need to classify the data (with regression) based on the output. The main objective is to check whether a single parameter, or a combination of 2, 4 or 5 parameters, significantly or moderately influences the main target attribute. The data is attached for your reference. I tried selecting attributes and setting roles, but got errors such as missing labels and missing parameters.

Thanks, 

Gopal 

GP.csv 47.4K
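For the main objective (finding which single parameters move the target), a quick first screen is the correlation of each attribute with the label. Below is a minimal plain-Python sketch, not a RapidMiner feature. It assumes GP.csv has a header row, comma separators, numeric values with no gaps, and a label column named "Main attribute" (as in the process in the answer). The function names `pearson`, `rank_by_correlation` and `load_rows` are hypothetical. Correlation only captures linear, one-attribute-at-a-time effects; combinations of parameters need a model such as a decision tree or random forest.

```python
import csv
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(rows, target):
    """Return (attribute, r) pairs, strongest |r| against the target first.

    rows is a list of dicts mapping column name -> numeric string or number.
    """
    ys = [float(row[target]) for row in rows]
    scores = []
    for name in rows[0]:
        if name == target:
            continue  # do not correlate the label with itself
        xs = [float(row[name]) for row in rows]
        scores.append((name, pearson(xs, ys)))
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


def load_rows(path, encoding="windows-1252"):
    """Read a comma-separated file whose first row holds the column names."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))
```

Usage, assuming GP.csv is in the working directory: `rank_by_correlation(load_rows("GP.csv"), "Main attribute")` returns the attributes ordered from strongest to weakest linear relationship with the label.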

Best Answer

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓

    Hi Gopal,

    It seems there is a problem with your XML code: it cannot be loaded. Could you check it?

    Meanwhile, here is an example process that trains a decision tree on your data (change the csv_file path to the location of GP.csv on your machine):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Decision_tree_basic\GP.csv"/>
    <parameter key="column_separators" value=","/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="1.true.real.attribute"/>
    <parameter key="1" value="2.true.real.attribute"/>
    <parameter key="2" value="3.true.real.attribute"/>
    <parameter key="3" value="4.true.integer.attribute"/>
    <parameter key="4" value="5.true.integer.attribute"/>
    <parameter key="5" value="6.true.integer.attribute"/>
    <parameter key="6" value="7.true.integer.attribute"/>
    <parameter key="7" value="8.true.integer.attribute"/>
    <parameter key="8" value="9.true.real.attribute"/>
    <parameter key="9" value="10.true.real.attribute"/>
    <parameter key="10" value="11.true.real.attribute"/>
    <parameter key="11" value="12.true.real.attribute"/>
    <parameter key="12" value="Main attribute.true.real.attribute"/>
    <parameter key="13" value="13.true.real.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="34">
    <parameter key="attribute_name" value="Main attribute"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.1.000" expanded="true" height="145" name="Cross Validation" width="90" x="514" y="34">
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.000" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
    <parameter key="criterion" value="least_square"/>
    </operator>
    <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_regression" compatibility="8.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <parameter key="correlation" value="true"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="model" to_port="result 2"/>
    <connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it helps,

    Regards,

    Lionel

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    @g_pawar, please also post your XML code using the </> button. See the "Read Before Posting" instructions.

  • g_pawar
    g_pawar New Altair Community Member
    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="380" y="391">
    <parameter key="column_separators" value=";"/>
    <parameter key="trim_lines" value="false"/>
    <parameter key="use_quotes" value="true"/>
    <parameter key="quotes_character" value="&quot;"/>
    <parameter key="escape_character" value="\"/>
    <parameter key="skip_comments" value="false"/>
    <parameter key="comment_characters" value="#"/>
    <parameter key="parse_numbers" value="true"/>
    <parameter key="decimal_character" value="."/>
    <parameter key="grouped_digits" value="false"/>
    <parameter key="grouping_character" value=","/>
    <parameter key="date_format" value=""/>
    <parameter key="first_row_as_names" value="true"/>
    <list key="annotations"/>
    <parameter key="time_zone" value="SYSTEM"/>
    <parameter key="locale" value="English (United States)"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="read_all_values_as_polynominal" value="false"/>
    <list key="data_set_meta_data_information"/>
    <parameter key="read_not_matching_values_as_missings" value="true"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    </operator>
    </process>

    Hi Thomas,

    Thanks for the reply. Please find the code above.

    Cheers

    Gopal

  • g_pawar
    g_pawar New Altair Community Member

    Thanks, Lionel. It's working now.

    Regards,

    Gopal
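Lionel's process chains Read CSV, Set Role (label = "Main attribute") and Cross Validation, training a Decision Tree with criterion least_square and scoring it with Performance (Regression). As a rough, tool-independent illustration of those two ideas, here is a plain-Python sketch. It is not RapidMiner's implementation: it grows only a single split (a "stump") instead of a full tree, assumes rows are dicts of already-parsed floats with no missing values, and `best_split`, `fit_stump` and `cross_validate` are hypothetical names.

```python
import math
import random


def best_split(rows, target, attrs):
    """Search every attribute/threshold pair for the split with the lowest
    summed squared error around the two child means (least-squares criterion)."""
    best = None  # (sse, attribute, threshold, left_mean, right_mean)
    for attr in attrs:
        values = sorted({row[attr] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [row[target] for row in rows if row[attr] <= threshold]
            right = [row[target] for row in rows if row[attr] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((y - left_mean) ** 2 for y in left) + sum(
                (y - right_mean) ** 2 for y in right
            )
            if best is None or sse < best[0]:
                best = (sse, attr, threshold, left_mean, right_mean)
    return best


def fit_stump(rows, target, attrs):
    """Depth-1 regression tree; predicts the overall mean if no split exists."""
    split = best_split(rows, target, attrs)
    if split is None:
        mean = sum(row[target] for row in rows) / len(rows)
        return lambda row: mean
    _, attr, threshold, left_mean, right_mean = split
    return lambda row: left_mean if row[attr] <= threshold else right_mean


def cross_validate(rows, target, attrs, k=10, seed=0):
    """k-fold cross-validation of the stump; returns RMSE over held-out rows."""
    if not 2 <= k <= len(rows):
        raise ValueError("need 2 <= k <= number of rows")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k near-equal, disjoint folds
    squared_errors = []
    for i, test in enumerate(folds):
        # train on every fold except the held-out one
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit_stump(train, target, attrs)
        squared_errors += [(model(row) - row[target]) ** 2 for row in test]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A full decision tree applies the same split search recursively to each side until a stopping rule (maximal depth, minimal leaf size, minimal gain) fires; a random forest averages many such trees, each trained on a bootstrap sample with a random subset of attributes considered at each split.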
