How do I apply the "decision tree model" to the "friends & family" data set?
I can't find the input to connect "friends & family" to. Here's where I'm at:
P.S. I'd really appreciate it if you could help me solve #8-12. All files are attached.
Process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Titanic\data.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="pclass.true.integer.attribute"/>
<parameter key="1" value="survived.true.binominal.attribute"/>
<parameter key="2" value="name.true.polynominal.attribute"/>
<parameter key="3" value="sex.true.polynominal.attribute"/>
<parameter key="4" value="age.true.real.attribute"/>
<parameter key="5" value="ticket.true.polynominal.attribute"/>
<parameter key="6" value="fare.true.real.attribute"/>
<parameter key="7" value="embarked.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="name|ticket"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="survived"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.38"/>
<parameter key="ratio" value="0.62"/>
</enumeration>
<parameter key="sampling_type" value="shuffled sampling"/>
</operator>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="648" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="715" y="187">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="849" y="187">
<list key="class_weights"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<connect from_op="Performance" from_port="performance" to_port="result 2"/>
<connect from_op="Performance" from_port="example set" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
Best Answer
Hi @sky,
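To apply the Decision Tree model to your "friends & family" data, you need a second Apply Model operator.
To feed the mod (model) input of this operator, duplicate the mod output of the Decision Tree operator with a Multiply operator. Note that embarked is now excluded in the first Select Attributes too, since friends & family.csv has no embarked column and the model should only be trained on attributes the new data also contains.
The process: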
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Titanic\data.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="pclass.true.integer.attribute"/>
<parameter key="1" value="survived.true.binominal.attribute"/>
<parameter key="2" value="name.true.polynominal.attribute"/>
<parameter key="3" value="sex.true.polynominal.attribute"/>
<parameter key="4" value="age.true.real.attribute"/>
<parameter key="5" value="ticket.true.polynominal.attribute"/>
<parameter key="6" value="fare.true.real.attribute"/>
<parameter key="7" value="embarked.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="name|ticket|embarked"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="survived"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.38"/>
<parameter key="ratio" value="0.62"/>
</enumeration>
<parameter key="sampling_type" value="shuffled sampling"/>
</operator>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="648" y="34"/>
<operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="103" name="Multiply" width="90" x="782" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="715" y="187">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="849" y="187">
<list key="class_weights"/>
</operator>
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="112" y="289">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Titanic\friends &amp; family.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="pclass.true.integer.attribute"/>
<parameter key="1" value="name.true.polynominal.attribute"/>
<parameter key="2" value="sex.true.polynominal.attribute"/>
<parameter key="3" value="age.true.integer.attribute"/>
<parameter key="4" value="fare.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="289">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="name|ticket"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="514" y="289">
<list key="application_parameters"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Decision Tree" from_port="model" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Apply Model" to_port="model"/>
<connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<connect from_op="Performance" from_port="performance" to_port="result 2"/>
<connect from_op="Performance" from_port="example set" to_port="result 3"/>
<connect from_op="Read CSV (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 4"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>
For the last questions, play with the parameters of the Decision Tree operator and see how they affect the results.
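A sketch of what that can look like in the process XML: the self-closing Decision Tree element is replaced by one that sets a few parameters explicitly. The parameter keys are assumed to match the Decision Tree operator's options in 8.2 (criterion, maximal depth, pruning and prepruning settings), and the values are arbitrary starting points to vary between runs, not recommended settings. The operator's Parameters panel is the authoritative list.

```xml
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="648" y="34">
  <!-- Splitting criterion: try gain_ratio, information_gain, gini_index or accuracy -->
  <parameter key="criterion" value="gini_index"/>
  <!-- Smaller depth gives a simpler tree that is less likely to overfit -->
  <parameter key="maximal_depth" value="5"/>
  <!-- Post-pruning and the confidence level used for it -->
  <parameter key="apply_pruning" value="true"/>
  <parameter key="confidence" value="0.25"/>
  <!-- Pre-pruning thresholds that stop splitting early -->
  <parameter key="apply_prepruning" value="true"/>
  <parameter key="minimal_gain" value="0.01"/>
  <parameter key="minimal_leaf_size" value="5"/>
</operator>
```

Changing one value at a time and re-reading the accuracy in the Performance result makes it easier to see which parameter actually matters.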
Regards,
Lionel
That's great, thanks. Sorry if this is a silly question. Remember we split the data at the beginning? I can only find one partition in the results (812 examples). How do I find the second partition (497 examples)?
@sky,
That's because I forgot to connect the exa (example set) output port of the Decision Tree operator to a res (result) port.
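The two counts are consistent with the split ratios: they add up to 1309, and 0.38 × 1309 ≈ 497 while 0.62 × 1309 ≈ 812. So the 812 examples already in the results are partition 2, scored by Apply Model and passed on by Performance, and the missing 497 are partition 1, the training set that Decision Tree passes through on its exa port. In the process XML this is one extra connect element. A sketch, assuming result 5 is the next free result port (results 1 to 4 are already used) and assuming the exa port is serialized as exampleSet in the XML; if RapidMiner rejects it, draw the wire in the Design view and copy the line it generates from the XML panel.

```xml
<!-- Send the training partition (passed through by Decision Tree) to a result port -->
<!-- Assumption: "exampleSet" is the XML name of the Decision Tree exa port -->
<connect from_op="Decision Tree" from_port="exampleSet" to_port="result 5"/>
```

It belongs with the other connect elements, before the portSpacing entries.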
Regards,
Lionel