Random Forest - Attribute Importance
asav_yu
New Altair Community Member
I have built a Random Forest model that shows very good accuracy after many test runs, so I think I have found a winner for my simple problem. I used the "Weight by Tree Importance" operator to see which attributes are most important. Customer Income turned out to be the most important.
But how do I know whether higher or lower income supports my prediction? With a simple decision tree I can just look at the split and see, but how do I do that in a Random Forest?
Apologies for the noob question.
Thank you in advance!
Best Answers
-
Hi @asav_yu, have you run the model simulator in Auto Model? If you use the "Model Simulator" operator on your random forest, you can see interactively how the input "Customer Income" affects the prediction.
https://rapidminer.com/products/auto-model/
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.1.000" expanded="true" height="103" name="Split Data" width="90" x="246" y="238">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.6"/>
          <parameter key="ratio" value="0.4"/>
        </enumeration>
        <parameter key="sampling_type" value="automatic"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.1.000" expanded="true" height="103" name="Random Forest" width="90" x="380" y="34">
        <parameter key="number_of_trees" value="100"/>
        <parameter key="criterion" value="gain_ratio"/>
        <parameter key="maximal_depth" value="10"/>
        <parameter key="apply_pruning" value="false"/>
        <parameter key="confidence" value="0.1"/>
        <parameter key="apply_prepruning" value="false"/>
        <parameter key="minimal_gain" value="0.01"/>
        <parameter key="minimal_leaf_size" value="2"/>
        <parameter key="minimal_size_for_split" value="4"/>
        <parameter key="number_of_prepruning_alternatives" value="3"/>
        <parameter key="random_splits" value="false"/>
        <parameter key="guess_subset_ratio" value="true"/>
        <parameter key="subset_ratio" value="0.2"/>
        <parameter key="voting_strategy" value="confidence vote"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="enable_parallel_execution" value="true"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="380" y="238"/>
      <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="715" y="34">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="model_simulator:model_simulator" compatibility="9.1.000" expanded="true" height="103" name="Model Simulator" width="90" x="782" y="238"/>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Random Forest" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Multiply" to_port="input"/>
      <connect from_op="Random Forest" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Random Forest" from_port="exampleSet" to_op="Model Simulator" to_port="training data"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Model Simulator" to_port="test data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_op="Model Simulator" to_port="model"/>
      <connect from_op="Model Simulator" from_port="simulator output" to_port="result 2"/>
      <connect from_op="Model Simulator" from_port="model output" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
6 -
You can also use the Explain Predictions operator to do the same thing, even if you don't have access to Auto Model.
5 -
There is the operator "Model Simulator" that does exactly that; Auto Model is not necessary. In fact, Auto Model uses it itself, as you can see if you take a look at the underlying process (no black boxes indeed).
Regards,
Sebastian
1
Answers
-
If you don't have access to Auto Model, just apply your model to some variants of your data in which this attribute takes different values. You'll then see in the results how increasing or decreasing the value changes the prediction.
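To make the idea concrete, here is a minimal sketch in Python rather than RapidMiner (the thread itself only uses RapidMiner operators, so everything in it is an assumption for illustration). The helper `sweep_attribute`, the stand-in model `toy_confidence`, and the attribute names `income` and `age` are all hypothetical. In practice you would plug in your trained forest's confidence for the class you care about. The sketch overwrites one attribute with each candidate value in every row and averages the resulting confidences, so you can read off the direction of the effect.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def sweep_attribute(
    predict: Callable[[Row], float],
    rows: List[Row],
    attribute: str,
    values: Sequence[float],
) -> Dict[float, float]:
    """For each candidate value, set `attribute` to that value in a copy of
    every row and return the mean model confidence over all rows."""
    result: Dict[float, float] = {}
    for value in values:
        # Copy each row so the original data is left untouched.
        variants = [{**row, attribute: value} for row in rows]
        result[value] = mean(predict(row) for row in variants)
    return result


def toy_confidence(row: Row) -> float:
    """Hypothetical stand-in for a trained model: returns the confidence
    for the positive class. Replace with your own model's scoring call."""
    confidence = 0.1
    if row["income"] > 50_000:
        confidence += 0.6
    if row["age"] > 40:
        confidence += 0.2
    return confidence


# Made-up example rows; in practice, use (a sample of) your real data.
rows = [
    {"income": 30_000, "age": 25},
    {"income": 45_000, "age": 52},
    {"income": 80_000, "age": 38},
]

curve = sweep_attribute(toy_confidence, rows, "income", [20_000, 40_000, 60_000, 80_000])
for value, confidence in curve.items():
    print(f"income={value:>6}: mean confidence {confidence:.2f}")
```

If the mean confidence rises as the value rises, higher income supports the prediction; if it falls, lower income does. Keep in mind that this only shows an average effect, so interactions with other attributes can hide behind it.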
0 -