W-M5P
islem_h
New Altair Community Member
Hello everyone,
I am using the W-M5P operator from the Weka extension to get a model tree and extract rules from it. However, I always get only one rule (shown in the screenshot), and I don't get a graph of the tree at all, unlike with the RapidMiner regression tree or random forest.
It would be great if anyone could help here.
Thank you in advance!
0
Best Answer
-
Hello @islem_h
I have gone through your process and I don't find any issues with it. I also read Ross Quinlan's paper on the M5 algorithm, and what you see is simply how the algorithm works. M5 builds a model tree, in contrast to a traditional regression tree: it fits a linear model at each leaf rather than storing a single value there. One of the main capabilities of M5 is that it uses a greedy search to remove variables that contribute little to a model; in some cases it removes all of them, leaving only a constant. The pruning strategy is also different. I have attached a link to the paper below if you want to go through the details.
For your data, the algorithm generates a single linear equation, which is why you see only one rule. The result differs from the Decision Tree operator because that operator builds a regression tree with a value at each leaf rather than a linear model. If you want W-M5P to build a regression tree instead, enable the option "R" in the parameters of W-M5P; this gives you a regular regression tree.
https://sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Quinlan-AI.pdf
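As a rough sketch of the "R" option mentioned above, this is how a W-M5P operator block might look with "R" switched on. The parameter keys and the other values are copied from the sample process posted in this thread. Reading "R" as "build a regression tree instead of a model tree" follows the description above and Weka's M5P option of the same name. Treat it as an untested fragment, not a complete process.

```xml
<!-- Fragment only: apply the "R" change to an existing W-M5P operator in your own process. -->
<operator activated="true" class="weka:W-M5P" compatibility="7.3.000" expanded="true" name="W-M5P">
  <parameter key="N" value="false"/>
  <parameter key="U" value="false"/>
  <!-- R = true: assumed to build a regression tree (a value at each leaf) instead of a model tree -->
  <parameter key="R" value="true"/>
  <parameter key="M" value="4.0"/>
  <parameter key="L" value="false"/>
</operator>
```

With "R" left at false, the operator keeps fitting linear models at the leaves, which for some datasets collapses to a single equation.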
3
Answers
-
Hello @islem_h
I tried this with the Polynomial dataset from the RapidMiner Samples. I get multiple rules using either the W-M5P or the W-M5Rules operator from the Weka extension, and I can also generate the tree (screenshots below). It mainly depends on your data. If you can provide your XML process and dataset, we can try to replicate the issue and see what is happening. The sample process I used is attached here as well:
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="85">
        <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.2.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="85">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
        <parameter key="sampling_type" value="automatic"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="false" class="weka:W-M5P" compatibility="7.3.000" expanded="true" height="82" name="W-M5P" width="90" x="380" y="442">
        <parameter key="N" value="false"/>
        <parameter key="U" value="false"/>
        <parameter key="R" value="false"/>
        <parameter key="M" value="4.0"/>
        <parameter key="L" value="false"/>
      </operator>
      <operator activated="true" class="weka:W-M5P" compatibility="7.3.000" expanded="true" height="82" name="W-M5P (2)" width="90" x="313" y="34">
        <parameter key="N" value="false"/>
        <parameter key="U" value="false"/>
        <parameter key="R" value="false"/>
        <parameter key="M" value="4.0"/>
        <parameter key="L" value="false"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model" width="90" x="447" y="136">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="performance_regression" compatibility="9.2.001" expanded="true" height="82" name="Performance" width="90" x="514" y="34">
        <parameter key="main_criterion" value="first"/>
        <parameter key="root_mean_squared_error" value="true"/>
        <parameter key="absolute_error" value="false"/>
        <parameter key="relative_error" value="false"/>
        <parameter key="relative_error_lenient" value="false"/>
        <parameter key="relative_error_strict" value="false"/>
        <parameter key="normalized_absolute_error" value="false"/>
        <parameter key="root_relative_squared_error" value="false"/>
        <parameter key="squared_error" value="false"/>
        <parameter key="correlation" value="false"/>
        <parameter key="squared_correlation" value="false"/>
        <parameter key="prediction_average" value="false"/>
        <parameter key="spearman_rho" value="true"/>
        <parameter key="kendall_tau" value="false"/>
        <parameter key="skip_undefined_labels" value="true"/>
        <parameter key="use_example_weights" value="true"/>
      </operator>
      <connect from_op="Retrieve Polynomial" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="W-M5P (2)" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="W-M5P (2)" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Thanks for your understanding.
1 -