<?xml version="1.0" encoding="UTF-8" standalone="no"?><process version="5.0"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process"> <process expanded="true" height="557" width="614"> <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="Sonar data set" width="90" x="45" y="30"> <parameter key="repository_entry" value="//Samples/data/Sonar"/> </operator> <operator activated="true" class="multiply" compatibility="5.0.10" expanded="true" height="130" name="Multiply" width="90" x="45" y="210"/> <operator activated="true" class="x_validation" compatibility="5.0.10" expanded="true" height="112" name="Decision Tree (2)" width="90" x="179" y="390"> <description>A cross-validation evaluating a decision tree model.</description> <process expanded="true" height="549" width="310"> <operator activated="true" class="decision_tree" compatibility="5.0.10" expanded="true" height="76" name="Decision Tree" width="90" x="112" y="30"/> <connect from_port="training" to_op="Decision Tree" to_port="training set"/> <connect from_op="Decision Tree" from_port="model" to_port="model"/> <portSpacing port="source_training" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true" height="549" width="310"> <operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model (3)" width="90" x="45" y="30"> <list key="application_parameters"/> </operator> <operator activated="true" class="performance" compatibility="5.0.10" expanded="true" height="76" name="Performance (Decision Tree)" width="90" x="179" y="30"/> <connect from_port="model" to_op="Apply Model (3)" to_port="model"/> <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/> <connect from_op="Apply Model (3)" from_port="labelled data" 
to_op="Performance (Decision Tree)" to_port="labelled data"/> <connect from_op="Performance (Decision Tree)" from_port="performance" to_port="averagable 1"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_averagable 1" spacing="0"/> <portSpacing port="sink_averagable 2" spacing="0"/> </process> </operator> <operator activated="true" class="x_validation" compatibility="5.0.10" expanded="true" height="112" name="Naive Bayes" width="90" x="179" y="255"> <description>A cross-validation evaluating a decision tree model.</description> <process expanded="true" height="396" width="301"> <operator activated="true" class="naive_bayes_kernel" compatibility="5.0.10" expanded="true" height="76" name="Naive Bayes (Kernel)" width="90" x="110" y="30"/> <connect from_port="training" to_op="Naive Bayes (Kernel)" to_port="training set"/> <connect from_op="Naive Bayes (Kernel)" from_port="model" to_port="model"/> <portSpacing port="source_training" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true" height="396" width="301"> <operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30"> <list key="application_parameters"/> </operator> <operator activated="true" class="performance" compatibility="5.0.10" expanded="true" height="76" name="Performance (Naive Bayes)" width="90" x="179" y="30"/> <connect from_port="model" to_op="Apply Model (2)" to_port="model"/> <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/> <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (Naive Bayes)" to_port="labelled data"/> <connect from_op="Performance (Naive Bayes)" from_port="performance" to_port="averagable 1"/> <portSpacing port="source_model" 
spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_averagable 1" spacing="0"/> <portSpacing port="sink_averagable 2" spacing="0"/> </process> </operator> <operator activated="true" class="x_validation" compatibility="5.0.0" expanded="true" height="112" name="KNN" width="90" x="179" y="120"> <description>A cross-validation evaluating a decision tree model.</description> <process expanded="true" height="654" width="466"> <operator activated="true" class="k_nn" compatibility="5.0.10" expanded="true" height="76" name="k-NN" width="90" x="179" y="30"/> <connect from_port="training" to_op="k-NN" to_port="training set"/> <connect from_op="k-NN" from_port="model" to_port="model"/> <portSpacing port="source_training" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true" height="654" width="466"> <operator activated="true" class="apply_model" compatibility="5.0.0" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30"> <list key="application_parameters"/> </operator> <operator activated="true" class="performance" compatibility="5.0.0" expanded="true" height="76" name="Performance (KNN)" width="90" x="179" y="30"/> <connect from_port="model" to_op="Apply Model" to_port="model"/> <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (KNN)" to_port="labelled data"/> <connect from_op="Performance (KNN)" from_port="performance" to_port="averagable 1"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_averagable 1" spacing="0"/> <portSpacing port="sink_averagable 2" spacing="0"/> </process> </operator> <operator activated="true" class="paren:landmarking" 
compatibility="5.0.0" expanded="true" height="60" name="LandMarking" width="90" x="179" y="30"> <parameter key="Linear Discriminant" value="false"/> <parameter key="Cross-validation" value="true"/> <parameter key="Normalize Dataset" value="false"/> </operator> <connect from_op="Sonar data set" from_port="output" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="LandMarking" to_port="exampleset"/> <connect from_op="Multiply" from_port="output 2" to_op="KNN" to_port="training"/> <connect from_op="Multiply" from_port="output 3" to_op="Naive Bayes" to_port="training"/> <connect from_op="Multiply" from_port="output 4" to_op="Decision Tree (2)" to_port="training"/> <connect from_op="Decision Tree (2)" from_port="averagable 1" to_port="result 4"/> <connect from_op="Naive Bayes" from_port="averagable 1" to_port="result 3"/> <connect from_op="KNN" from_port="averagable 1" to_port="result 2"/> <connect from_op="LandMarking" from_port="exampleset" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> <portSpacing port="sink_result 4" spacing="0"/> <portSpacing port="sink_result 5" spacing="0"/> </process> </operator></process>
It is a great and very useful initiative to provide such an extension as PaREn. This kind of feature is included in other major DM software, so it was time. Many thanks to the PaREn team!
I'm not sure whether the current ordering of the figures is statistically significant, but one would normally expect the PaREn-optimised classifier to outperform both the subsequent DT and the trivial model that blindly predicts the most frequent class.
Could the PaREn team tell us whether they made use of the ROC analysis implemented in RM, among others, to optimise accuracy?
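For reference, the accuracy of that trivial most-frequent-class model is simply the share of the majority class. A minimal sketch in Python, assuming the labels are available as a plain list; the function name and the toy labels are made up for illustration and are not taken from the Sonar data or from PaREn:

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    # most_common(1) returns a list holding the (label, count) pair of the top class
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)


# Hypothetical toy labels, for illustration only
toy_labels = ["Mine", "Mine", "Mine", "Rock", "Rock"]
baseline = majority_baseline_accuracy(toy_labels)
```

Any classifier whose cross-validated accuracy does not clearly exceed this value has learned little beyond the class distribution.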
Thanks for your encouraging remarks. Can you please point to some DM software that has similar functionality?
A similar (though not identical) feature, very effective indeed, is offered by IBM SPSS Modeler, for instance, as an automatic modeling operator: several models are produced automatically, and the best of them are proposed to the user. Moreover, the models may be combined to produce a kind of voting model, which on some occasions performs better than the individual models. See a demo here.
Whoa, but that's quite a difference: in SPSS all models are actually tested (which can also be done with the PaREn extension during the evaluation step, and is also possible with a simple process in core RapidMiner, as Simon has pointed out). The cool thing about the PaREn extension is that it predicts which model is probably the best even without any testing. This is the first time I have actually seen this meta-learning approach really working, and this is probably the reason why we at Rapid-I and many others love it. Kudos to Christian and the team at DFKI for this great extension!
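To make the voting idea concrete, here is a minimal sketch of a plurality vote over the predictions of several models. It is plain Python and only illustrates the general technique; it is not how SPSS Modeler or RapidMiner implement it, and the function name and sample predictions are hypothetical:

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model prediction lists into one list by plurality vote.

    predictions_per_model: one list of predicted labels per model,
    all of the same length and in the same example order.
    """
    combined = []
    # zip(*...) groups the predictions that all models made for the same example
    for votes in zip(*predictions_per_model):
        # With an odd number of models and two classes there are no ties;
        # otherwise the label seen first among the tied ones wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions of three models on four examples
tree_preds = ["Mine", "Rock", "Rock", "Mine"]
bayes_preds = ["Mine", "Mine", "Rock", "Rock"]
knn_preds = ["Rock", "Mine", "Rock", "Mine"]
voted = majority_vote([tree_preds, bayes_preds, knn_preds])
```

The voted model can only beat the individual ones when their errors fall on different examples, which is why it helps on some occasions and not on others.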
AUC calculation needs corrections, as I have shown on the forum