ModelApplier on multiple Models
Legacy User
New Altair Community Member
Hi,
I built a DecisionTree model on a training dataset. The validation is done with an XValidation. After writing out the model, I run it over the test dataset with the ModelApplier. This whole process chain runs perfectly. But: my idea is to find the model that fits the test dataset best. So I would like to build multiple models via Bagging and evaluate them on the test data. The problem is that the ModelApplier can only handle one model. Do you see an option to run multiple models on a test dataset and evaluate them with ClassificationPerformance?
Regards,
Thorsten
Answers
Hi Thorsten,
you could use the %{a} macro together with an IteratingOperatorChain, as described in this posting:
http://rapid-i.com/rapidforum/index.php/topic,32.0.html
This should also work in combination with a ClassificationPerformance evaluator, at least to manually check which model is the best one. A completely automated selection would require a small amount of coding...
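Just as an outline of the idea, a minimal sketch of such a loop could look roughly like this (the macro setup from the linked posting still applies; %{a} is used here so that each iteration loads a different stored model):
<operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
  <parameter key="iterations" value="10"/>
  <!-- one stored model is loaded per iteration via the %{a} macro -->
  <operator name="ModelLoader" class="ModelLoader">
    <parameter key="model_file" value="model_%{a}.mod"/>
  </operator>
  <operator name="ModelApplier" class="ModelApplier">
    <list key="application_parameters">
    </list>
  </operator>
  <!-- one performance vector per loaded model, for manual comparison -->
  <operator name="ClassificationPerformance" class="ClassificationPerformance">
    <parameter key="accuracy" value="true"/>
    <list key="class_weights">
    </list>
  </operator>
</operator>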
By the way: are you sure it is a good idea to select the best model on the test set? That is essentially overfitting, just on the test set instead of the training set. In general, I would suggest using all data for model building and using a validation scheme like cross-validation for performance estimation only, not for model selection...
Cheers,
Ingo
Thanks for your fast reply and the ideas. Using the IteratingOperatorChain is a good idea, but I still have problems with the ModelApplier. I thought that if I set the iteration value to ten, the ModelApplier would calculate ten predictions using the ten different models derived by Bagging.
<operator name="Root" class="Process" expanded="yes">
I also used the hole dataset for calculating a model. This is surely the best solution for dicriminating all the classes for this dataset. But in my case the model gets quite too complex because of high variations within the classes. So I would like to have an easier one which could also be used for similiar data.
<operator name="Trainingsdaten einlesen" class="ExampleSource">
</operator>
<operator name="Herausfiltern von Feature A" class="FeatureNameFilter">
<parameter key="skip_features_with_name" value="A"/>
</operator>
<operator name="Feature AdvRatios herausfiltern" class="FeatureNameFilter">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="AdvRatios"/>
</operator>
<operator name="Kreuzvalidierung" class="XValidation" expanded="yes">
<parameter key="create_complete_model" value="true"/>
<parameter key="keep_example_set" value="true"/>
<parameter key="leave_one_out" value="true"/>
<operator name="Modell lernen" class="OperatorChain" expanded="yes">
<operator name="Bagging" class="Bagging" expanded="yes">
<operator name="DecisionTree" class="DecisionTree">
</operator>
</operator>
<operator name="ModelWriter" class="ModelWriter">
<parameter key="model_file" value="model_%{a}.mod"/>
<parameter key="output_type" value="XML"/>
</operator>
</operator>
<operator name="Modell testen und bewerten" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="Bewertung des Modells" class="ClassificationPerformance">
<list key="class_weights">
</list>
<parameter key="classification_error" value="true"/>
<parameter key="correlation" value="true"/>
<parameter key="keep_example_set" value="true"/>
</operator>
</operator>
</operator>
<operator name="Testdaten vorbereiten" class="OperatorChain" expanded="yes">
<operator name="Testdaten einlesen" class="ExampleSource">
</operator>
<operator name="Herausfiltern von Feature A (2)" class="FeatureNameFilter">
<parameter key="skip_features_with_name" value="A"/>
</operator>
<operator name="Feature AdvRatios herausfiltern (2)" class="FeatureNameFilter">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="AdvRatios"/>
</operator>
</operator>
<operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
<parameter key="iterations" value="10"/>
<operator name="ModelLoader" class="ModelLoader">
<parameter key="model_file" value="model_%{a}.mod"/>
</operator>
<operator name="ModelApplier (2)" class="ModelApplier">
<list key="application_parameters">
</list>
<parameter key="keep_model" value="true"/>
</operator>
<operator name="Klassifikationssicherheit Test-/Trainingsdaten" class="ClassificationPerformance">
<parameter key="accuracy" value="true"/>
<list key="class_weights">
</list>
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="true"/>
<parameter key="keep_example_set" value="true"/>
</operator>
</operator>
</operator>
Thanks so far,
Thorsten
Hi again,
You will not see the ten individual predictions from bagging, since the ten base models are contained inside the single bagging model and are combined by it. I am not sure, but you can probably "simulate" bagging with a combination of the IteratingOperatorChain, a sampling operator, and a learner. Then you will get ten separate models and can apply each of them on its own.
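A rough sketch of what that loop could look like (the sampling step below is only a placeholder, since the exact sampling operator depends on your version; the file naming via %{a} follows the same pattern as in your process):
<operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
  <parameter key="iterations" value="10"/>
  <!-- placeholder: insert a sampling operator of your choice here, e.g. a bootstrap-style sample of the training data -->
  <operator name="DecisionTree" class="DecisionTree">
  </operator>
  <!-- write each of the ten models to its own file so they can be loaded and applied one by one later -->
  <operator name="ModelWriter" class="ModelWriter">
    <parameter key="model_file" value="model_%{a}.mod"/>
    <parameter key="output_type" value="XML"/>
  </operator>
</operator>
The ten resulting models could then be applied and evaluated one after another with the loading loop from your process above.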
Cheers,
Ingo