Auto Model in RapidMiner
hoon
Hi RapidMiner community, it is great and fascinating to be introduced to Auto Model. How do I identify an overfitting model in RapidMiner Auto Model?
Accepted answers
IngoRM
Hi,
Welcome to the Community :-)
The following discussion should be interesting for you:
https://community.rapidminer.com/discussion/comment/48190#Comment_48190
The short summary is that there will always be some overfitting, and you detect it in Auto Model the same way you do in general: measure the validation accuracy, and if it gets worse for more complex models, you are in overfitting land. There are a lot of things you can screw up in validation, though (most people and tools do). The most frequent error is that people only validate the actual machine learning model building, but not the impact of the data preprocessing. Rest assured, though, that Auto Model takes good care of all of that for you, so the performances you see are true, correctly validated performances.
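Auto Model handles this for you inside RapidMiner, but the mistake Ingo describes is easy to see outside of it. Below is a minimal scikit-learn sketch (an illustration on synthetic data, not RapidMiner's internals) of the correct approach: the preprocessing step is wrapped in a pipeline, so it is re-fit on every training fold inside the cross-validation rather than once on the full dataset, which would leak information from the test folds.

```python
# Sketch: keeping preprocessing inside the validation loop so the
# reported accuracy is honest. Synthetic data; illustration only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Correct: the scaler is (re)fit on each training fold only, never on
# the fold being used for testing. Fitting StandardScaler on the full
# dataset before cross-validation is the common error Ingo mentions.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"validated accuracy: {scores.mean():.3f}")
```

With more aggressive preprocessing (feature selection, imputation, resampling), the gap between the leaky and the correctly validated estimate can be much larger than with simple scaling.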
If you want to dive a bit deeper into the topic, I would also recommend this "little" white paper I wrote on correct validation some time ago:
https://rapidminer.com/resource/correct-model-validation/
My one-line-recommendation is to be less concerned about overfitting (it always happens!) and more concerned about correct validation since this guarantees that there are no negative surprises down the road.
Hope this and the links above help. Best,
Ingo
Telcontar120
Just to add a further comment here: if it were generally possible to identify which parts of the model are there only because of overfitting, then it would be easy to remove just those parts. But unfortunately, that's not how it works :-)
As Ingo said, the main thing is to understand how the model will perform on unseen data in the future. That performance includes both the effects of accurately capturing replicable patterns and relationships that should be present in all samples, and the effects of overfitting to the idiosyncrasies of your development dataset. As long as you use correct validation, you will have a pretty good estimate of that overall performance, but you won't be able to partition it cleanly between the two.
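The worsening validation accuracy Ingo describes can be seen in a tiny experiment. The sketch below (plain NumPy on synthetic data, nothing RapidMiner-specific) fits polynomials of increasing degree to noisy samples of a sine wave: training error keeps shrinking as complexity grows, but validation error eventually rises again once the model starts memorizing noise, which is exactly the signal to watch for.

```python
# Sketch: training error always improves with complexity, validation
# error eventually worsens -- the practical overfitting detector.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Interleave the points into a training half and a validation half.
x_tr, y_tr = x[::2], y[::2]
x_va, y_va = x[1::2], y[1::2]

def errors(degree):
    """Training and validation MSE of a polynomial fit of this degree."""
    poly = np.poly1d(np.polyfit(x_tr, y_tr, degree))
    return (np.mean((poly(x_tr) - y_tr) ** 2),
            np.mean((poly(x_va) - y_va) ** 2))

for degree in (1, 3, 15):
    train_mse, val_mse = errors(degree)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, "
          f"validation MSE {val_mse:.3f}")
```

The degree-15 fit has the lowest training error of the three but the worst validation error; correct validation exposes this even though, as noted above, it cannot tell you which coefficients are "the overfit ones".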