I think cross-validation is the standard baseline approach to measuring model performance. As @mschmitz says, a bit of overfitting is almost inevitable with any ML algorithm. The question isn't really "is this model overfit?" so much as "how will this model perform on unseen data?" Cross-validation is the best way to answer that question while still making the most of your data for both training and testing.
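A minimal sketch of that idea in Python, assuming scikit-learn; the synthetic dataset and the RandomForestClassifier are placeholders for your own data and learner:

```python
# Estimate out-of-sample performance with k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data purely for illustration; substitute your own X and y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = RandomForestClassifier(random_state=42)

# 10-fold CV: every observation is used for both training and testing,
# so we get an unseen-data estimate without sacrificing data.
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```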
Hi @Curious
To add to the previous answers: use common sense.

If you get an error of 0.001 or an AUC of 99.95 on a test set, something is almost certainly wrong. Any "too good to be true" result generally indicates overfitting. Also, use a correlation matrix to see whether some attributes correlate suspiciously highly with the label.
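A quick sketch of that correlation check, assuming pandas and a DataFrame with a numeric label column; the column names here are placeholders:

```python
import pandas as pd

# Toy data for illustration; replace with your own DataFrame.
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [5.1, 3.2, 4.8, 2.2, 1.0],
    "label":     [0,   0,   1,   1,   1],
})

# Absolute correlation of every attribute with the label; values close to 1
# can indicate leakage (the attribute is effectively a copy of the target).
corr_with_label = df.corr()["label"].drop("label").abs().sort_values(ascending=False)
print(corr_with_label)
```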
Nevertheless, this is a good question that everyone will have to deal with at some point. The best procedure is to establish a baseline:

1. Build your forecast using a default model.
2. Determine which learner is most suitable for the data.
3. Forward-test on unseen data.

Not spending enough time on 1 and 3 is where most people go wrong.
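One way this workflow might look in Python, assuming scikit-learn; the DummyClassifier baseline, the GradientBoostingClassifier, and the synthetic data are illustrative choices, not part of the original answer:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=2000, n_features=25, random_state=0)

# Hold back a slice of data that plays no part in model selection (step 3).
X_dev, X_forward, y_dev, y_forward = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: score a default/baseline model.
baseline = DummyClassifier(strategy="most_frequent")
baseline_score = cross_val_score(baseline, X_dev, y_dev, cv=5).mean()

# Step 2: evaluate a candidate learner the same way and compare.
learner = GradientBoostingClassifier(random_state=0)
learner_score = cross_val_score(learner, X_dev, y_dev, cv=5).mean()
print(f"baseline={baseline_score:.3f}  learner={learner_score:.3f}")

# Step 3: forward test on unseen data; a large drop relative to the CV
# score suggests overfitting to the development set.
learner.fit(X_dev, y_dev)
print(f"forward-test accuracy: {learner.score(X_forward, y_forward):.3f}")
```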