Which model should I use (training, validation, or testing)?
I'm looking for some best-practice advice on applying a model for live prediction, as I'm a little confused about which approach is normally adopted.
The data: my data set has 50 attributes and 3400 rows (90% for training, 10% for unseen testing), with the very last row reserved as the live prediction example.
The training: I use the 90% training data in 10-fold cross-validation to find the best training algorithm and attribute mix for my data. I then confirm the best setup by applying the resulting model to the 10% of unseen data.
My question is: once I am happy with the above results, which model do I use (or create) for the live prediction of the last row?
1) Do I use the best model created via the 90% data in 10-fold cross-validation?
2) Do I create a model with the 90% training data (without cross-validation), using the best settings found during cross-validation?
3) Do I create a model on 100% of the data (90% training and 10% unseen), with the best settings found from training?
Thank you in advance for your time.
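The workflow described above can be sketched in code. This is an illustrative example only, using scikit-learn and synthetic data in place of the real 3400-row, 50-attribute set; the model names and settings are placeholders, not the actual candidates:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data: 3400 rows, 50 attributes.
X, y = make_classification(n_samples=3400, n_features=50, random_state=0)

# Reserve the very last row as the "live" example to predict.
X_live, X, y = X[-1:], X[:-1], y[:-1]

# 90% training / 10% unseen test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# 10-fold cross-validation on the training portion to pick the best setup.
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {name: cross_val_score(m, X_train, y_train, cv=10).mean()
          for name, m in candidates.items()}
best_name = max(scores, key=scores.get)

# Option (2): retrain the chosen setup on the full 90% training data...
final_model = candidates[best_name].fit(X_train, y_train)

# ...confirm on the 10% unseen data, then predict the live row.
holdout_acc = final_model.score(X_test, y_test)
live_prediction = final_model.predict(X_live)
```

Note that the cross-validation scores are used only to choose the setup; the model that makes the live prediction is retrained once on the whole training portion.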
Hi,
I have a question: do you apply the trained model with the Model Applier right after the XValidation, or do you have to train again over the whole training set after the XValidation? I am asking because, in the case of a feature selection with an inner XValidation, you don't get a model out of the feature selection operator (there is no output port for it). You could save the model with a Remember operator inside the feature selection, recall it outside the operator, and combine it with the feature weights for the unseen test set. But I think one has to retrain over the full training set with the selected features, right?
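The retrain-after-selection pattern asked about here can be sketched outside of RapidMiner. A minimal example, assuming scikit-learn and synthetic data: feature selection is driven by an inner cross-validation, and the final model is then refit over the full training set restricted to the selected features:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic training set: 50 attributes, a handful actually informative.
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=8, random_state=0)

# RFECV runs recursive feature elimination with an inner cross-validation
# to decide how many features to keep.
selector = RFECV(LogisticRegression(max_iter=1000), cv=5).fit(X, y)
mask = selector.support_  # boolean mask of the selected features

# Retrain over the full training set using only the selected features.
final_model = LogisticRegression(max_iter=1000).fit(X[:, mask], y)
```

The inner cross-validation only decides *which* features to keep; the model you actually deploy is the one refit on all the training rows afterwards.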
With a large dataset you could go with (2): select based on the training/test split. You can do without cross-validation here.
Whatever you pick, never do (3), as you face the risk of badly over-fitting the data.
Some authors recommend splitting the dataset into training/test/validation sets: train your models on the training set, compare them on the test set and pick the best, then estimate the error rate of the best model on the validation set.
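The three-way split can be sketched as follows. This is a minimal illustration with scikit-learn and synthetic data; the 60/20/20 proportions and the two candidate models are assumptions, not a prescription:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

# 60% train / 20% test / 20% validation.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Train the candidate models on the training set only.
models = [LogisticRegression(max_iter=1000),
          DecisionTreeClassifier(random_state=0)]
fitted = [m.fit(X_train, y_train) for m in models]

# Compare on the test set and pick the winner.
best = max(fitted, key=lambda m: m.score(X_test, y_test))

# The validation set, untouched until now, gives an unbiased
# estimate of the winner's error rate.
val_accuracy = best.score(X_val, y_val)
```

The key point is that the validation set plays no part in training or in picking the winner, so its score is not optimistically biased by the selection step.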