Apply Model in RapidMiner
Dear all,
I split a dataset into 75% (training) and 25% (testing). I pre-processed the training data only. Then I split the training data again (75%/25%), performed Naive Bayes classification, and saved the resulting model.
However, to check whether my model generalizes, I believe I need to run it on the 25% that I initially held out as test data.
That means: for the initial 25% test data, I apply the same pre-processing as for training, then retrieve the pre-processed test data and apply the model I saved from training.
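To make that concrete, here is a rough sketch of the flow I have in mind, written in Python/scikit-learn terms purely as an analogy for the RM process (the dataset, the pre-processing step, and the split ratios below are placeholders, not my actual data):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Initial 75/25 split: the 25% is held out as the final test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Pre-processing is fitted on the training data only.
scaler = StandardScaler().fit(X_train)
X_train_prep = scaler.transform(X_train)

# Second 75/25 split of the training data, then train Naive Bayes.
X_sub_train, X_sub_val, y_sub_train, y_sub_val = train_test_split(
    X_train_prep, y_train, test_size=0.25, random_state=42
)
model = GaussianNB().fit(X_sub_train, y_sub_train)  # this is the "saved model"
print("Validation accuracy on the inner 25%:", model.score(X_sub_val, y_sub_val))

# Finally: apply the same pre-processing to the held-out 25%,
# then apply the saved model to it.
X_test_prep = scaler.transform(X_test)
print("Held-out test accuracy:", model.score(X_test_prep, y_test))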
Could you please advise whether this is correct, or am I doing something wrong here?
thanks.
Heikoe
Hello
Most people have this question when they start working with data in RM or other software.
I agree with @Telcontar120; I also recommend reading this link:
https://community.rapidminer.com/discussion/54621/cross-validation-and-its-outputs-in-rm-studio#latest
Best
Sara
@Telcontar120 Thanks for your kind input. Just one more thing: I ran both cross-validation and NBC (on the same training dataset), but I get the same output (performance matrix) for both the NBC and the cross-validation. Is that normal? I actually expected different results.
Also, for the test dataset, I should retrieve it and use the saved model to score the data, correct?
I mean, how should I design the process in RM for the test dataset (i.e. the initial 25% of unlabeled data)?
thanks much for your kind explanation.
Regards,
Heikoe
That would be correct if you want to run split validation manually, which it sounds like you did.
You could also use the Split Validation operator, which does this automatically and delivers the performance on the test portion of the data.
But I would really recommend using Cross Validation instead, which will train and test on multiple subsets of your original data and supply the resulting validation performance automatically.
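If it helps to see the idea outside of RM, here is a rough Python/scikit-learn sketch of what cross-validation does (only an analogy for the Cross Validation operator; the dataset, the pre-processing step, and the fold count are placeholders):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline keeps pre-processing inside each fold, so the model is
# trained and tested on multiple subsets of the original data.
pipeline = make_pipeline(StandardScaler(), GaussianNB())
scores = cross_val_score(pipeline, X, y, cv=10)
print("10-fold cross-validation accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))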