Beginner question regarding train / test set

Question

First of all: I am a total beginner in data science. For my university project, I need to create a process in RapidMiner which predicts customer satisfaction based on a survey. The dataset can be obtained from Kaggle by searching for "Airline Passenger Satisfaction" by TJ Klein (cannot post links yet). I get a train and a test set. I built my process based on the train set, so currently my process looks like this: The thing that now confuses me is, where do I use my test set? I don't really know where and how I should use it, or whether I should use it at all. The test set is not unlabeled, by the way. As it says on Kaggle, it was simply split off from the full data and represents 20% of all data.

Caperez · Answer

Hi @Bella0812,
The Cross Validation operator already performed the tests on your training data, as mentioned above. If you also want to validate on another data set, such as your separate test file, you can do that with the Apply Model operator.
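
For readers who think in code rather than operators, the idea behind applying a trained model to a separate labeled set can be sketched in Python as below. This is only an illustration under stated assumptions: the toy data, the label names, and the nearest-centroid "model" (`fit_centroids`, `apply_model`, `accuracy`) are made up for the example and are not part of RapidMiner or the Kaggle data set.

```python
# Toy stand-in for "train on the train set, then apply the model to a
# separate labeled test set". Data and model are hypothetical.

def fit_centroids(rows):
    """rows: list of (feature, label). Returns {label: mean feature value}."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def apply_model(centroids, features):
    """Predict, for each feature value, the label with the closest centroid."""
    return [min(centroids, key=lambda y: abs(centroids[y] - x)) for x in features]

def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up data: one numeric feature and a label per row.
train = [(1.0, "dissatisfied"), (2.0, "dissatisfied"),
         (4.0, "satisfied"), (5.0, "satisfied")]
test = [(1.5, "dissatisfied"), (4.5, "satisfied"), (2.8, "dissatisfied")]

model = fit_centroids(train)                          # learn from the train set only
predictions = apply_model(model, [x for x, _ in test])  # never sees test labels
test_accuracy = accuracy(predictions, [y for _, y in test])
print(predictions, test_accuracy)
```

Because the test labels are known, the predictions can be scored against them, which gives a performance estimate on rows the model never saw during training.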

Best, 
Cesar

Bella0812 · Answer

Thanks for your answer, @ceaperez!

I know how cross-validation works, and that's why I am confused. Do I still need to use the test set, which I got in a separate file, or can I ignore it since the cross-validation already did the testing?

Regards

Caperez · Answer

Hi @Bella0812,

You are using the Cross-validation operator in your model. 
This operator performs the training and validation process for you. Basically, the operator divides the data set into k subsets of equal size, then holds back one subset for testing and trains the model on the other k-1 subsets. The process is repeated k times, with a different test subset selected each time.
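
The splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not how RapidMiner implements it: the helper name `k_fold_splits` is hypothetical, and the rows are split in their original order without the shuffling or stratification a real tool may apply.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_rows))
    # Fold sizes differ by at most one when n_rows is not divisible by k.
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]                 # held-back subset
        train_idx = indices[:start] + indices[start + size:]   # other k-1 subsets
        yield train_idx, test_idx
        start += size

# Example: 10 rows, k = 5, so each fold tests on 2 rows and trains on 8.
splits = list(k_fold_splits(10, 5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every row lands in the test subset exactly once across the k rounds, so the averaged performance is based on all of the training data while no model is ever scored on rows it was trained on.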

best,

Cesar