test and train data set
Should I make two data sets if I want to use algorithms? If I want to build a dataset on my own, should I create a single Excel file, or two Excel files with one as the training dataset and the other as the test dataset? And if they are two different files, what should differ between the training dataset and the test dataset?
Answers
-
Hi @abeetbhat1995,
1. You can create either:
- one Excel file with the training set in sheet no. 1 and the test set in sheet no. 2 (in this case, don't forget to specify the sheet number in each of the two Read Excel operators),
or
- two Excel files (one for the training set and a second for the test set).
2. Your training set and test set must contain the same attributes, and your training set must additionally contain the label.
Example:
Training set:
Att1  Att2  Att3  label
a     b     c     2
j     k     l     3
m     n     o     4
Test set:
Att1  Att2  Att3
z     y     x
t     u     v
g     h     i
3. An example of a simple fictitious process:
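This is not the RapidMiner process itself, but the layout from point 2 can be sketched outside RapidMiner in plain Python. It is only a rough illustration: the in-memory rows stand in for the two Excel sheets or files, and the helper names `attribute_names` and `check_compatible` are made up for this example.

```python
# Hypothetical in-memory stand-ins for the two Excel sheets or files.
training_set = [
    {"Att1": "a", "Att2": "b", "Att3": "c", "label": 2},
    {"Att1": "j", "Att2": "k", "Att3": "l", "label": 3},
    {"Att1": "m", "Att2": "n", "Att3": "o", "label": 4},
]
test_set = [
    {"Att1": "z", "Att2": "y", "Att3": "x"},
    {"Att1": "t", "Att2": "u", "Att3": "v"},
    {"Att1": "g", "Att2": "h", "Att3": "i"},
]

LABEL = "label"  # assumed name of the label column


def attribute_names(rows, exclude=()):
    """Return the sorted column names used across all rows, minus `exclude`."""
    names = set()
    for row in rows:
        names.update(row.keys())
    return sorted(names - set(exclude))


def check_compatible(train_rows, test_rows, label=LABEL):
    """True if both sets share the same attributes and every training row has a label."""
    train_attrs = attribute_names(train_rows, exclude=(label,))
    test_attrs = attribute_names(test_rows, exclude=(label,))
    has_label = all(label in row for row in train_rows)
    return train_attrs == test_attrs and has_label


compatible = check_compatible(training_set, test_set)
print(compatible)
```

The check mirrors the rule above: same attributes in both sets, plus a label in the training set only.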
Regards,
Lionel
-
You may want to look at the training video series on modeling and validation on this page: https://rapidminer.com/training/videos/
RapidMiner has a lot of built-in functionality around model validation that you should take advantage of. Cross-validation in particular is an approach that is considered "best practice" and should be part of your workflow. It does not require you to split your labeled data into separate training and testing sets.
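To make the idea concrete, here is a minimal sketch of how k-fold cross-validation partitions a single labeled data set, written in plain Python. It is not RapidMiner's implementation; the function name `k_fold_indices` and the choice of 10 examples and 5 folds are assumptions for illustration, and real tools typically shuffle or stratify the examples first.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Every example is used for testing exactly once and for training k - 1 times.
folds = list(k_fold_indices(n_examples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Because each fold takes its turn as the test set, one labeled file is enough; the model's performance is then averaged over the folds.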