It's important to keep both labels as special roles (you can name them however you like), and before using a learner, just set the role of the label you want to predict to "label". After the Apply Model operator, set the prediction attribute to another special role, because otherwise it might be discarded by the second model. I hope the process helps you understand.
2) Both preprocessing steps make sense. Remember that normalization creates a preprocessing model that needs to be grouped with the learner model. I'm not sure if you have to remove correlated attributes, but there is an operator for that.
3) Both split validation and cross validation can be used. The data will be split into training and test sets accordingly, and the performance will be measured/validated over all splits. If you connect the model output port, it will generate the model over all examples.
6) Usually, models can be stored and used in another process, as long as the new data has the same format for the regular attributes as the data you trained on. This should not be different for the DL model (the sketch after this list shows the same idea in code).
7) Yes, you can skip setting the roles during the configuration and just set them with the corresponding operator. Remember that up until then, all attributes are considered regular and will be used as such, e.g. when using a learner.
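If it helps to see point 6) outside of RapidMiner, here is a rough sketch of the same idea in Python with scikit-learn and joblib: train with the normalization kept together with the learner, store the model, and then score new data that has the same regular attributes. This is only an analogy; the file names, column names and choice of learner are placeholders, not anything from your process.

```python
# Illustrative sketch only: store a trained model and reuse it on new data
# that has the same regular attributes as the training data.
import joblib
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

train = pd.read_csv("training_data.csv")      # hypothetical file
X_train = train.drop(columns=["target"])      # regular attributes
y_train = train["target"]                     # the attribute with the "label" role

# Keep normalization and learner together, like grouping the
# preprocessing model with the learner model.
model = Pipeline([("normalize", StandardScaler()),
                  ("learner", MLPRegressor(max_iter=500))])
model.fit(X_train, y_train)

joblib.dump(model, "model.joblib")            # "store" the model

# In another process: load the model and apply it to new data with the
# same columns and types as the training data.
new_data = pd.read_csv("new_data.csv")        # hypothetical file
loaded = joblib.load("model.joblib")
predictions = loaded.predict(new_data)
```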
Thank you for your quick reply. Regarding your points:
2) 'Remember that normalization creates a preprocessing model that needs to be grouped with the learner model' -> Yes, I noticed that certain operators generate a 'preprocessing model'. I am not sure what this is, or what it means that it needs to be grouped with the learner model. Do you mean grouping the Normalize operator followed by the learning model (e.g. Neural Net), both within the training process?
3) I assume Cross Validation would be the more recommended one?
2) Yes, that's what I meant. You should put the normalization in the training process and use the Group Models operator to combine the normalization model and the learner model. This way the test data will be normalized the same way as the training data (the sketch after this reply shows the analogous idea in code).
3) Cross Validation does several validations based on the number of folds you want to run. Split validation can be combined with a Loop operator around it to do several different validations if the sampling method uses a random element (i.e. shuffled or stratified sampling).
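In case a code analogy helps: in scikit-learn terms, the grouped "normalization model + learner model" behaves like a Pipeline, where the scaler is fitted on the training split only and those same statistics are reused on the test split. This is just an illustration with made-up data, not what RapidMiner does internally.

```python
# Analogy only: a Pipeline plays the role of the grouped
# "preprocessing model + learner model".
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grouped = Pipeline([("normalize", StandardScaler()),   # preprocessing "model"
                    ("learner", LinearRegression())])  # learner "model"
grouped.fit(X_train, y_train)         # scaler statistics come from training data only
print(grouped.score(X_test, y_test))  # test data is normalized with the training statistics
```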
1) Just wondering, why should I not add the Normalize operator just before the Cross Validation operator (i.e. not nested within the Training process)?
2) Also, after training the model within the Optimize Parameters (Grid) operator, how can I use the model for prediction on new data? i.e. what do I connect to the new data? I only see 'per', 'par', and 'res' outputs on the Optimize Parameters operator. Where and how do I connect the Apply Model operator?
3) One small question: for a regression task, do I also set the role of my target attribute (which is a numerical value) to 'label'? And I do not select the 'prediction' role?
1) When doing training and testing in a validation operator, you want to put your normalization, or any other operator that affects the training data, inside the validation operator. If you keep it outside, you can leak information (i.e. data snooping) into your test set, which can distort the measured accuracy of your model.
2) Connect the 'res' output port to an Apply Model operator. The model delivered by Optimize Parameters is the optimized model (the sketch below shows both points in code).
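Here is a small scikit-learn sketch of both points, purely as an analogy (the data, learner and parameter grid are made up): the normalization sits inside the validated pipeline so no information leaks from the test folds, and the object returned by the parameter optimization is the optimized model that you then apply to new data.

```python
# Illustrative sketch: leak-free validation plus applying the optimized model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=0)

# Normalization lives inside the pipeline, so each training fold is
# normalized without seeing its test fold (no data snooping).
pipe = Pipeline([("normalize", StandardScaler()),
                 ("learner", MLPRegressor(max_iter=1000, random_state=0))])

search = GridSearchCV(pipe,
                      param_grid={"learner__hidden_layer_sizes": [(10,), (50,)]},
                      cv=5)
search.fit(X, y)

# The model output of the optimization is the refitted, optimized model,
# which you apply to new data.
optimized_model = search.best_estimator_
new_data = np.random.default_rng(0).normal(size=(5, 8))  # placeholder new examples
print(optimized_model.predict(new_data))
```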
Thank you for the concise clarification, Thomas. I have a few queries about normalization, if you don't mind:
1) I would assume you mean something like the process shown below. If I put the normalization on the training data, would the test data be similarly normalized?
2) What is the purpose of the Model output that is delivered by the Normalize operator? In what situation is it actually used?
3) Assuming my purpose is an NN prediction via Deep Learning, where should I use the De-normalize operator? I would like my output (i.e. the prediction) not to be a normalized value. I noticed that De-normalize is typically connected to the 'pre' output of the Normalize operator, and I am confused by what this does: isn't it simply negating the effect of the Normalize operator? (I have sketched roughly what I mean in code after these questions.)
4) For the Cross Validation, does the 'mod' output from the Apply Model operator need to be connected to something (I am not sure what) in order for the Cross Validation to deliver its 'mod' output?
5) In the Deep Learning operator, there is the option to 'standardize' -> could this be an inbuilt normalization parameter?
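To make question 3) more concrete, this is roughly what I mean, expressed as a scikit-learn analogy rather than my actual RapidMiner process (the data, scaler and learner are placeholders): the label is normalized for training, and the prediction is transformed back to the original scale afterwards.

```python
# Analogy for question 3): train on a normalized label, but get the
# prediction back in the original ("de-normalized") units.
from sklearn.datasets import make_regression
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

net = Pipeline([("normalize", StandardScaler()),
                ("learner", MLPRegressor(max_iter=1000, random_state=0))])

# The target is scaled for training; predictions are automatically
# inverse-transformed back to the original scale.
model = TransformedTargetRegressor(regressor=net, transformer=StandardScaler())
model.fit(X, y)
print(model.predict(X[:3]))   # predictions are on the original scale
```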