How to use standardization/normalization correctly on train/test data sets?
Hi,
I read that normalization/standardization should be computed on the training set only, and that the resulting preprocessing model should then be applied to the test data set.
But what about the validation set if I am doing cross-validation? Should I also do a separate normalization inside the X-Validation, where I apply the normalization ranges from the test data in the X-Validation set to the validation set of the X-Validation?
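To check that I have at least the basic train/test part right, here is a minimal sketch of how I understand it. It is plain Python rather than my actual tool, and the function names `fit_standardizer` and `apply_standardizer` as well as the numbers are made up purely for illustration:

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Learn mean and standard deviation from the given (training) values."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant feature: avoid division by zero when scaling.
        sigma = 1.0
    return mu, sigma


def apply_standardizer(values, params):
    """Scale values with previously learned parameters (no re-fitting)."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical single-feature data, only for illustration.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# The parameters come from the training data only ...
params = fit_standardizer(train)
train_scaled = apply_standardizer(train, params)
# ... and the very same parameters are reused for the test data.
test_scaled = apply_standardizer(test, params)
```

So the test data never influences the mean or the standard deviation; it is only transformed with the values learned from the training data.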
For now, my process looks like this:
I apply normalization once in the outer "big" process. Inside the grid optimizer I have an X-Validation with an SVM inside, but I am not applying any further normalization there. My question is: would it be better if my process looked like this:
where I also apply normalization to the validation data of the inner X-Validation (or is it called the test data?). And if so, what about the normalization in the outer big process: how should I use that normalization for my test data on the outside without already using it for the training data set of the X-Validation?
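In code terms, I think the second variant would look roughly like the sketch below. It uses Python with scikit-learn only as an illustration, not the tool I actually work in, and the synthetic data, the split size, and the parameter grid are all assumptions made up for the example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Outer split: the test part is not touched until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaler and SVM are bundled, so in every cross-validation fold the scaler
# is fitted on that fold's training part only and then applied to the
# held-out part of the same fold.
pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])

# Hypothetical grid; the values are assumptions for the example.
param_grid = {"svm__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# With the default refit=True, the best pipeline (scaler included) is fitted
# again on the whole outer training set; the outer test data is then only
# transformed with those training statistics.
test_accuracy = search.score(X_test, y_test)
```

If I understand this correctly, there would then be no separate normalization step in the outer process at all, because the final refit on the full training set takes over that role.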
Last question:
Some people (including my supervisor) say that the held-out data inside cross-validation is called test data, not validation data, and that validation data is the separate data set tested on the outside, entirely independent of the other X-Validation data sets. Isn't it the other way around?