Performance and normalization

jiri New Altair Community Member
edited November 5 in Community Q&A
I have a question about normalizing data and its impact on performance (prediction).
I normalize the sample attribute values (Z-transformation, Proportion or Range) and then use sliding window validation (x,1,1,1) with the OneR classifier.
Normalization clearly improves performance, but I'm not sure it is correct: normalizing the whole dataset seems to project future values into the past.
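A minimal sketch (Python, standard library only) of why this is a concern: the Z-transformation uses the mean and standard deviation of whatever examples it is given, so a value normalised over the whole dataset depends on values that only arrive later. The series is made up purely for illustration, and `z_transform` is a hypothetical helper, not a RapidMiner function.

```python
from statistics import mean, pstdev

def z_transform(values):
    """Z-transformation: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical trending series: only the first 3 values are known at time t = 3.
history = [1.0, 2.0, 3.0]
full_series = history + [10.0, 20.0]  # values that only arrive later

# Normalised value of the very first example, computed two ways.
past_only = z_transform(history)[0]          # uses past information only
whole_dataset = z_transform(full_series)[0]  # mean/std include future values

print(round(past_only, 3))
print(round(whole_dataset, 3))
```

The first example gets a different normalised value once future examples are included, which is exactly the "future projected into the past" effect described above.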

Question: can I apply normalization to the whole sample (dataset) when using sliding window validation?

      ExampleSet
      Normalization ? (Z-Transformation)
      SlidingWindow  (window,1,1)
          OneR
          ModelApplier
          Performance 
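A minimal sketch of how the train/test splits of a sliding window validation could be generated. The reading of the four parameters (training window width, step size, test window width, horizon, where horizon 1 means the test example directly follows the training window) is an assumption here, and `sliding_window_splits` is a hypothetical helper.

```python
def sliding_window_splits(n_examples, train_width, step=1, test_width=1, horizon=1):
    """Yield (train_indices, test_indices) for time-ordered examples.

    Assumed semantics: train on `train_width` consecutive examples, skip
    `horizon - 1` examples, test on the next `test_width`, then move the
    whole window forward by `step`.
    """
    start = 0
    while True:
        train_end = start + train_width        # exclusive end of training window
        test_start = train_end + horizon - 1   # horizon 1 = the very next example
        test_end = test_start + test_width     # exclusive end of test window
        if test_end > n_examples:
            break
        yield list(range(start, train_end)), list(range(test_start, test_end))
        start += step

for train_idx, test_idx in sliding_window_splits(6, train_width=3):
    print(train_idx, test_idx)
```

Every test index lies after every training index, so the validation itself never trains on the future; the leak comes only from a normalization step placed before it.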

Answers

  • haddock, New Altair Community Member
    Hi Jiri,

    I think you're right: because the normalisation runs over all the examples, it is data snooping, and it is quite a fiddle to get right (you have to repeatedly make, save and apply the normalisation model; a quick search on this forum will find some code). On the other hand, well done for spotting the danger. I take it you've already searched this forum for sliding window validation; I've warned elsewhere of the data-snooping risks it entails.
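One way to avoid the snooping, sketched in Python under simplifying assumptions (a single numeric attribute, a test window of one example directly after the training window): "make" the Z-transformation from each training window only, then "apply" that same model to the training and test examples of that window. The helper names are hypothetical, and the classifier step (OneR in the question) is left out.

```python
from statistics import mean, pstdev

def fit_z(train_values):
    """'Make' the normalisation model: statistics from the training window only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant window
    return mu, sigma

def apply_z(model, values):
    """'Apply' a previously made normalisation model to other examples."""
    mu, sigma = model
    return [(v - mu) / sigma for v in values]

def sliding_window_normalised(series, train_width):
    """Yield (normalised_train, normalised_test) for each window position."""
    for start in range(len(series) - train_width):
        train = series[start:start + train_width]
        test = series[start + train_width:start + train_width + 1]
        model = fit_z(train)  # re-made for every window, never sees the test value
        yield apply_z(model, train), apply_z(model, test)

series = [1.0, 2.0, 3.0, 10.0, 20.0]
for norm_train, norm_test in sliding_window_normalised(series, train_width=3):
    print(norm_train, norm_test)
```

Because each model is made from past examples only, changing a later value of the series leaves the earlier windows unchanged, which is the property a whole-dataset normalisation breaks.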