What Benefits does Normalisation offer?

Madcap
Madcap New Altair Community Member
edited November 5 in Community Q&A
Hi, I understand that normalising my data puts values into a specific range.
I know this can help for machine learning purposes, but I'm unclear on how.

Would someone mind clearing this up for me?
Thanks again
-Madcap

Answers

  • varunm1
    varunm1 New Altair Community Member
    edited March 2019 Answer ✓
    Hello @Madcap

    Normalizing does put values into a specific range, true. More precisely, it brings all of the predictors' values onto the same scale, for example 0 to 1. Many ML and statistical models also assume the data is roughly normally distributed. The main use of normalization is when we have predictors (attributes) whose scales (ranges) vary a lot. For example, if one attribute has values between 0 and 10 and another has values between 1000 and 10000, the algorithm may treat the attribute with larger values (1000 to 10000) as the more important predictor, which might not be true in reality. For this reason we normalize, so that all attributes are on a comparable scale during training and are prioritised by their actual statistical significance. This also supports stable convergence of the algorithm.
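    A minimal sketch of the rescaling described above, assuming Python with NumPy (the values are made up to match the 0-10 vs. 1000-10000 example):

    ```python
    # Min-max normalization: rescale each attribute to [0, 1] independently.
    import numpy as np

    # Two attributes with very different ranges, as in the example above
    # (hypothetical data, not from the original post).
    X = np.array([
        [2.0,  1000.0],
        [5.0,  5500.0],
        [10.0, 10000.0],
    ])

    # (x - min) / (max - min), computed per column
    X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    print(X_norm)  # both columns now lie in [0, 1]
    ```

    After this step both attributes contribute on the same scale, regardless of their original units.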
  • Madcap
    Madcap New Altair Community Member
    Thanks a lot for your help on this and past questions! @varunm1
    Very helpful
  • IngoRM
    IngoRM New Altair Community Member
    Answer ✓
    Just to add to the great explanation of @varunm1: normalization is especially important for all distance-based learners like k-NN. Without normalization, attributes with a very large range would simply overwhelm attributes with smaller ranges, not because they are actually more important as predictors, but simply because they have a bigger range. For other learning schemes, e.g. Decision Trees, this does not matter, and in fact I would recommend against normalization in most cases, since it changes the range of your input data and makes the model harder to understand for somebody who is familiar with the application domain.
    Hope this helps,
    Ingo
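    The distance-dominance effect can be sketched in a few lines, assuming NumPy and made-up values (an 8-unit gap on a 0-10 attribute vs. a 100-unit gap on a 1000-10000 attribute):

    ```python
    # Why distance-based learners (e.g. k-NN) need normalization:
    # the large-range attribute dominates the raw Euclidean distance.
    import numpy as np

    a = np.array([1.0, 2000.0])  # [small-range attr, large-range attr]
    b = np.array([9.0, 2100.0])

    # Raw distance: sqrt(8^2 + 100^2), dominated by the second attribute
    raw = np.linalg.norm(a - b)  # ~100.3

    # After dividing by each attribute's (hypothetical) observed range,
    # the 8-out-of-10 gap rightly dominates the 100-out-of-9000 gap.
    spans = np.array([10.0, 9000.0])
    scaled = np.linalg.norm((a - b) / spans)  # ~0.80
    ```

    Before scaling, the two points look far apart almost entirely because of the large-range attribute; after scaling, the genuinely larger relative difference drives the distance.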
  • Madcap
    Madcap New Altair Community Member
    Thanks @IngoRM, that is helpful. I had been creating my decision trees, rule models, etc. with normalised data, mainly because the tutorials I had done used it. I definitely understand the readability aspect, as I found myself trying to work out what a standardised value actually represented.

    Thanks again
    -Madcap
  • Telcontar120
    Telcontar120 New Altair Community Member
    Just a clarifying note that normalization doesn't actually change the distribution of the underlying variable itself, but it does change its range. In spite of the name, normalization doesn't magically transform the underlying data into a "normal" distribution. So you may still need outlier detection and removal techniques, depending on the actual data you are using.
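    A quick sketch of that point, assuming NumPy and a made-up skewed sample: min-max scaling squeezes the range to [0, 1], but the relative spacing of the values, and hence the shape of the distribution, is unchanged, so the outlier stays an outlier.

    ```python
    # Min-max scaling changes the range, not the distribution's shape.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # heavily skewed by an outlier
    x_norm = (x - x.min()) / (x.max() - x.min())

    # The range is now [0, 1] ...
    print(x_norm.min(), x_norm.max())
    # ... but the gaps between consecutive values keep the same ratios,
    # so the outlier is just as extreme relative to the rest of the data.
    print(np.diff(x_norm) / np.diff(x_norm)[0])  # same ratios as np.diff(x)
    ```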