What is the difference between Declare Missing Value and Replace Missing Values?

supremerry New Altair Community Member
edited November 5 in Community Q&A
My instructor told me to replace missing values with 0, but he also told me to use the Declare Missing Value operator. I wonder why we don't use the Replace Missing Values operator instead...

Best Answers

  • YYH
    Altair Employee
    edited April 2021 Answer ✓
    Hi @supremerry,

    These two operators have different uses. I usually apply Declare Missing Value before Replace Missing Values, but it depends.

    In RapidMiner, missing values are represented by a question mark (?). Say your age attribute was recorded as -1 or 9999 whenever the age was missing: you would "declare" that -1 and 9999 in this column should be treated as missing, not as valid ages. If you apply Replace Missing Values to the age attribute before declaring them, -1 and 9999 will be treated as non-missing values, because Replace Missing Values only acts on data that is already marked with a question mark (?).
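    A minimal Python sketch of why the order matters, assuming a plain list stands in for the age attribute and None stands in for RapidMiner's "?" marker. `declare_missing` and `replace_missing` are hypothetical helpers that only mimic what the two operators do; they are not RapidMiner's API.

```python
from typing import Iterable, List, Optional

# Hypothetical helpers that mimic the two RapidMiner operators on a plain list.
# None plays the role of RapidMiner's "?" (missing) marker.


def declare_missing(values: Iterable[Optional[float]],
                    sentinels: Iterable[float]) -> List[Optional[float]]:
    """Turn every sentinel code (e.g. -1 or 9999) into a real missing value (None)."""
    codes = set(sentinels)
    return [None if v is not None and v in codes else v for v in values]


def replace_missing(values: Iterable[Optional[float]],
                    replacement: float) -> List[Optional[float]]:
    """Fill only cells that are already missing (None); all other values are left untouched."""
    return [replacement if v is None else v for v in values]


ages = [34, -1, 9999, None, 52]

# Declare first, then replace: the sentinel codes are filled along with the real gap.
declared_then_replaced = replace_missing(declare_missing(ages, [-1, 9999]), 0)

# Replace first, then declare: -1 and 9999 survive the fill step as "valid" ages
# and only become missing afterwards, so they end up unfilled.
replaced_then_declared = declare_missing(replace_missing(ages, 0), [-1, 9999])

print(declared_then_replaced)  # [34, 0, 0, 0, 52]
print(replaced_then_declared)  # [34, None, None, 0, 52]
```

    Because the fill step only touches cells that are already missing, any special codes have to be declared before it runs.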

    It depends on how dirty the input data is. I always explore the data to check for any special coding of missing values before handling them. Sometimes invalid data is automatically recognized as missing (shown as "?") during the loading step, but sometimes it is not. If the data comes from another platform, the missing values may not be properly declared. For example, in SAS, numeric missing values are represented by a single period (.) and character missing values by a single blank enclosed in quotes (' '); in R, NA denotes a missing value; in Python, NaN is used.
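    When data arrives from another platform, one option is to map all of these codes to a single missing marker while loading. The sketch below assumes a CSV export and a hypothetical `MISSING_TOKENS` set built from the codes mentioned above; which tokens actually occur in your file is something to confirm while exploring the data.

```python
import csv
import io
import math
from typing import List, Optional

# Assumed set of missing-value codes found in exported files (adjust after exploring your data):
# SAS numeric ".", SAS character blank (empty after stripping), R "NA", Python "NaN".
MISSING_TOKENS = {".", "", "NA", "NaN"}


def parse_cell(raw: str) -> Optional[float]:
    """Return None for any recognised missing-value token, otherwise the numeric value."""
    token = raw.strip()  # a single blank becomes "" and is caught below
    if token in MISSING_TOKENS:
        return None
    value = float(token)  # any other non-numeric text raises ValueError
    # Other NaN spellings such as "nan" parse to float NaN; treat them as missing too.
    return None if math.isnan(value) else value


exported = "id,age\n1,34\n2,.\n3,NA\n4, \n5,NaN\n6,52\n"
rows = list(csv.DictReader(io.StringIO(exported)))
ages: List[Optional[float]] = [parse_cell(row["age"]) for row in rows]
print(ages)  # [34.0, None, None, None, None, 52.0]
```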

    HTH!

    YY


  • Caperez
    Caperez Altair Community Member
    Answer ✓
    Hi @supremerry

    There are a few differences between the operators.
    On one hand, the Replace Missing Values operator replaces missing values using a short list of options, such as average, maximum, zero, etc.
    On the other hand, the Declare Missing Value operator is more flexible and complete: it lets you mark a specific numeric value, a nominal value, or the values matching an expression (written with RapidMiner's powerful expression editor) as missing.
    Depending on what you want to do with your missing values, the model you want to build, and the role the missing values play in your model, you may choose one or the other.
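    As a rough illustration of the fill options listed above, here is a hypothetical Python function with average, minimum, maximum, and zero strategies. It is not RapidMiner's implementation; None marks a missing cell, and each statistic is computed only from the non-missing values.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for a "replace missing" step; None marks a missing cell.
STATISTICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "average": mean,
    "minimum": min,
    "maximum": max,
}


def replace_missing(values: List[Optional[float]], strategy: str = "average") -> List[float]:
    """Fill missing cells with zero or with a statistic of the present values."""
    present = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy in STATISTICS:
        if not present:
            raise ValueError("cannot compute a statistic from an all-missing column")
        fill = STATISTICS[strategy](present)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


ages = [20.0, None, 40.0, None, 60.0]
for strategy in ("average", "minimum", "maximum", "zero"):
    print(strategy, replace_missing(ages, strategy))
```

    Every strategy only fills cells that are already None, so a code like 9999 left undeclared would also distort the average and maximum used for filling.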

    Best
