How to convert numerical values in result file back to original nominal values of input

Hung_Bui_221 New Altair Community Member
edited November 5 in Community Q&A
Hello everyone! I am a beginner who started studying RM (RapidMiner) only a few months ago. I am working on a group problem: detecting the outliers in the Bank Marketing dataset. This is my process (image below).

The dataset has more than 40,000 examples, and the outlier detection operator seems too slow on a mix of nominal and numerical values, so I decided to convert all of the nominal values to numerical ones.

After running this process, I obtained a result file, and I would like to convert all of the numerical values I created back to the original nominal values of the input file. Converting them manually is absolutely the last resort, so I wonder whether I can do it quickly with RM operators or some other tool.

Please help me find the best way to handle this case as soon as possible. Thank you very much.

Answers

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓
    Hi!

    Do you have an ID in your data? If not, you can use the Generate ID operator to create one. Then use the Join operator on that ID to get back the original data and add the generated outlier score to it.

    By the way, Local Outlier Factor is a nearest-neighbor-based method, so it works best with normalized input. Use the Normalize operator before applying it; you should get better results that way. The join-based method for getting the original data back is applicable there, too.

    Regards,
    Balázs
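The ID-and-join idea from the reply above can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions: the rows are made up, `encode_nominal` and `toy_outlier_score` are hypothetical helpers, and the score is a simple stand-in rather than Local Outlier Factor. The point is that the original rows are never overwritten; the numeric copy is used for scoring, and the score is attached back by `id`.

```python
# Toy rows standing in for the Bank Marketing data (values are made up).
original = [
    {"age": 40, "job": "admin.", "marital": "married"},
    {"age": 52, "job": "technician", "marital": "single"},
    {"age": 95, "job": "retired", "marital": "divorced"},
]

# Step 1 (like Generate ID): give every row a stable key before transforming.
for row_id, row in enumerate(original, start=1):
    row["id"] = row_id


def encode_nominal(rows, nominal_cols):
    """Return numeric copies of the rows; nominal values become integer codes."""
    codes = {col: {} for col in nominal_cols}
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy, so the original row keeps its nominal values
        for col in nominal_cols:
            mapping = codes[col]
            new_row[col] = mapping.setdefault(row[col], len(mapping))
        encoded.append(new_row)
    return encoded


def toy_outlier_score(row, rows):
    """Hypothetical stand-in for the outlier operator: |age - mean age|."""
    mean_age = sum(r["age"] for r in rows) / len(rows)
    return abs(row["age"] - mean_age)


# Score the numeric copy, keyed by id.
numeric = encode_nominal(original, ["job", "marital"])
scores = {row["id"]: toy_outlier_score(row, numeric) for row in numeric}

# Step 2 (like Join): attach each score to the untouched original row by id.
result = [{**row, "outlier_score": scores[row["id"]]} for row in original]

for row in result:
    print(row)
```

Because the join key is the ID, no reverse mapping from numbers to nominal values is needed; the nominal columns in `result` come straight from the original rows.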
  • Hung_Bui_221
    Hung_Bui_221 New Altair Community Member
    Thank you so much for replying to me. Your answer is really helpful. Can I ask one more question?

    After I used the Normalize operator on all attributes, the data type and the values were changed. For example, the Age attribute originally contained the customers' ages (40, 50, 60 years old...), but after normalization its data type and values were changed to real (attached image).

    I wonder whether this affects the result. Please tell me more. Thank you again.

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓
    Hi!

    Normalizing changes all numeric attributes to be roughly between 0 and 1 (or -1 and 1), depending on the method.

    Nearest-neighbor methods compare values of different attributes with each other. This means that an attribute with large numerical values (e.g. money amounts) would dominate all the other attributes (age in years, 0/1 from the nominal-to-numerical transformation, etc.) and determine the neighborhood alone. Normalizing avoids this and gives every attribute a comparable influence on the distance calculations.

    Regards,
    Balázs
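The effect described in the reply above can be seen in a small plain-Python sketch. The customers and their values are invented for illustration, and min-max scaling to [0, 1] is assumed as the normalization method; the helper names are hypothetical. With raw values, the balance attribute dominates the Euclidean distance; after scaling, age counts as well, and the nearest neighbor of customer A changes.

```python
import math

# Made-up customers: (age in years, account balance). Not from the real dataset.
customers = {
    "A": (30, 1000.0),
    "B": (70, 1010.0),   # very different age, almost the same balance as A
    "C": (31, 1200.0),   # almost the same age, somewhat different balance
    "D": (50, 5000.0),
}


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_max_normalize(points):
    """Rescale each attribute to the range [0, 1]."""
    columns = list(zip(*points.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(values, lows, highs))
        for name, values in points.items()
    }


def nearest(name, points):
    """Name of the point closest to `name`, excluding itself."""
    others = (n for n in points if n != name)
    return min(others, key=lambda n: euclidean(points[name], points[n]))


raw_neighbor = nearest("A", customers)
scaled_neighbor = nearest("A", min_max_normalize(customers))

print("nearest to A on raw values:       ", raw_neighbor)
print("nearest to A on normalized values:", scaled_neighbor)
```

On the raw values, B is nearest to A even though the two differ by 40 years, because a balance gap of 10 outweighs nothing but a balance gap of 200 outweighs everything else. After scaling, C becomes the nearest neighbor. The change of Age from integer to real is a side effect of this rescaling, not a loss of information.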
  • Hung_Bui_221
    Hung_Bui_221 New Altair Community Member
    Thank you so much, Mr. Balázs. Your answer is really great.