One-Hot Encoding Top 10 Items (Fractional) Rest Other

Zarrok New Altair Community Member
edited November 5 in Community Q&A
Hello everyone,

I am looking for a smart way to one-hot encode only the top 10 items (by fractional share) and treat the rest as "Other".
Currently I solve the problem by creating a new attribute for each of the top 10 values. For example,
for each of these values I need to generate a new column:
if(contains([Attri], "Example Data"), 1, 0)
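
For readers who want to see the manual approach outside the tool, here is a minimal Python sketch of the same idea: one 0/1 column per value of interest, generated in a loop instead of by hand. It assumes `contains` is a substring test; the function name `indicator_columns`, the column naming scheme, and the sample rows are hypothetical and not part of the original post.

```python
def indicator_columns(rows, attribute, values):
    """Add one 0/1 column per value, mirroring if(contains(...), 1, 0)."""
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        for value in values:
            # Substring match (assumed behaviour of contains);
            # use == instead for exact matching.
            hit = value in str(row[attribute])
            new_row[f"{attribute}_{value}"] = 1 if hit else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample data for illustration only.
rows = [{"Attri": "Example Data"}, {"Attri": "Something else"}]
encoded = indicator_columns(rows, "Attri", ["Example Data"])
```

This still needs the list of top values as input, which is the part the replies below address.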

Does anybody have a smarter solution for this kind of problem?

Kind regards,
ZaRRoK

Answers

  • MartinLiebig
    Altair Employee
    Hi,
    you could probably just use Remove Rare Values first and then One Hot Encoding?
    BR,
    Martin
  • Zarrok
    Zarrok New Altair Community Member
    edited July 2022
    I understand what you mean. The problem is rather that I have a large dataset with about 4000 groups, of which I would like to look at the top 100; the others should be labeled "Other". That would give me 101 columns.
    The top 100 groups account for about 70% of the total.
  • MartinLiebig
    Altair Employee
    Yes, that's why I would propose using the Remove Rare Values operator to replace all strings that are not in the top 100 with "Other".
  • Zarrok
    Zarrok New Altair Community Member
    I have found a solution, but I am not happy with it... I created a fractional aggregation, which I then join back to the table. Then I generate a new attribute that, depending on the share, either takes over the original value or is set to "Other".
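
The steps discussed in this thread (aggregate to a share per value, keep the top N, map everything else to "Other", then one-hot encode) can be sketched in plain Python as follows. This is only an illustration of the logic under stated assumptions, not the operators' actual implementation: the function name `encode_top_n`, its parameters, and the sample values are hypothetical.

```python
from collections import Counter


def encode_top_n(values, top_n, other_label="Other"):
    """One-hot encode the top_n most frequent values; the rest become other_label."""
    counts = Counter(values)
    total = len(values)
    # Step 1: aggregate to a share (fraction of all rows) per distinct value.
    shares = {value: count / total for value, count in counts.items()}
    # Step 2: keep the top_n values by frequency.
    kept = [value for value, _ in counts.most_common(top_n)]
    kept_set = set(kept)
    # Step 3: "join back": each row keeps its value or is relabeled.
    mapped = [v if v in kept_set else other_label for v in values]
    # Step 4: one-hot encode into top_n + 1 columns.
    columns = kept + [other_label]
    one_hot = [[int(m == c) for c in columns] for m in mapped]
    # Share of rows covered by the kept values.
    coverage = sum(shares[v] for v in kept)
    return columns, one_hot, coverage


# Hypothetical sample data: "a" and "b" dominate, "c" and "d" are rare.
values = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
columns, one_hot, coverage = encode_top_n(values, top_n=2)
```

With `top_n=100` on roughly 4000 groups this would yield the 101 columns mentioned above, and `coverage` corresponds to the share of rows the kept groups account for. The sketch assumes no existing value is already named "Other"; if one is, a different `other_label` would be needed.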