One-Hot Encoding Top 10 Items (Fractional) Rest Other
Zarrok
New Altair Community Member
Hello everyone,
I am looking for a smart way to one-hot encode only the top 10 items (by fraction) of an attribute.
Currently I solve the problem by creating a new attribute by hand for each of the top 10 values. For example:
For each of these values I need to generate a new column:
if(contains([Attri], "Example Data"), 1, 0)
Does anybody have a smarter solution for this kind of problem?
Kind regards,
ZaRRoK
Answers
-
Hi,
likely just use Remove Rare Values first and then One Hot Encoding?
BR,
Martin
-
I understand what you mean. The problem is rather that I have a large dataset with about 4000 groups, of which I would like to look at the top 100; all others should be labeled "Other". I would end up with 101 columns.
The top 100 groups account for about 70% of the total.
-
Yeah, that's why I would propose using the Remove Rare Values operator to replace all strings that are not in the top 100 with "Other".
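For readers working outside RapidMiner, the same idea (keep the most frequent values, map everything else to "Other", then one-hot encode) can be sketched in Python with pandas. This is only an illustration and not the operator's implementation; the function name `one_hot_top_n`, the column name `group`, and the toy data are assumptions made for the example.

```python
import pandas as pd


def one_hot_top_n(df, column, n=100, other_label="Other"):
    """Hypothetical helper: keep the n most frequent values, collapse the rest, one-hot encode."""
    # The n most frequent values of the column.
    top_values = df[column].value_counts().nlargest(n).index
    # Keep a value if it is in the top n, otherwise replace it with other_label.
    collapsed = df[column].where(df[column].isin(top_values), other_label)
    # One 0/1 column per remaining value (at most n + 1 columns).
    return pd.get_dummies(collapsed, prefix=column, dtype=int)


# Toy data (assumed): "a" and "b" are the two most frequent groups.
df = pd.DataFrame({"group": ["a", "a", "a", "b", "b", "c", "d"]})
encoded = one_hot_top_n(df, "group", n=2)
print(encoded)
```

With n=100 on a column of about 4000 groups, this would give the 101 columns mentioned above (100 groups plus "Other").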
-
I have found a solution, but I am not happy with it... I created an aggregation (fractional), which I then join back to the table. Then I create a new attribute that, depending on the share, either takes over the original value or is set to "Other".
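The aggregate, join, and relabel steps described here can also be sketched outside RapidMiner, for example in Python with pandas. This is a rough equivalent under assumptions, not the actual process: the function name `collapse_by_share`, the `min_share` threshold, the column names, and the toy data are all made up for illustration.

```python
import pandas as pd


def collapse_by_share(df, column, min_share=0.05, other_label="Other"):
    """Hypothetical helper: relabel values whose share of rows is below min_share."""
    # Step 1: aggregate, i.e. the fraction of all rows that each value accounts for.
    shares = (
        df[column]
        .value_counts(normalize=True)
        .rename_axis(column)
        .reset_index(name="share")
    )
    # Step 2: join the shares back onto the original table.
    joined = df.merge(shares, on=column, how="left")
    # Step 3: new attribute that keeps the value or falls back to other_label.
    joined[column + "_collapsed"] = joined[column].where(
        joined["share"] >= min_share, other_label
    )
    return joined


# Toy data (assumed): shares are a=0.4, b=0.3, c=0.2, d=0.1.
df = pd.DataFrame({"group": ["a"] * 4 + ["b"] * 3 + ["c"] * 2 + ["d"]})
result = collapse_by_share(df, "group", min_share=0.25)
print(result)
```

The collapsed column can then be one-hot encoded as usual, so the number of output columns is bounded by the number of values above the threshold plus one.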