simple operator or method for combining nominal categories?
Telcontar120
New Altair Community Member
Is there some easy way to combine nominal categories together based on frequency? For example, if I have a nominal attribute with 10 different possible values, but I only want to keep the top 5 (by frequency) and then put the rest into an "Other" category.
This is obviously possible using some manual recoding logic, but I feel like there is a better way that is slipping my mind. Is there some operator for this that I am forgetting? Discretize operators aren't ideal because they only work on numerical attributes so that would require recoding and loses the underlying nominal values.
I have to do this with a large number of attributes/categories so I am looking for a solution that doesn't require manual recoding of the categories.
Thanks in advance!
Best Answer
-
Hi,
Replace Rare Values in Operator Toolbox is your friend
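For readers who want to see the idea outside RapidMiner, the grouping asked about in the question (keep the top N values by frequency, put the rest into "Other") can be sketched in plain Python. This is only an illustration of the concept, not how the operator is implemented; the function name `keep_top_n` and its parameters are made up for this example.

```python
from collections import Counter


def keep_top_n(values, n=5, other="Other"):
    """Keep the n most frequent values; replace everything else with `other`.

    Hypothetical helper for illustration only, not a RapidMiner API.
    """
    counts = Counter(values)
    # most_common(n) returns the n (value, count) pairs with the highest counts;
    # values with equal counts keep the order in which they first appeared.
    top = {value for value, _ in counts.most_common(n)}
    return [v if v in top else other for v in values]


colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
print(keep_top_n(colors, n=2))
# ['red', 'red', 'red', 'blue', 'blue', 'Other', 'Other']
```

Because the function only needs a list of values, it could be applied to each nominal column in turn, which avoids any manual recoding per category.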
BR,
Martin
Answers
-
Hi @Telcontar120
If I understood your problem well, I would do something like this:
- Generate a new field containing the frequency, alongside your category.
- Generate a second field doing some discretization on the frequency, not the params.
- Generate a third field with some code: if(frequency > 50, [Category], "Other").
- Use the third field with the "combined" target.
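The steps above can be sketched outside RapidMiner in plain Python: count how often each category occurs, then keep a value only if its count exceeds a threshold. The function name `replace_below_threshold` is hypothetical, and the threshold of 50 is just the example number from the expression above.

```python
from collections import Counter


def replace_below_threshold(values, min_count=50, other="Other"):
    """Replace every value whose frequency is not above min_count.

    Hypothetical helper for illustration only, not a RapidMiner API.
    """
    # Step 1: frequency of each category.
    counts = Counter(values)
    # Step 3: same test as `frequency > 50`, so a count of exactly
    # min_count is treated as rare and mapped to `other`.
    return [v if counts[v] > min_count else other for v in values]


data = ["a"] * 60 + ["b"] * 51 + ["c"] * 50 + ["d"] * 3
combined = replace_below_threshold(data)
print(Counter(combined))
```

Unlike keeping a fixed number of top categories, a threshold lets the number of surviving categories vary from attribute to attribute.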
All the best, sensei!
Rodrigo.
-
Thanks guys for the fast replies. Both approaches would work, but @rfuentealba you should check out the single operator that @mschmitz mentions because that is exactly what I wanted!
-
Awesome! I didn't know about it. Thank you both.
-
What did you guys search for in the first place? The current pseudonyms (tags) for this operator are:
<tags>
<tag>Missing</tag>
<tag>Map</tag>
</tags>
Which is apparently not enough. Since I am the author of the operator, I would love to know what we need to add so that it is easier to find.
BR,
Martin
-
Great question @mschmitz ! I searched for "nominal" with various combinations of "categories values discretize map replace". It's probably my fault that I didn't think to search using only "replace" or "map" since I am aware of other RapidMiner operators with these names that are similar, but I was thinking they would require manual mapping which I wanted to avoid. I would say "nominal" is a key term because in this use case there are other similar operators (the "discretize" ones) that only work on numericals so I was trying to focus on those operators that would work with nominals. I realize your operator will work with any data type but with numericals I think you are much less likely to be searching for specific values to replace (since a continuous numerical attribute may have many individual values that are very infrequent).
-
I'm not a native English speaker. I would have used "(discretize, summarize, replace, map, remap, regroup) weird values" (because "raro" is a common word in Spanish for both "weird" and "rare"). To be honest, I am not the kind of person who uses the search to discover new things because of the language gap.
On a slightly humorous note: Yes, I have to think before reacting when someone says I am "as rare as a Unicorn", because my first instinct usually tells me that I am "as weird as a Unicorn".
-
Thanks guys, I'll add a bunch of these!