Simple operator or method for combining nominal categories?
Telcontar120
Is there some easy way to combine nominal categories based on frequency? For example, suppose I have a nominal attribute with 10 possible values, but I only want to keep the top 5 (by frequency) and put the rest into an "Other" category.
This is obviously possible with some manual recoding logic, but I feel like there is a better way that is slipping my mind. Is there an operator for this that I am forgetting? Discretize operators aren't ideal because they only work on numerical attributes, so using them would require recoding and would lose the underlying nominal values.
I have to do this with a large number of attributes/categories so I am looking for a solution that doesn't require manual recoding of the categories.
Thanks in advance!
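For readers outside RapidMiner, the "keep the top k, lump the rest into Other" recoding can be sketched in a few lines of Python. This is an illustrative sketch only, not how any RapidMiner operator is implemented. The function names `keep_top_k` and `combine_rare_in_columns`, the list-of-dicts data layout, and the `"Other"` default label are all hypothetical. The point is that nothing is mapped by hand: the kept categories are derived from the counts, so the same call works for any number of attributes.

```python
from collections import Counter


def keep_top_k(values, k, other="Other"):
    """Keep the k most frequent categories; map every other value to `other`."""
    # Counter.most_common orders by descending count; ties keep first-seen order.
    top = {value for value, _ in Counter(values).most_common(k)}
    return [v if v in top else other for v in values]


def combine_rare_in_columns(rows, columns, k, other="Other"):
    """Apply keep_top_k independently to each named column of a list of dicts."""
    result = [dict(row) for row in rows]  # shallow copies, so the input is not modified
    for col in columns:
        recoded = keep_top_k([row[col] for row in rows], k, other)
        for row, new_value in zip(result, recoded):
            row[col] = new_value
    return result


if __name__ == "__main__":
    colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
    print(keep_top_k(colors, k=2))
    # ['red', 'red', 'red', 'blue', 'blue', 'Other', 'Other']
```

Note that ties at the cut-off are broken by first occurrence here; a real dataset may need an explicit tie rule.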
Tags: AI Studio, ETL + Data Prep
Accepted answers
MartinLiebig
Hi,
Replace Rare Values in Operator Toolbox is your friend.
BR,
Martin
All comments
rfuentealba
Hi @Telcontar120,
If I understood your problem correctly, I would do something like this:
1. Generate a new field containing the frequency, alongside your category.
2. Generate a second field doing some discretization on the frequency, not the params.
3. Generate a third field with some code: if(frequency > 50;[Category];"Other").
4. Use the third field with the "combined" target.
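The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not RapidMiner code: the function name `replace_below_threshold` is hypothetical, the threshold of 50 and the strict `>` comparison are taken from the example expression, and the optional discretization step (step 2) is left out.

```python
from collections import Counter


def replace_below_threshold(categories, min_count=50, other="Other"):
    """Mirror the steps above: attach each row's category frequency, then apply
    the rule if(frequency > min_count; category; "Other")."""
    counts = Counter(categories)
    # Step 1: the "frequency" field, one value per row.
    frequency = [counts[c] for c in categories]
    # Step 3: keep the category only when its frequency exceeds the threshold.
    return [c if f > min_count else other for c, f in zip(categories, frequency)]


if __name__ == "__main__":
    data = ["A"] * 60 + ["B"] * 55 + ["C"] * 10
    combined = replace_below_threshold(data, min_count=50)
    print(Counter(combined))
    # Counter({'A': 60, 'B': 55, 'Other': 10})
```

Unlike a "top k" rule, a fixed frequency threshold keeps a variable number of categories, so the threshold may need tuning per attribute.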
But now I'm wondering whether I missed something about the question, since my solution sounds too simplistic, to me at least.
All the best, sensei!
Rodrigo.
Telcontar120
Thanks guys for the fast replies. Both approaches would work, but @rfuentealba, you should check out the single operator that @mschmitz mentions, because that is exactly what I wanted!
rfuentealba
Awesome! I didn't know about it. Thank you both.
MartinLiebig
Hi @Telcontar120, @rfuentealba,
what did you guys search for in the first place? The current synonyms (tags) for this operator are:
<tags>
<tag>Missing</tag>
<tag>Map</tag>
</tags>
Which is apparently not enough. Since I am the author of the operator, I would love to know what we need to add to make it easier to find.
BR,
Martin
Telcontar120
Great question
@mschmitz
! I searched for "nominal" with various combinations of "categories values discretize map replace". It's probably my fault that I didn't think to search using only "replace" or "map" since I am aware of other RapidMiner operators with these names that are similar, but I was thinking they would require manual mapping which I wanted to avoid. I would say "nominal" is a key term because in this use case there are other similar operators (the "discretize" ones) that only work on numericals so I was trying to focus on those operators that would work with nominals. I realize your operator will work with any data type but with numericals I think you are much less likely to be searching for specific values to replace (since a continuous numerical attribute may have many individual values that are very infrequent).
rfuentealba
I'm not a native English speaker. I would have searched for "(discretize, summarize, replace, map, remap, regroup) weird values" (because "raro" is a common Spanish word meaning both "weird" and "rare"). To be honest, because of the language gap, I am not the kind of person who uses search to discover new things.
On a slightly humorous note: yes, I have to think before reacting when someone says I am "as rare as a unicorn", because my first instinct usually tells me that I am "as weird as a unicorn".
MartinLiebig
Thanks guys, I'll add a bunch of these!