simple operator or method for combining nominal categories?

User: "Telcontar120"
New Altair Community Member
Updated by Jocelyn
Is there some easy way to combine nominal categories together based on frequency?  For example, if I have a nominal attribute with 10 different possible values, but I only want to keep the top 5 (by frequency) and then put the rest into an "Other" category.
This is obviously possible using some manual recoding logic, but I feel like there is a better way that is slipping my mind.  Is there some operator for this that I am forgetting?  The Discretize operators aren't ideal because they only work on numerical attributes, so using them would require recoding and would lose the underlying nominal values.
I have to do this with a large number of attributes/categories so I am looking for a solution that doesn't require manual recoding of the categories.
Thanks in advance!

    User: "MartinLiebig"
    Altair Employee
    Accepted Answer
    Hi,
    Replace Rare Values in Operator Toolbox is your friend :)

    BR,
    Martin
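For readers working outside RapidMiner, the idea in the question (keep the k most frequent values and map everything else to "Other") can be sketched in plain Python. This is only an illustration of the technique, not how the Replace Rare Values operator is implemented; the function name `keep_top_k`, its parameters, and the sample data are made up for the example.

```python
from collections import Counter


def keep_top_k(values, k=5, other="Other"):
    """Keep the k most frequent values; map everything else to `other`.

    Hypothetical helper for illustration only.
    """
    counts = Counter(values)
    # most_common(k) returns the k (value, count) pairs with the highest counts
    top = {value for value, _ in counts.most_common(k)}
    return [v if v in top else other for v in values]


# Made-up sample: five distinct values, keep the three most frequent
colors = ["red"] * 5 + ["blue"] * 4 + ["green"] * 3 + ["pink", "teal"]
grouped = keep_top_k(colors, k=3)
print(Counter(grouped))
```

To cover many attributes without manual recoding, the same function could be called once per column in a loop.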
    User: "rfuentealba"
    New Altair Community Member
    Hi @Telcontar120

    If I understood your problem well, I would do something like this:
    • Generate a new field containing the frequency, alongside your category.
    • Generate a second field doing some discretization on the frequency, not the params.
    • Generate a third field with some code: if(frequency > 50;[Category];"Other").
    • Use the third field with the "combined" target.
    But now I'm wondering if there is anything I missed about the whole question, as my solution sounds too simplistic to me at least.
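The steps above can be sketched in plain Python as a rough illustration. It assumes a fixed frequency cutoff of 50, as in the expression in the third step, and it skips the optional discretization step; the name `replace_below_threshold` and the sample data are hypothetical.

```python
from collections import Counter


def replace_below_threshold(values, min_count=50, other="Other"):
    """Map every category whose frequency is not above min_count to `other`.

    Hypothetical helper for illustration only.
    """
    # Step 1: frequency of each category
    counts = Counter(values)
    # Step 3: keep the category only if its frequency exceeds the cutoff
    return [v if counts[v] > min_count else other for v in values]


# Made-up sample: "C" sits exactly at the cutoff, so it is replaced as well
rows = ["A"] * 60 + ["B"] * 51 + ["C"] * 50 + ["D"] * 2
combined = replace_below_threshold(rows, min_count=50)
print(Counter(combined))
```

Unlike a top-N rule, a fixed cutoff keeps a varying number of categories per attribute, so the threshold may need tuning for each data set.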

    All the best, sensei!

    Rodrigo.
    User: "Telcontar120"
    New Altair Community Member
    OP
    Thanks guys for the fast replies.  Both approaches would work, but @rfuentealba you should check out the single operator that @mschmitz mentions because that is exactly what I wanted!  
    User: "rfuentealba"
    New Altair Community Member
    Awesome! I didn't know about it. Thank you both.
    User: "MartinLiebig"
    Altair Employee
    What did you guys search for in the first place? The current pseudonyms (tags) for this operator are:
    <tags>
    <tag>Missing</tag>
    <tag>Map</tag>
    </tags>
    Which is apparently not enough. Since I am the author of the operator, I would love to know what we need to add so that it is easier to find.

    BR,
    Martin
    User: "Telcontar120"
    New Altair Community Member
    OP
    Great question @mschmitz !  I searched for "nominal" with various combinations of "categories values discretize map replace".  It's probably my fault that I didn't think to search using only "replace" or "map" since I am aware of other RapidMiner operators with these names that are similar, but I was thinking they would require manual mapping which I wanted to avoid.  I would say "nominal" is a key term because in this use case there are other similar operators (the "discretize" ones) that only work on numericals so I was trying to focus on those operators that would work with nominals.  I realize your operator will work with any data type but with numericals I think you are much less likely to be searching for specific values to replace (since a continuous numerical attribute may have many individual values that are very infrequent).  
    User: "rfuentealba"
    New Altair Community Member
    I'm not a native English speaker. I would have used "(discretize, summarize, replace, map, remap, regroup) weird values" (because "raro" is a common word in Spanish for both "weird" and "rare"). To be honest, I am not the kind of person who uses the search to discover new things, because of the language gap.

    On a slightly humorous note: Yes, I have to think before reacting when someone says I am "as rare as a Unicorn", because my first instinct usually tells me that I am "as weird as a Unicorn".
    Thanks guys, I'll add a bunch of these!