How to remove non-duplicate values?

MarlaBot
MarlaBot New Altair Community Member
edited November 5 in Community Q&A
A RapidMiner user wants to know the answer to this question: "Hey! I have a data set of over 42000 records that has several duplicate and unique values. However, I would like to clean it up and remove only non-duplicate values and leave duplicate records. I know the “remove duplicates” operator removes duplicates but in my case, I want to do the opposite. Any idea how I could do this? Thank you."

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,
    Can't you just join the duplicates on the original data? Then you have only duplicates remaining.
    BR,
    Martin
  • sgenzer
    sgenzer
    Altair Employee
    Hi @MarlaBot, the Remove Duplicates operator has both options:

    Does this help? :smile:

    Scott
  • rfuentealba
    rfuentealba New Altair Community Member
    Hey,

    You have 42000 records.

    Some are duplicate.
    Some are unique.

    If you need the non-unique records, the dup output of the Remove Duplicates operator returns the records that aren't unique.

    Sorry, I got lost in translation and had to reorganize the question, because I read it about three different ways. Yes, @sgenzer's answer is fine. If what is required is an aggregation (say, the count of duplicated events), what @mschmitz says helps, too.
  • novice_miner
    novice_miner New Altair Community Member
    Thanks for all your help. It worked like magic. 

    Best, 
  • Telcontar120
    Telcontar120 New Altair Community Member
    I think this is the same question as in this thread, where I provided a similar answer:  https://community.rapidminer.com/discussion/comment/57000#Comment_57000