How to remove non-duplicate values?
MarlaBot
New Altair Community Member
A RapidMiner user wants to know the answer to this question: "Hey! I have a data set of over 42000 records that has several duplicate and unique values. However, I would like to clean it up and remove only non-duplicate values and leave duplicate records. I know the “remove duplicates” operator removes duplicates but in my case, I want to do the opposite. Any idea how I could do this? Thank you."
Answers
-
Hi, can't you just join the duplicates on the original data? Then you have only duplicates remaining.
BR, Martin
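For readers who think more easily in code, the join idea can be sketched outside RapidMiner. This is only a rough pandas analogy, not the RapidMiner process itself; the `records` table, its column names, and the variable names are all made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
records = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 4, 4],
    "value": ["a", "b", "b", "c", "d", "d", "d"],
})

# Rows a "remove duplicates" step would discard (second and later
# occurrences), reduced to one row per duplicated combination.
removed = records[records.duplicated()].drop_duplicates()

# Inner join back onto the original data: only rows whose values
# occur more than once survive, first occurrences included.
only_duplicates = records.merge(removed, on=list(records.columns), how="inner")
print(only_duplicates)
```

The join brings the first occurrence of each duplicated record back, which the discarded rows alone would not contain.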
-
Hey,
You have 42000 records.
Some are duplicate.
Some are unique.
If you need the non-unique records, the dup output of the Remove Duplicates operator returns the records that aren't unique.
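As a rough illustration of "keep only the non-unique records", here is a small pandas sketch. It is an analogy under assumptions, not what the Remove Duplicates operator does internally; the sample data and names are hypothetical. It also shows the count per duplicated record, in case an aggregation is what is actually needed.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
records = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 4, 4],
    "value": ["a", "b", "b", "c", "d", "d", "d"],
})

# keep=False flags every member of a duplicated group,
# including the first occurrence.
non_unique = records[records.duplicated(keep=False)]

# Optional aggregation: how often each duplicated record occurs.
counts = (
    non_unique.groupby(list(records.columns))
    .size()
    .reset_index(name="count")
)
print(non_unique)
print(counts)
```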
Sorry, I got lost in translation and had to reorganize the question, because I understood it in about three different ways. Yes, @sgenzer's question is fine. If what is required is an aggregation (for example, the count of duplicated events), what @mschmitz says helps, too.
-
Thanks for all your help. It worked like magic.
Best,
-
I think this is the same question as in this thread, where I provided a similar answer: https://community.rapidminer.com/discussion/comment/57000#Comment_57000