How to remove non-duplicate values?
A RapidMiner user wants to know the answer to this question: "Hey! I have a data set of over 42000 records that has several duplicate and unique values. However, I would like to clean it up and remove only non-duplicate values and leave duplicate records. I know the “remove duplicates” operator removes duplicates but in my case, I want to do the opposite. Any idea how I could do this? Thank you."

MartinLiebig

Hi,
Can't you just join the duplicates onto the original data? Then only the duplicates remain.
BR,
Martin
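
For anyone who wants to see the join idea outside of a RapidMiner process, here is a minimal sketch in plain Python. It is only an illustration, not a RapidMiner API: the function name `keep_duplicated_by_join`, the `key` parameter, and the sample records are all made up for the example.

```python
def keep_duplicated_by_join(records, key=lambda row: row):
    """Keep only rows whose key occurs more than once (hypothetical helper)."""
    seen = set()
    dup_keys = set()
    for row in records:
        k = key(row)
        if k in seen:
            # A dedupe step would drop this row, so its key is a duplicate key.
            dup_keys.add(k)
        else:
            seen.add(k)
    # "Join" the duplicate keys back onto the original data:
    # only rows with a matching key survive, in their original order.
    return [row for row in records if key(row) in dup_keys]


# Made-up sample data: ("c", 3) is the only unique record.
sample = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
duplicates_only = keep_duplicated_by_join(sample)
```

With this sample, `duplicates_only` contains every copy of `("a", 1)` and `("b", 2)`, and the unique `("c", 3)` is gone.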
Hey,
You have 42000 records.
Some are duplicate.
Some are unique.
If you need the non-unique records, the dup output of the Remove Duplicates operator returns the records that aren't unique.
Sorry, I got lost in translation and had to reorganize the question, because I read it three different ways. Yes, @sgenzer's question is fine. If what you need is an aggregation (for example, the count of duplicated events), what @mschmitz says helps too.
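
If the aggregation reading is the one you need, the counting step could look roughly like this in plain Python. Again, this is a sketch under assumptions and not a RapidMiner operator: `count_duplicates` and the sample records are hypothetical, and records are assumed to be hashable (tuples here).

```python
from collections import Counter


def count_duplicates(records):
    """Return {record: count} for records that occur more than once (hypothetical helper)."""
    counts = Counter(records)
    # Drop everything that appears exactly once; those are the unique records.
    return {record: n for record, n in counts.items() if n > 1}


# Made-up sample data: ("c", 3) is unique and is left out of the result.
sample = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2), ("a", 1)]
duplicate_counts = count_duplicates(sample)
```

Here `duplicate_counts` maps `("a", 1)` to 3 and `("b", 2)` to 2.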
I think this is the same question as in this thread, where I provided a similar answer: https://community.rapidminer.com/discussion/comment/57000#Comment_57000