How to remove non-duplicate values?
MarlaBot
A RapidMiner user wants to know the answer to this question: "Hey! I have a data set of over 42000 records that has several duplicate and unique values. However, I would like to clean it up and remove only non-duplicate values and leave duplicate records. I know the “remove duplicates” operator removes duplicates but in my case, I want to do the opposite. Any idea how I could do this? Thank you."
MartinLiebig
Hi,
Can't you just join the duplicates onto the original data? Then you have only the duplicates remaining.
BR,
Martin
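For readers outside RapidMiner, the join idea can be sketched in plain Python. This is only an illustration under assumptions: each record is a hashable tuple, and the function name `keep_duplicated_by_join` is made up for this example. It is not how the RapidMiner operators are implemented.

```python
from collections import Counter


def keep_duplicated_by_join(records):
    """Return every record whose value occurs more than once, in original order."""
    # Step 1: find the values that occur more than once (the "duplicates" table).
    counts = Counter(records)
    dup_keys = {key for key, n in counts.items() if n > 1}
    # Step 2: inner-join the original data against those values.
    # Records that occur only once have no match and drop out.
    return [r for r in records if r in dup_keys]


# Hypothetical sample data: ("c", 3) is the only non-duplicate record.
records = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
only_duplicates = keep_duplicated_by_join(records)
print(only_duplicates)
```

Because the join runs against the original data, every copy of a duplicated record is kept, including the first occurrence.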
sgenzer
Hi @MarlaBot,
The Remove Duplicates operator has both options:
Does this help?
Scott
rfuentealba
Hey,
You have 42000 records.
Some are duplicate.
Some are unique.
If you need the non-uniques, the dup output of the Remove Duplicates operator delivers the records that aren't unique.
Sorry, I was lost in translation and had to reorganize the question because I understood it in about three different ways. Yes, @sgenzer's question is fine. If what is required is an aggregation (like the count of duplicated events), what @mschmitz says helps, too.
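A rough Python sketch of the two ideas in this post: splitting the data into a deduplicated output and a dup output, and counting how often each duplicated record occurs. It assumes the first occurrence is kept and later copies go to the dup output. That is an assumption about the operator's behaviour, not something confirmed in this thread, and the names `split_duplicates` and `duplicate_counts` are hypothetical.

```python
from collections import Counter


def split_duplicates(records):
    """Split records into (kept, dup): first occurrences and later copies."""
    seen = set()
    kept, dup = [], []
    for r in records:
        if r in seen:
            dup.append(r)   # a repeat of an earlier record (assumed "dup" output)
        else:
            seen.add(r)
            kept.append(r)  # first time this record appears
    return kept, dup


def duplicate_counts(records):
    """Aggregate: how many times each duplicated record occurs."""
    return {r: n for r, n in Counter(records).items() if n > 1}


# Hypothetical sample data.
records = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("a", 1)]
kept, dup = split_duplicates(records)
counts = duplicate_counts(records)
print(kept, dup, counts)
```

Under this assumption the dup list holds only the extra copies, so if the first occurrence of each duplicated record is needed as well, the aggregation or a join back onto the original data covers that.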
novice_miner
Thanks for all your help. It worked like magic.
Best,
Telcontar120
I think this is the same question as in this thread, where I provided a similar answer:
https://community.rapidminer.com/discussion/comment/57000#Comment_57000