how to keep a partly duplicated sample
aileenzhou
New Altair Community Member
I have a dataset in which some DOIs are duplicated. For each duplicated DOI, I must keep exactly one row, chosen by the 'Source' attribute with preference A > B > C, and delete the rest.
For example, in the data below, I want to keep rows 1261 and 643 and delete the rest.
Row DOI Source
18 10.1002/67 B
1261 10.1002/67 A
1400 10.1002/67 C
...
...
643 10.102/et.67 A
1428 10.102/et.67 C
Thank you in advance.
Best Answer
Hi @aileenzhou,
If I understand correctly, one way to do that is to:
- Reorder the attributes (1/ Source, 2/ DOI)
- Generate a concatenation of them (via Generate Aggregation attribute)
- Sort the concatenated attribute alphabetically (via Sort attribute / sorting direction = increasing)
- Remove the duplicates of this concatenated attribute.
- Split the concatenated attribute back to retrieve the original attributes (without the duplicates)
Take a look at the attached process and tell me if it meets your needs.
Regards,
Lionel
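For readers without the attached process, here is a minimal sketch of the same sort-then-deduplicate idea in plain Python. It is not the RapidMiner process from the answer: the function name `keep_preferred` and the dictionary layout of the rows are assumptions made for illustration, and it removes duplicates on the DOI alone after sorting by source preference.

```python
def keep_preferred(rows, preference=("A", "B", "C")):
    """Keep one row per DOI, choosing the source that comes first in `preference`.

    `rows` is assumed to be a list of dicts with "DOI" and "Source" keys
    (a hypothetical layout, not an export format of any tool).
    """
    # Lower rank = more preferred; sources not listed in `preference` go last.
    rank = {source: i for i, source in enumerate(preference)}
    ordered = sorted(rows, key=lambda r: rank.get(r["Source"], len(rank)))

    seen = set()
    kept = []
    for row in ordered:
        # The first row seen for a DOI is the most preferred one, so keep it
        # and skip every later row with the same DOI.
        if row["DOI"] not in seen:
            seen.add(row["DOI"])
            kept.append(row)
    return kept


rows = [
    {"Row": 18, "DOI": "10.1002/67", "Source": "B"},
    {"Row": 1261, "DOI": "10.1002/67", "Source": "A"},
    {"Row": 1400, "DOI": "10.1002/67", "Source": "C"},
    {"Row": 643, "DOI": "10.102/et.67", "Source": "A"},
    {"Row": 1428, "DOI": "10.102/et.67", "Source": "C"},
]

print([r["Row"] for r in keep_preferred(rows)])  # [1261, 643]
```

Because the preference order is just the `preference` tuple, a different priority only needs a different tuple, for example `keep_preferred(rows, preference=("B", "A", "C"))`.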
Answers
@lionelderkrikor Thank you so much. What if source 'B' is the preferred record to keep?