Keep samples based on preferred attribute value

aileenzhou
New Altair Community Member
I have a dataset in which some DOIs are duplicated. For each duplicated DOI I must keep exactly one row, chosen by the 'source' attribute with the preference B>C>A, and delete the rest.
For example, in the data below I want to keep rows 1261 and 643 and delete the rest.
Row DOI Source
18 10.1002/67 A
1261 10.1002/67 B
1400 10.1002/67 C
... ...
643 10.102/et.67 C
1428 10.102/et.67 A
Thank you in advance.
Best Answer
-
Hi @aileenzhou,
In this case (B>C>A), use the same method as in the other thread, but generate a new attribute called "Source_2" as described:
- Generate a new attribute (for example called "Source_2") and replace in this new attribute:
*B by 1
*C by 2
*A by 3
- Reorder attributes (1/ Source_2, 2/ DOI)
- Generate the concatenation of the "Source_2" and "DOI" attributes (via Generate Aggregation attribute)
- Sort the concatenated attribute alphabetically (via Sort attribute / sorting direction = increasing)
- Remove duplicates of this concatenated attribute.
- Split the concatenated attribute back to retrieve the original attributes (without the duplicates), or remove them.
Take a look at the attached process and tell me if it answers your need.
Regards,
Lionel
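For readers who want to check the ranking idea outside RapidMiner, here is a minimal sketch in plain Python. It is not the attached process: the rows are the sample from the question, and the names `SOURCE_RANK` and `keep_preferred` are hypothetical. It assumes the rank mapping B=1, C=2, A=3 (lower is better) and keeps, for each DOI, the row with the lowest rank.

```python
# Sample rows from the question, as (Row, DOI, Source) tuples.
rows = [
    (18, "10.1002/67", "A"),
    (1261, "10.1002/67", "B"),
    (1400, "10.1002/67", "C"),
    (643, "10.102/et.67", "C"),
    (1428, "10.102/et.67", "A"),
]

# Plays the role of "Source_2": a numeric rank per Source value.
# Assumption: lower rank means more preferred, so B > C > A.
SOURCE_RANK = {"B": 1, "C": 2, "A": 3}


def keep_preferred(rows):
    """Return one row per DOI: the row whose Source has the lowest rank."""
    best = {}
    for row in rows:
        _, doi, source = row
        current = best.get(doi)
        # Replace the stored row only if this one is strictly more preferred.
        if current is None or SOURCE_RANK[source] < SOURCE_RANK[current[2]]:
            best[doi] = row
    return list(best.values())


kept = keep_preferred(rows)
for row in kept:
    print(row)
```

On the sample data this keeps rows 1261 and 643, which matches what the question asks for.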
Answers
-
Since Remove Duplicates always keeps the first example, I think you can sort and then use Remove Duplicates on the DOI.
Best,
Martin
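The "sort, then keep the first" idea can be sketched in plain Python as follows. This is only an illustration of the logic, not RapidMiner code; `SOURCE_ORDER` and `sort_then_dedupe` are hypothetical names, and the sample rows are taken from the question.

```python
# Preference order for the Source attribute, most preferred first (B > C > A).
SOURCE_ORDER = ["B", "C", "A"]

sample = [
    {"Row": 18, "DOI": "10.1002/67", "Source": "A"},
    {"Row": 1261, "DOI": "10.1002/67", "Source": "B"},
    {"Row": 1400, "DOI": "10.1002/67", "Source": "C"},
    {"Row": 643, "DOI": "10.102/et.67", "Source": "C"},
    {"Row": 1428, "DOI": "10.102/et.67", "Source": "A"},
]


def sort_then_dedupe(rows):
    """Sort rows by Source preference, then keep the first row seen per DOI."""
    # sorted() is stable, so rows with the same Source keep their input order.
    ordered = sorted(rows, key=lambda r: SOURCE_ORDER.index(r["Source"]))
    seen = set()
    kept = []
    for r in ordered:
        if r["DOI"] not in seen:  # first occurrence wins, later ones are dropped
            seen.add(r["DOI"])
            kept.append(r)
    return kept


result = sort_then_dedupe(sample)
for r in result:
    print(r["Row"], r["DOI"], r["Source"])
```

Because the most preferred source comes first after sorting, dropping every later occurrence of a DOI leaves exactly the preferred row for each one.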