How to keep a partly duplicated sample

User: "aileenzhou"
New Altair Community Member
Updated by Jocelyn
I have a dataset in which some DOIs are duplicated. For each duplicated DOI, I must keep one row based on the 'Source' attribute, with the preference A > B > C, and delete the rest.
For example, in the data below, I want to keep rows 1261 and 643 and delete the rest.

Row     DOI             Source
18      10.1002/67      B
1261    10.1002/67      A
1400    10.1002/67      C
...
...
643     10.102/et.67    A
1428    10.102/et.67    C

Thank you in advance. 

    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    Hi @aileenzhou,

    If I understand correctly, one way to do that is to:

     - Reorder the attributes (1/ Source, 2/ DOI)
     - Generate the concatenation (via Generate Aggregation attribute)
     - Sort the concatenated attribute alphabetically (via Sort attribute / sorting direction = increasing)
     - Remove duplicates of this concatenated attribute.
     - Split the concatenated attribute back to retrieve the original attributes (without the duplicates)
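
For readers who want to check the logic outside RapidMiner, the idea behind these steps (sort so the preferred source comes first, then keep only the first row per DOI) can be sketched in plain Python. This is only an illustration, not the attached process: the function name `keep_preferred` and the dictionary layout are assumptions, and the rows are the ones from the question.

```python
# Example rows copied from the question (the elided "..." rows are left out).
rows = [
    {"Row": 18, "DOI": "10.1002/67", "Source": "B"},
    {"Row": 1261, "DOI": "10.1002/67", "Source": "A"},
    {"Row": 1400, "DOI": "10.1002/67", "Source": "C"},
    {"Row": 643, "DOI": "10.102/et.67", "Source": "A"},
    {"Row": 1428, "DOI": "10.102/et.67", "Source": "C"},
]


def keep_preferred(rows):
    """Keep one row per DOI, preferring the alphabetically smallest Source.

    Hypothetical helper: it relies on the preference A > B > C matching
    alphabetical order, just like the increasing sort in the steps above.
    """
    kept = {}
    # sorted() is stable, so rows with the same Source keep their input order.
    for row in sorted(rows, key=lambda r: r["Source"]):
        # setdefault stores only the first row seen for each DOI.
        kept.setdefault(row["DOI"], row)
    return list(kept.values())


result = keep_preferred(rows)
print([r["Row"] for r in result])  # [1261, 643]
```

Because the sort is alphabetical, this only works while the preferred order happens to be A, B, C.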

    Take a look at the attached process and tell me if it meets your needs ...

    Regards,

    Lionel
    User: "aileenzhou"
    New Altair Community Member
    OP
    @lionelderkrikor Thank you sooooo much. What if records from source 'B' are the preferred ones to keep?
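
One possible way to handle a preference that is not alphabetical (for example B > A > C) is to replace the alphabetical sort with an explicit rank per source. The Python sketch below illustrates that idea under stated assumptions: `keep_by_preference` and the `preference` list are hypothetical names, and the rows are the ones from the original question. In RapidMiner the same idea would presumably mean generating a numeric rank attribute from Source and sorting on that instead of on the source letter.

```python
# Example rows copied from the question (the elided "..." rows are left out).
rows = [
    {"Row": 18, "DOI": "10.1002/67", "Source": "B"},
    {"Row": 1261, "DOI": "10.1002/67", "Source": "A"},
    {"Row": 1400, "DOI": "10.1002/67", "Source": "C"},
    {"Row": 643, "DOI": "10.102/et.67", "Source": "A"},
    {"Row": 1428, "DOI": "10.102/et.67", "Source": "C"},
]


def keep_by_preference(rows, preference):
    """Keep one row per DOI, choosing the source listed earliest in `preference`.

    Hypothetical helper; a Source missing from `preference` raises KeyError.
    """
    # Lower rank means more preferred, e.g. ["B", "A", "C"] -> B=0, A=1, C=2.
    rank = {source: i for i, source in enumerate(preference)}
    best = {}
    for row in rows:
        current = best.get(row["DOI"])
        # Replace the stored row only if this one has a strictly better rank.
        if current is None or rank[row["Source"]] < rank[current["Source"]]:
            best[row["DOI"]] = row
    return list(best.values())


result = keep_by_preference(rows, ["B", "A", "C"])
print([r["Row"] for r in result])  # [18, 643]
```

With the preference B > A > C, row 18 is kept for the first DOI; the second DOI has no B record, so its A record (row 643) is kept.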