Keep samples based on preferred attribute value

aileenzhou
aileenzhou New Altair Community Member
edited November 2024 in Community Q&A
I have a dataset with some duplicated DOIs. For each duplicated DOI I must keep only one row, chosen by the 'Source' attribute with the preference B > C > A, and delete the rest.

For example, in the data below I want to keep rows 1261 and 643 and delete the rest.
Row     DOI             Source
18      10.1002/67      A
1261    10.1002/67      B
1400    10.1002/67      C
...     ...             ...
643     10.102/et.67    C
1428    10.102/et.67    A

Thank you in advance.

Best Answer

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓
    Hi @aileenzhou,

    In this case (preference B > C > A), use the same method as in the other thread, but first generate a new attribute called "Source_2" as described below:

     - Generate a new attribute (for example called "Source_2") and replace the values in this new attribute:
         *B by 1
         *C by 2
         *A by 3
     - Reorder attributes (1/ Source_2, 2/ DOI)
     - Generate the concatenation of the "Source_2" and "DOI" attributes (via the Generate Aggregation operator)
     - Sort the concatenated attribute alphabetically (via Sort / sorting direction = increasing)
     - Remove duplicates of this concatenated attribute.
     - Split the concatenated attribute back to retrieve the original attributes (without the duplicates), or simply remove the helper attributes.

    Take a look at the attached process and let me know if it meets your needs.

    Regards,

    Lionel
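The ranking idea in the steps above can be sketched outside RapidMiner in plain Python. This is only an illustration under assumptions: each row is a hypothetical `(row_id, doi, source)` tuple taken from the sample in the question, `RANK` stands in for the "Source_2" attribute (B→1, C→2, A→3), and the `_` separator and the helper name `keep_preferred` are made up for the example. Since each concatenated value is already unique in the sample (each DOI appears once per source), the sketch removes duplicates on the DOI part recovered by splitting the key back.

```python
# Preference B > C > A expressed as a numeric rank (plays the role of "Source_2").
# Assumes only the sources A, B and C occur; any other value raises KeyError.
RANK = {"B": 1, "C": 2, "A": 3}


def keep_preferred(rows):
    """Keep one row per DOI, choosing the most preferred Source.

    rows: list of (row_id, doi, source) tuples (hypothetical layout).
    """
    # Concatenate Source_2 and DOI, e.g. "1_10.1002/67".
    keyed = [(f"{RANK[src]}_{doi}", row_id, doi, src) for row_id, doi, src in rows]
    # Sort the concatenated key in increasing order: rank-1 (B) rows come first.
    keyed.sort(key=lambda item: item[0])
    seen = set()
    kept = []
    for key, row_id, doi, src in keyed:
        # Split the key back at the first "_" to recover the DOI part.
        _rank, _, doi_part = key.partition("_")
        if doi_part in seen:
            continue  # a more preferred source for this DOI was already kept
        seen.add(doi_part)
        kept.append((row_id, doi, src))
    return kept


rows = [
    (18, "10.1002/67", "A"),
    (1261, "10.1002/67", "B"),
    (1400, "10.1002/67", "C"),
    (643, "10.102/et.67", "C"),
    (1428, "10.102/et.67", "A"),
]
print(keep_preferred(rows))  # [(1261, '10.1002/67', 'B'), (643, '10.102/et.67', 'C')]
```

Single-digit ranks keep the alphabetical sort of the concatenated key consistent with the numeric preference; with ten or more sources the rank would need zero-padding.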


Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Since Remove Duplicates always keeps the first occurrence, I think you can sort and then apply Remove Duplicates on the DOI.

    Best,
    Martin
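Martin's shortcut relies on Remove Duplicates keeping the first occurrence of each value. A minimal plain-Python sketch of the same sort-then-dedupe idea follows, assuming hypothetical `(row_id, doi, source)` tuples from the question's sample. `PREFERENCE` and `remove_duplicates` are illustrative names, not RapidMiner API. Python's `sorted` is stable, so rows with the same source keep their original relative order.

```python
PREFERENCE = "BCA"  # earlier letter = more preferred source (assumed encoding)


def remove_duplicates(rows, key):
    """Keep only the first row for each key value (keep-first dedupe)."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


rows = [
    (18, "10.1002/67", "A"),
    (1261, "10.1002/67", "B"),
    (1400, "10.1002/67", "C"),
    (643, "10.102/et.67", "C"),
    (1428, "10.102/et.67", "A"),
]

# 1) Sort so that the preferred source comes first (stable sort).
ordered = sorted(rows, key=lambda r: PREFERENCE.index(r[2]))
# 2) Remove duplicates on DOI, keeping the first (i.e. most preferred) row.
kept = remove_duplicates(ordered, key=lambda r: r[1])
print(kept)  # [(1261, '10.1002/67', 'B'), (643, '10.102/et.67', 'C')]
```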
