Identify Duplicate examples

aliasgarscool New Altair Community Member
edited November 2024 in Community Q&A

Hi,

 

I have a dataset in which I want to identify duplicates. Unlike Remove Duplicates, I want to keep only the duplicated rows.

 

For example, I have the data below:

Month     Name     Amount
Jul-15    John     10$
Aug-15    Alex     15$
Sep-15    John     5$
Jul-15    John     10$

 

 

If the above table is my input, then I want only the rows below in my results:

Month     Name     Amount
Jul-15    John     10$
Jul-15    John     10$

 


Best Answer

  • dr-connie-brett
    dr-connie-brett New Altair Community Member
    Answer ✓

    If you don't actually need the duplicated examples themselves, but rather the count of how many times each appears, this is how I would handle it:

    1 - Aggregate the table (Aggregate operator: group by all attributes and count one of them)

    2 - Filter Examples, keeping only rows where count(attribute) > 1

    [Screenshot: Screen Shot 2016-09-25 at 9.59.00 AM.png]

    Since there is no unique identifier that you are ignoring, I'm assuming you don't really need each duplicate repeated as many times as it appears, but it might be useful to know how many times each one appears!
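The two steps above can be sketched outside RapidMiner to show the logic. This is a minimal Python illustration using the sample rows from the question, not the Aggregate or Filter Examples operators themselves; the variable names are hypothetical. The last step, which recovers every original row belonging to a duplicated group, is an added assumption about what the question asks for and is not part of the answer above.

```python
from collections import Counter

# Sample rows from the question: (Month, Name, Amount)
rows = [
    ("Jul-15", "John", "10$"),
    ("Aug-15", "Alex", "15$"),
    ("Sep-15", "John", "5$"),
    ("Jul-15", "John", "10$"),
]

# Step 1: group by all attributes and count the rows in each group
counts = Counter(rows)

# Step 2: keep only the groups whose count is greater than 1
duplicate_counts = {row: n for row, n in counts.items() if n > 1}

# Assumed extra step: keep every original row that belongs to a duplicated group
duplicate_rows = [row for row in rows if counts[row] > 1]

print(duplicate_counts)
print(duplicate_rows)
```

With the sample data, only the Jul-15 / John / 10$ group survives the filter, with a count of 2, and the extra step returns both of its copies.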

     

Answers

  • sgenzer
    sgenzer
    Altair Employee

    Hi, that was a good puzzle. I would do it this way:

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.2.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.2.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_id" compatibility="7.2.002" expanded="true" height="82" name="Generate ID" width="90" x="179" y="136"/>
          <operator activated="true" class="multiply" compatibility="7.2.002" expanded="true" height="103" name="Multiply" width="90" x="313" y="136"/>
          <operator activated="true" class="remove_duplicates" compatibility="7.2.002" expanded="true" height="82" name="Remove Duplicates" width="90" x="514" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Amount|Month|Name"/>
          </operator>
          <operator activated="true" class="set_minus" compatibility="7.2.002" expanded="true" height="82" name="Set Minus" width="90" x="715" y="136"/>
          <connect from_port="input 1" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Remove Duplicates" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Set Minus" to_port="example set input"/>
          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Set Minus" to_port="subtrahend"/>
          <connect from_op="Set Minus" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Scott
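To see what the process above returns, here is a rough Python sketch of the same chain (Generate ID, Remove Duplicates on Month, Name and Amount, then Set Minus). It is only an illustration under two assumptions: that Remove Duplicates keeps the first occurrence of each group, and that Set Minus drops examples whose ID appears in the subtrahend. The variable names are hypothetical.

```python
# Sample rows from the question: (Month, Name, Amount)
rows = [
    ("Jul-15", "John", "10$"),
    ("Aug-15", "Alex", "15$"),
    ("Sep-15", "John", "5$"),
    ("Jul-15", "John", "10$"),
]

# Generate ID: attach a 1-based id to every example
with_ids = [(example_id, row) for example_id, row in enumerate(rows, start=1)]

# Remove Duplicates: keep the first example of each (Month, Name, Amount) group
# (assumption: the first occurrence is the one retained)
seen = set()
first_occurrences = []
for example_id, row in with_ids:
    if row not in seen:
        seen.add(row)
        first_occurrences.append((example_id, row))

# Set Minus: remove every example whose id is in the subtrahend
# (assumption: examples are matched by id)
kept_ids = {example_id for example_id, _ in first_occurrences}
repeats = [(example_id, row) for example_id, row in with_ids if example_id not in kept_ids]

print(repeats)
```

Under these assumptions the result holds only the later copies of each duplicated row (here, the example with id 4), not the first copy. If both copies are needed, as in the question, a further step to bring back the matching first occurrences would be required.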
