

How to remove duplicate data

Hello community: I have a problem with duplicate data. This is an example:

Col 1	Col 2
39-2021	49-2021
49-2021	39-2021

The two rows hold the same values in inverted order, so I want to remove one of those rows.

Help please, you are my only hope.....


BalazsBaranyRM

Hi @jmphillips,

this sounds like a good use case for De-Pivot, removing the duplicates, and Pivot again.

De-Pivot will put all values into subsequent rows and keep the information about the column name (Col 1, Col 2) in another column. You would then remove the duplicates according to your rules and use Pivot to build the original data structure again.

Be careful to remove entire "rows" (original rows, which are multiple rows after the De-Pivot). If you don't do that, you will have missing data upon pivoting again.
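The logic of these three steps can be sketched outside of RapidMiner. The following is a minimal Python sketch, not the De-Pivot and Pivot operators themselves; the `row_id` bookkeeping and the function names are assumptions made for illustration. It treats two original rows as duplicates when they hold the same set of values, and it drops every de-pivoted record of a duplicate row so that nothing is missing after pivoting back.

```python
from collections import defaultdict

def de_pivot(rows):
    """Turn each wide row into one (row_id, column, value) record per cell."""
    return [(row_id, col, val)
            for row_id, row in enumerate(rows)
            for col, val in row.items()]

def drop_duplicate_rows(long_rows):
    """Drop all records of an original row whose set of values was already seen."""
    values_by_row = defaultdict(set)
    for row_id, _col, val in long_rows:
        values_by_row[row_id].add(val)
    seen, keep = set(), set()
    for row_id in sorted(values_by_row):
        key = frozenset(values_by_row[row_id])  # order of the values is ignored
        if key not in seen:
            seen.add(key)
            keep.add(row_id)
    # Remove entire original rows, never single records.
    return [rec for rec in long_rows if rec[0] in keep]

def pivot(long_rows):
    """Rebuild the wide rows from the (row_id, column, value) records."""
    wide = defaultdict(dict)
    for row_id, col, val in long_rows:
        wide[row_id][col] = val
    return [wide[row_id] for row_id in sorted(wide)]

rows = [
    {"Col 1": "39-2021", "Col 2": "49-2021"},
    {"Col 1": "49-2021", "Col 2": "39-2021"},
]
result = pivot(drop_duplicate_rows(de_pivot(rows)))
print(result)  # [{'Col 1': '39-2021', 'Col 2': '49-2021'}]
```

Keeping the first occurrence of each value set is one possible rule; any other rule works as long as it removes whole original rows.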

Regards,
Balázs

jmphillips

Thanks, Balazs, but I don't know how to use those operators.

jmphillips

FIRST_ID	SECOND_ID	SIMILARITY
24260-2016	24413-2016	1.0
24413-2016	24260-2016	1.0***
22055-2016	22056-2016	1.0
22056-2016	22055-2016	1.0
22057-2016	22058-2016	1.0
22057-2016	22060-2016	1.0
22057-2016	22059-2016	1.0
22058-2016	22057-2016	1.0
22058-2016	22060-2016	1.0
22058-2016	22059-2016	1.0
22060-2016	22057-2016	1.0
22060-2016	22058-2016	1.0
22060-2016	22059-2016	1.0
22059-2016	22057-2016	1.0
22059-2016	22058-2016	1.0
22059-2016	22060-2016	1.0

If I de-pivot, all the SECOND_ID values go into a new column alongside the FIRST_ID data, so if I then remove duplicates I am going to lose data. The problem is that I want to eliminate, for example, the row marked *** (24413-2016, 24260-2016), because I already have the row 24260-2016, 24413-2016: the first and second IDs in this case are the same but inverted.

BalazsBaranyRM

Hi,

another approach would be duplicating the data with Multiply and then joining with Left Outer join on FIRST_ID = SECOND_ID *and* SECOND_ID = FIRST_ID. You could then decide which matches to remove.
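As a rough illustration of this self-join idea, here is a small Python sketch. It uses plain Python rather than the Multiply and Join operators, and the rule for deciding which match to remove is an assumption, since that choice is left open above: when a row has a mirrored partner, only the row whose FIRST_ID sorts before its SECOND_ID is kept.

```python
def remove_mirrored(rows):
    """rows: list of (first_id, second_id, similarity) tuples."""
    # Stands in for the duplicated copy of the data: all (FIRST_ID, SECOND_ID) pairs.
    pairs = {(first, second) for first, second, _sim in rows}
    kept = []
    for first, second, sim in rows:
        # The "join": does a row with FIRST_ID and SECOND_ID swapped exist?
        has_mirror = (second, first) in pairs
        # Assumed rule: of two mirrored rows, drop the one with the larger FIRST_ID.
        if has_mirror and first > second:
            continue
        kept.append((first, second, sim))
    return kept

rows = [
    ("24260-2016", "24413-2016", 1.0),
    ("24413-2016", "24260-2016", 1.0),
    ("22055-2016", "22056-2016", 1.0),
    ("22056-2016", "22055-2016", 1.0),
]
deduped = remove_mirrored(rows)
for row in deduped:
    print(*row, sep="\t")
```

Rows without a mirrored partner are kept unchanged, which corresponds to the unmatched side of a left outer join. The IDs are compared as strings here, which is enough to pick one row of each mirrored pair consistently.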

Regards,
Balázs

jmphillips

Balazs, sorry, I don't know how to do it. Can you send me the boxes and parameters to use, please?

Regards.

frederic_gomer_

Hi,

@jmphillips, you can try this process:

Image: https://us.v-cdn.net/6030995/uploads/editor/66/t2frnzotkal1.jpg

Attached file: How_to_remove_duplicate_data.rmp

jmphillips

Thanks, Frederic, for the answer, but row one pairs the values in the first and second columns, and the second row holds the same values but inverted, so I want to remove the complete second row.

Regards.

FIRST_ID	SECOND_ID	SIMILARITY
24260-2016	24413-2016	1.0
24413-2016	24260-2016	1.0