[SOLVED] Remove Duplicates: Weird Problem!
Hi,
I used the Remove Duplicates operator on a dataset with 5 million records, both on my laptop and on a powerful Linux server, and it ran in about 7 seconds on both systems.
Now I have a dataset with 8 million records (same number of attributes). It runs on my laptop in 10 seconds, but it never finishes on the same remote server: after 8 hours it still hadn't finished!
It's very weird, since it's a very simple operation.
How could something like this happen? Is there any way to solve it?
Thanks,
Arian
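For context on why memory can matter even for a "simple" operation, here is a minimal sketch of hash-based duplicate removal in Java (the language RapidMiner runs on). It is an illustration under assumptions, not RapidMiner's actual implementation; the class and method names are hypothetical. It keeps the first occurrence of each row and remembers every distinct row it has seen, so memory use grows with the number of distinct records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not RapidMiner's Remove Duplicates implementation.
public class RemoveDuplicatesSketch {
    // Keeps the first occurrence of each row; rows are compared on all attributes.
    // The "seen" set holds one entry per distinct row, so memory use grows with
    // the number of distinct records.
    static List<List<String>> removeDuplicates(List<List<String>> rows) {
        Set<List<String>> seen = new HashSet<>();
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            // add() returns false if an equal row is already in the set
            if (seen.add(row)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                Arrays.asList("a", "1"),
                Arrays.asList("b", "2"),
                Arrays.asList("a", "1"));
        System.out.println(removeDuplicates(rows)); // prints [[a, 1], [b, 2]]
    }
}
```

With millions of rows, the set of seen rows has to fit in the available heap; if it does not, the process can slow to a crawl or fail rather than finish quickly.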
Answers
Hi Arian,
Maybe the server ran out of memory? Do you use the same memory settings on both machines?
Best regards,
Marius
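Since RapidMiner runs on the Java Virtual Machine, the memory it can use is capped by the JVM's maximum heap size (the `-Xmx` setting). A minimal sketch for comparing the two machines, assuming you run it with the same `-Xmx` value that RapidMiner is launched with on each one (where that value is configured depends on the installation; check its documentation). The class name is hypothetical; `Runtime.maxMemory()`, `totalMemory()` and `freeMemory()` are standard Java APIs.

```java
// Hypothetical helper: prints the heap limits of the JVM it runs in.
public class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Maximum heap this JVM will attempt to use, in megabytes.
    // maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max heap:       %d MB%n", maxHeapMegabytes());
        // Heap currently reserved by the JVM, and the unused part of it
        System.out.printf("reserved heap:  %d MB%n", rt.totalMemory() / MB);
        System.out.printf("free (of that): %d MB%n", rt.freeMemory() / MB);
    }
}
```

For example, `java -Xmx2g HeapCheck` should report a maximum heap of roughly 2048 MB (the JVM may report slightly less). If the server's RapidMiner heap is much smaller than the laptop's, that alone could explain the difference.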
Actually, it was very weird. I was doing some missing value replacement, and after changing the way I do it, everything works now. (The results of both missing value replacements seem to be the same, yet Remove Duplicates doesn't work on one of them.)
Thanks, by the way.
-Arian