Good day.
I am new to RapidMiner (RM).
I need to remove duplicates from my dataset as part of the preprocessing step.
My original set has 7621 examples.
I used the "Remove Duplicates" function in Excel and got 6830 rows (examples) as a result.
Since I am running the project in RM, I need to clean my data with its operators. So I used the "Remove Duplicates" operator, chose the "Project name" attribute, and ran the process. As a result, I got 6854 examples.
My question is: why do the two results differ (6854 examples via RM vs. 6830 via Excel)?
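One way to narrow this down is a small check outside both tools. The sketch below is only an illustration in plain Python: it assumes (without confirmation) that the gap of 24 examples might come from values that differ only in letter case or in leading/trailing whitespace. The function name `compare_duplicate_rules`, the normalization rule, and the sample values are all hypothetical and are not taken from my data or from either tool.

```python
from collections import defaultdict


def compare_duplicate_rules(values):
    """Count distinct values under exact and under normalized comparison.

    Returns (exact_count, normalized_count, collapsed), where `collapsed`
    maps each normalized key to the distinct original spellings that
    share it (only keys with more than one spelling are kept).
    """
    exact = set(values)  # exact, case-sensitive comparison
    groups = defaultdict(set)
    for v in values:
        # Hypothetical normalization: trim whitespace and ignore case.
        groups[v.strip().casefold()].add(v)
    collapsed = {k: sorted(vs) for k, vs in groups.items() if len(vs) > 1}
    return len(exact), len(groups), collapsed


# Made-up sample standing in for the "Project name" column.
sample = ["Alpha", "alpha", "Beta ", "Beta", "Gamma"]
exact_count, normalized_count, collapsed = compare_duplicate_rules(sample)
print(exact_count, normalized_count)
print(collapsed)
```

If the two counts differ on the real "Project name" values, the entries in `collapsed` would show which names one comparison rule treats as duplicates and the other keeps apart.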
I have attached my process to this message and would appreciate any help with this problem.
Thank you in advance.