Aggregate Duplicates
Can you suggest a method to remove duplicate examples and add a "count" attribute to the remaining unique items?
I would like to do that to reduce the size of the dataset and then use this counter attribute with a k-NN operator. Is that even possible in RM?
Thank you for your reply.
If I understand correctly, you suggest aggregating duplicates using the aggregate operator and "group by" all attributes.
How can this be utilized to make a k-NN faster?
Having 20 million samples with 20 attributes but only 1 million possible attribute combinations will result in a dataset of 1 million examples with 21 attributes.
How will k-NN work on that (i.e., can it use the 21st attribute as a weight/count or something)?
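To make the question concrete, here is a minimal sketch in plain Python (not RapidMiner) of what "use the count as a weight" could mean. It assumes each example is a (features, label) pair with numeric features and that Euclidean distance is used; the function names `aggregate_duplicates` and `knn_predict` are made up for illustration. Each unique example votes with its count, capped so that only k original examples are used in total, which should give the same answer as plain k-NN on the full dataset (apart from tie-breaking). Whether RM's k-NN operator can treat an attribute this way is not something this sketch shows.

```python
from collections import Counter
import math


def aggregate_duplicates(examples):
    """Collapse identical (features, label) pairs into unique pairs with a count."""
    return Counter((tuple(features), label) for features, label in examples)


def knn_predict(aggregated, query, k):
    """Majority vote over the k nearest ORIGINAL examples, using counts as weights."""
    # Rank the unique examples by Euclidean distance to the query.
    ranked = sorted(aggregated.items(),
                    key=lambda item: math.dist(item[0][0], query))
    votes = Counter()
    remaining = k
    for (features, label), count in ranked:
        # A unique example stands for `count` duplicates, but we only
        # need enough of them to fill the k neighbour slots.
        take = min(count, remaining)
        votes[label] += take
        remaining -= take
        if remaining == 0:
            break
    return votes.most_common(1)[0][0]
```

The distance computation runs once per unique example rather than once per original example, which is where the speed-up would come from. Ignoring the count (one vote per unique row) can change the prediction, so the count cannot simply be dropped.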
The aggregate operator is your friend - here's an example.

Regards,
Andrew