Permutations with string distance for near-duplicate detection

User: "tib"
New Altair Community Member
Updated by Jocelyn
Dear Rapid-Miners,

A RapidMiner newbie needs your help :-)

We are trying to use RapidMiner to analyse a huge dataset stored in an Oracle database. The data describes organisations (companies): addresses, emails, descriptions, market sectors...
The idea is to compute the similarity (using string distance functions?) of each company with each of the others (permutations?). The goal is to find the near duplicates in the database.
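
To make the idea concrete, here is a minimal sketch in plain Python (not RapidMiner) of the all-pairs comparison described above. Everything in it is an assumption for illustration: the sample records, the `near_duplicates` helper, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the string similarity measure.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(text.lower().split())


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def near_duplicates(records, threshold=0.85):
    # Compare every unordered pair of records once and keep the pairs
    # whose similarity reaches the (hypothetical) threshold.
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(records.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Made-up sample data: id -> concatenated company fields.
companies = {
    1: "Acme Corporation, 12 Main Street, Paris",
    2: "ACME Corporation, 12 Main St., Paris",
    3: "Globex Industries, 4 Harbour Road, Lyon",
}

for id_a, id_b, score in near_duplicates(companies):
    print(id_a, id_b, round(score, 3))
```

Comparing every pair grows quadratically with the number of companies, so on a huge table this exact approach would probably need some way of restricting which pairs are compared.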

Would RapidMiner be able to achieve such a task? If yes, how should I proceed? Any help would be really appreciated :)

Thanks a lot

Thibault
