Similarity between multiple tables
Hi,
I am currently working on my university thesis, which tries to solve an entity resolution problem. Today I tried to integrate two tables by measuring the similarity between them. If the similarity of a variable is above a threshold of 0.9, it is considered useful and is used in a second evaluation. In the second evaluation the variables are weighted. For example, a phone number is a better unique key than a first name. In the end, the customer representation needs to be evaluated as follows: (0.9*2)+(0.8*7) = .... If the result is above a threshold of 0.8 (for example), the rows are considered useful and are integrated. I tried to compute the similarity in RapidMiner (with a couple of similarity measures), but I received extreme values (< 0 or > 1).
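To make the weighted second evaluation concrete, here is a minimal Python sketch. It is not the RapidMiner process, only an illustration under assumptions: the attribute names, the weights (2 and 7, taken from the example formula), and the use of difflib's SequenceMatcher as the similarity measure are all placeholders. The sketch also divides by the sum of the weights, which is an assumption on top of the formula above, so that the final score can be compared with a threshold such as 0.8.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights: a phone number counts more than a first name.
WEIGHTS = {"firstname": 2, "phone": 7}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def weighted_score(similarities: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarities.

    Dividing by the total weight (an assumption, not part of the
    original formula) keeps the result in [0, 1].
    """
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(similarities[name] * weights[name] for name in similarities)
    return weighted_sum / total_weight


def should_integrate(row_a: dict, row_b: dict, weights: dict = WEIGHTS,
                     threshold: float = 0.8) -> bool:
    """Return True if the weighted similarity of two rows exceeds the threshold."""
    sims = {name: similarity(row_a[name], row_b[name]) for name in weights}
    return weighted_score(sims, weights) > threshold


# The numbers from the example formula: (0.9*2 + 0.8*7) / (2 + 7)
example = weighted_score({"firstname": 0.9, "phone": 0.8}, WEIGHTS)
print(round(example, 3))
```

Without the division, the raw sum (0.9*2)+(0.8*7) is already larger than 1, so it cannot be compared directly with a threshold between 0 and 1.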
(I cannot post any screenshots yet, since I am new.)
What am I doing wrong?
Cheers, Robin