In topic "distance measures of text attributes" Neil McGuigan wrote:
... if you're trying to calculate the distance between terms, and not documents, then I would look into the Levenshtein Edit Distance, which, I believe, is not (yet) implemented in RapidMiner.
The Levenshtein distance is included in an open source library I found on the net.
SimMetrics is an open source, extensible library of similarity and distance metrics, e.g. Levenshtein Distance, L2 Distance, Cosine Similarity, Jaccard Similarity, etc. SimMetrics provides a library of float-based similarity measures between String data as well as the typical unnormalised metric output. It is intended for researchers in information integration (II) and other related fields. It includes a range of similarity measures from a variety of communities, including statistics, DNA analysis, artificial intelligence, information retrieval, and databases.
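For anyone who only needs the idea and not the whole library, here is a minimal sketch of the Levenshtein distance in plain Java, using the usual dynamic-programming approach. It is not taken from SimMetrics or RapidMiner. The class name `EditDistanceSketch`, the method names, and the normalisation used for the float similarity (1 minus distance divided by the longer string's length) are my own assumptions for illustration.

```java
// Illustrative sketch only: not the SimMetrics or RapidMiner implementation.
// Class and method names are hypothetical.
final class EditDistanceSketch {

    /**
     * Unnormalised Levenshtein distance: the minimum number of single-character
     * insertions, deletions and substitutions needed to turn a into b.
     */
    static int levenshtein(String a, String b) {
        // Only two rows of the dynamic-programming table are kept in memory.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i characters of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // The row just computed becomes the previous row for the next character of a.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Float similarity in [0, 1], where 1 means identical strings.
     * Assumed normalisation: 1 - distance / length of the longer string.
     */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f; // two empty strings are treated as identical
        }
        return 1.0f - (float) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

The first method gives the raw edit count, the second a score that is easier to compare across terms of different lengths. For real work, the library linked in this post is probably the better choice, since it covers many more metrics.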
http://www.aktors.org/technologies/simmetrics/index.html
Source code:
http://sourceforge.net/projects/simmetrics/
Documentation:
http://www.coli.uni-saarland.de/courses/LT1/2011/slides/stringmetrics.pdf
How to install SimMetrics library on Microsoft SQL Server:
http://anastasiosyal.com/POST/2009/01/11/18.ASPX?
Regards
Roland