Finding the most similar document(s) in a collection to a test document
crcowan
While I was using version 4 of RapidMiner, I built a chain to perform this task. It is discussed here:
http://rapid-i.com/rapidforum/index.php/topic,1201.msg4577.html#msg4577 and here: http://rapid-i.com/rapidforum/index.php/topic,680.msg2587.html#msg2587.
Now that RapidMiner 5 is out, I was wondering whether there are new or better operators for this task.
The basic requirement is to compare a single (test) document against a set of documents and find the document in the set that best matches it, using cosine similarity.
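For reference, here is a minimal sketch of that requirement in plain Python, outside RapidMiner. It assumes TF-IDF weighting with a smoothed IDF computed from the collection only, plus a deliberately simple tokenizer; the function names (`tokenize`, `tfidf`, `most_similar`, etc.) are hypothetical and are not RapidMiner operators.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Deliberately simple tokenizer (an assumption): lowercase runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequencies(collection_tokens):
    """Count, for each term, how many collection documents contain it."""
    df = Counter()
    for tokens in collection_tokens:
        df.update(set(tokens))
    return df


def tfidf(tokens, n, df):
    """Assumed weighting: tf * idf, with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    tf = Counter(tokens)
    return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(test_doc, collection, top_k=1):
    """Return the top_k (score, index) pairs, best match first."""
    coll_tokens = [tokenize(d) for d in collection]
    n, df = len(coll_tokens), document_frequencies(coll_tokens)
    # The test document is weighted with the collection's IDF values.
    query = tfidf(tokenize(test_doc), n, df)
    scores = [(cosine(query, tfidf(toks, n, df)), i) for i, toks in enumerate(coll_tokens)]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)[:top_k]


collection = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a dog chased the cat around the yard",
]
best_score, best_index = most_similar("my cat likes to sit on a mat", collection)[0]
print(best_index, collection[best_index])  # 0 the cat sat on the mat
```

Raising `top_k` returns the few closest documents instead of only the best one.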
Any recommendations?
Thank you.
Charles
Answers
Hello Charles,
I have not dealt with a similar problem, but my idea would be to use an entropy-based representation of the documents (available in the Text Mining extension) and then, for example, use k-NN to check how similar the documents are.
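A rough sketch of this idea in plain Python follows. It assumes that "entropy-based representation" means log-entropy term weighting (local weight ln(1 + tf), global weight 1 + sum_j p_j ln p_j / ln n with p_j = tf_j / gf, a scheme used in latent semantic analysis); whether the Text Mining extension computes exactly this is an assumption. Here k-NN simply ranks the collection by cosine similarity to the test document and keeps the k closest. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Deliberately simple tokenizer (an assumption): lowercase runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def entropy_weights(collection_tf):
    """Assumed global weight: g = 1 + sum_j p_j * ln(p_j) / ln(n), p_j = tf_j / gf."""
    n = len(collection_tf)
    gf = Counter()
    for tf in collection_tf:
        gf.update(tf)
    weights = {}
    for term, total in gf.items():
        if n < 2:
            weights[term] = 1.0  # entropy normalisation is undefined for one document
            continue
        ent = sum((tf[term] / total) * math.log(tf[term] / total)
                  for tf in collection_tf if tf[term])
        weights[term] = 1.0 + ent / math.log(n)
    return weights


def log_entropy(tf, g):
    """Local weight ln(1 + tf) times global weight; terms unseen in the collection are dropped."""
    return {t: math.log1p(c) * g[t] for t, c in tf.items() if t in g}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def k_nearest(test_doc, collection, k=1):
    """Indices of the k collection documents most cosine-similar to test_doc, best first."""
    coll_tf = [Counter(tokenize(d)) for d in collection]
    g = entropy_weights(coll_tf)
    vectors = [log_entropy(tf, g) for tf in coll_tf]
    query = log_entropy(Counter(tokenize(test_doc)), g)
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


collection = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a dog chased the cat around the yard",
]
print(k_nearest("my cat likes to sit on a mat", collection, k=2))  # [0, 2]
```

A term spread evenly over every document gets a global weight of 0, so very common words stop influencing the match; with k=1 this returns only the single best-matching document.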
Regards
radone0