Finding the most similar document(s) in a collection to a test document

crcowan New Altair Community Member
edited November 5 in Community Q&A
While I was using version 4 of RapidMiner, I built a chain to perform this function. It is discussed here:
http://rapid-i.com/rapidforum/index.php/topic,1201.msg4577.html#msg4577 and here: http://rapid-i.com/rapidforum/index.php/topic,680.msg2587.html#msg2587.

With the advent of RapidMiner 5, I was wondering whether there are any new or better operators for this function.

The basic requirement is to compare a single (test) document to a set of documents and find the document in the set that best matches the test document, as measured by cosine similarity.
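
To make the requirement concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. It assumes simple lowercase word tokens and raw term counts as the document vectors; the function names (tokenize, cosine_similarity, rank_by_similarity) and the sample documents are made up for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(test_doc, collection):
    """Return (name, score) pairs sorted from most to least similar."""
    test_vec = Counter(tokenize(test_doc))
    scores = [
        (name, cosine_similarity(test_vec, Counter(tokenize(text))))
        for name, text in collection.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, for illustration only.
collection = {
    "doc_a": "the cat sat on the mat",
    "doc_b": "stock markets fell sharply today",
    "doc_c": "a cat and a dog sat together",
}
ranking = rank_by_similarity("the cat sat", collection)
best_name, best_score = ranking[0]
print(best_name, round(best_score, 3))
```

The first entry of the ranking is the best match; taking the first k entries instead gives the k nearest documents. A different term weighting (for example TF-IDF) could replace the raw counts without changing the rest of the sketch.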

Any recommendations?

Thank you.

      Charles

Answers

  • radone New Altair Community Member
    Hello Charles,

    I have not dealt with a similar problem myself, but my idea is to use an entropy-based representation of the documents (available in the text mining extension) and then, for example, use k-NN to check the similarity of the documents.

    Regards
    radone