Finding the most similar document(s) in a collection to a test document

While I was using version 4 of RapidMiner, I built a process chain to perform this function. It is discussed here:
http://rapid-i.com/rapidforum/index.php/topic,1201.msg4577.html#msg4577 and here: http://rapid-i.com/rapidforum/index.php/topic,680.msg2587.html#msg2587.

With the advent of RapidMiner 5, I was wondering whether there are new or better operators for this task.

The basic requirement is to compare a single (test) document to a set of documents and find the document in the set that best matches the test document (cosine similarity).
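To make the requirement concrete, here is a minimal sketch in plain Python rather than RapidMiner operators. It assumes raw term counts as the document vectors (no TF-IDF weighting or stemming), and the function names `term_counts`, `cosine_similarity`, and `most_similar` are only illustrative, not part of any RapidMiner API.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Turn a document into a bag-of-words vector (term -> raw count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(test_doc, collection):
    """Return (index, score) of the collection document closest to test_doc."""
    test_vec = term_counts(test_doc)
    scores = [cosine_similarity(test_vec, term_counts(d)) for d in collection]
    best = max(range(len(collection)), key=scores.__getitem__)
    return best, scores[best]


collection = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat and a dog sat together",
]
index, score = most_similar("the cat sat", collection)
print(index, round(score, 4))
```

In practice the raw counts would probably be replaced by TF-IDF weights so that very common words do not dominate the score, but the comparison step stays the same.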

Any recommendations?

Thank you.

Charles

radone

Hello Charles,

I have not dealt with a similar problem myself, but my idea is to use an entropy-based representation of the documents (available in the text mining extension) and then, for example, use k-NN to check the similarity of the documents.

Regards
radone