"[SOLVED] k-Nearest Neighbor for a clustered search"

New Altair Community Member

Oct 8, 2013

Updated Nov 5, 2024 by Jocelyn

Hi,

I started using RapidMiner a few weeks ago to complete my master's thesis. Now I have a big problem, and I hope someone can help me.

First of all, the goal:
I want to retrieve documents that are similar to an input document. Specifically, the goal is to return the one document from a collection that most closely matches the input document.

My solution: I create word vectors for both the collection documents and the input document. I then cluster the collection documents with k-means to obtain clusters and their centroids. To find documents that match the input document, I want to compare the input document vector with the centroid vectors. I then want to consider only the documents in the cluster whose centroid vector is most similar to the input, and determine the most similar document from that small selection.
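Outside RapidMiner, the two-stage search described above could be sketched in plain Python roughly as follows. This is only an illustration under several assumptions: all function names and the toy documents are made up, the word vectors are raw term counts, and the clustering is a simplified k-means variant that assigns by cosine similarity and is seeded with the first k documents.

```python
import math
from collections import Counter

def word_vector(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Mean vector of the documents in one cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def kmeans(vectors, k, iterations=10):
    """Simplified k-means: cosine assignment, seeded with the first k documents."""
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return centroids, assignment

def most_similar(query_text, docs, k):
    """Return (cluster index, best matching document) for the query."""
    vectors = [word_vector(d) for d in docs]
    centroids, assignment = kmeans(vectors, k)
    query = word_vector(query_text)
    # Stage 1: pick the non-empty cluster whose centroid is closest to the query.
    clusters = sorted(set(assignment))
    best_cluster = max(clusters, key=lambda c: cosine(query, centroids[c]))
    # Stage 2: search only the documents inside that cluster.
    candidates = [i for i, a in enumerate(assignment) if a == best_cluster]
    best_doc = max(candidates, key=lambda i: cosine(query, vectors[i]))
    return best_cluster, docs[best_doc]

# Hypothetical toy collection.
docs = [
    "cats chase mice",
    "stocks fall sharply",
    "dogs chase cats",
    "markets and stocks rise",
]
result = most_similar("stocks rise today", docs, k=2)
print(result)
```

The point of the first stage is that the query is compared with k centroids instead of every document, so the full comparison in the second stage only runs on one cluster.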

My problem: I am trying to use the k-Nearest Neighbor algorithm (k-NN) to compare the input vector with the centroid vectors from the clustering of the collection, but I don't know how to implement that properly in RapidMiner.
- How can I use only the centroid vectors as input to k-NN?
- Is there a way to get the most similar cluster as output?
- Is there a way to get the most similar document as output?
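To make the three questions concrete: the behaviour being asked for amounts to a 1-nearest-neighbour lookup in which the centroids are the only "training" examples and each one is labelled with its cluster, followed by a second lookup restricted to that cluster's documents. A minimal Python sketch of that idea (not RapidMiner; the vocabulary, vectors, and names below are hypothetical toy values) might look like this:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest(query, labelled_vectors):
    """1-NN: return the label whose vector is most similar to the query."""
    return max(labelled_vectors,
               key=lambda label: cosine(query, labelled_vectors[label]))

# Hypothetical data over the vocabulary [cats, dogs, stocks, markets].
# Each centroid is the mean of the document vectors in its cluster.
centroids = {
    "cluster_0": [1.0, 0.5, 0.0, 0.0],
    "cluster_1": [0.0, 0.0, 1.0, 0.5],
}
documents = {
    "doc_a": ("cluster_0", [1, 0, 0, 0]),
    "doc_b": ("cluster_0", [1, 1, 0, 0]),
    "doc_c": ("cluster_1", [0, 0, 1, 0]),
    "doc_d": ("cluster_1", [0, 0, 1, 1]),
}
query = [0, 0, 2, 1]

# Only the centroids are the "training set" here; the label is the cluster.
best_cluster = nearest(query, centroids)

# Restrict the second search to the documents of that cluster.
members = {name: vec for name, (cluster, vec) in documents.items()
           if cluster == best_cluster}
best_document = nearest(query, members)

print(best_cluster, best_document)
```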

A screenshot of the current process:
http://s7.directupload.net/file/d/3404/cuvjjqez_png.htm

I really hope someone can help.
All the best!

Find more posts tagged with

AI Studio

Clustering
