"Top Down Clustering - determining the number of items in the lowest-level clusters"

tiramisusann New Altair Community Member
edited November 5 in Community Q&A

I'm using RapidMiner for my master's thesis. I have to cluster a large amount of textual data, with the goal of identifying the document in the database that is most similar to an incoming document.

For that I need to define a top-down clustering. At the lowest level it should contain clusters with only ONE document each (otherwise it would not be possible to find the most similar document). The incoming document should follow the path it is most similar to, which is determined by comparing its document vector with the centroid vectors of the clusters at each level. Applied this way, the algorithm terminates at the cluster containing the most similar document.
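To make the idea concrete, here is a minimal sketch in plain Python (not RapidMiner). It assumes the documents are already available as numeric vectors (e.g. TF-IDF) and uses cosine similarity. All names (`split_in_two`, `build`, `route`) and the toy vectors are hypothetical and only illustrate the descent, they are not part of any RapidMiner API.

```python
import math

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def split_in_two(docs, ids, iters=10):
    # Tiny 2-means: seed with the first document and the one least similar to it.
    first = ids[0]
    far = min(ids[1:], key=lambda i: cosine(docs[first], docs[i]))
    centers = [docs[first], docs[far]]
    for _ in range(iters):
        groups = [[], []]
        for i in ids:
            sims = [cosine(docs[i], c) for c in centers]
            groups[0 if sims[0] >= sims[1] else 1].append(i)
        if not groups[0] or not groups[1]:
            # Degenerate case (e.g. identical documents): fall back to an even cut.
            half = len(ids) // 2
            return ids[:half], ids[half:]
        centers = [centroid([docs[i] for i in g]) for g in groups]
    return groups[0], groups[1]

def build(docs, ids):
    # Split recursively until every leaf cluster holds exactly one document.
    node = {"ids": ids, "centroid": centroid([docs[i] for i in ids]), "children": []}
    if len(ids) > 1:
        left, right = split_in_two(docs, ids)
        node["children"] = [build(docs, left), build(docs, right)]
    return node

def route(node, query):
    # Follow the child whose centroid is most similar to the query vector.
    while node["children"]:
        node = max(node["children"], key=lambda c: cosine(query, c["centroid"]))
    return node["ids"][0]

# Toy data: five made-up 3-dimensional "document vectors".
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 0, 1]]
tree = build(docs, list(range(len(docs))))
nearest = route(tree, [0.05, 0.95, 0.0])
print("most similar document index:", nearest)
```

One caveat with this kind of greedy descent: it only compares centroids along a single path, so it is not guaranteed to end at the globally most similar document if the query sits near a cluster boundary.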

But how can I implement that idea in RapidMiner? I have no clue how to tell RapidMiner that the clusters at the last and lowest level should contain only a single document.

I would be very grateful if anyone could help.

Thanks so much,

tiramisusann

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    The Top Down Clustering operator with k-Means in the subprocess does this job for you. This probably also saves you from having to implement the solution to your previous question :)

    Happy Mining!
    ~Marius