"Agglomerative Clustering"

mscrissy New Altair Community Member
edited November 5 in Community Q&A
Hi everyone,

I was trying to cluster sentences using RapidMiner's k-Means algorithm, and that worked fine. My question now is: is it possible to give a precomputed sentence-to-sentence similarity matrix to the Agglomerative Clustering operator? I want to calculate the similarities between sentences using the WordNet similarity functions and then build the clusters using agglomerative clustering.

Thanks a lot!

Best Regards,
Cris
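The question boils down to running agglomerative clustering directly on a precomputed similarity matrix. As a rough illustration (standalone Java, not RapidMiner's operator or its internals), here is a minimal sketch. It assumes a symmetric matrix where higher values mean more similar, uses average linkage, and stops once a fixed number of clusters remains. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PrecomputedAgglomerative {

    /**
     * Average-linkage agglomerative clustering on a precomputed, symmetric
     * similarity matrix (higher = more similar). Repeatedly merges the two
     * most similar clusters until targetClusters remain.
     * Returns each cluster as a list of row indices into the matrix.
     */
    public static List<List<Integer>> cluster(double[][] sim, int targetClusters) {
        if (targetClusters < 1) {
            throw new IllegalArgumentException("targetClusters must be >= 1");
        }
        // Start with every item (sentence) in its own cluster.
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < sim.length; i++) {
            List<Integer> c = new ArrayList<>();
            c.add(i);
            clusters.add(c);
        }
        while (clusters.size() > targetClusters) {
            int bestA = -1;
            int bestB = -1;
            double bestSim = Double.NEGATIVE_INFINITY;
            // Find the most similar pair of clusters.
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double s = averageLinkage(sim, clusters.get(a), clusters.get(b));
                    if (s > bestSim) {
                        bestSim = s;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            // Merge cluster bestB into bestA; bestB > bestA, so bestA's index stays valid.
            clusters.get(bestA).addAll(clusters.get(bestB));
            clusters.remove(bestB); // remove(int): removes by index
        }
        return clusters;
    }

    /** Mean pairwise similarity between the members of two clusters. */
    private static double averageLinkage(double[][] sim, List<Integer> a, List<Integer> b) {
        double sum = 0.0;
        for (int i : a) {
            for (int j : b) {
                sum += sim[i][j];
            }
        }
        return sum / (a.size() * b.size());
    }

    public static void main(String[] args) {
        // Toy matrix: items 0 and 1 are similar, items 2 and 3 are similar.
        double[][] sim = {
            {1.0, 0.9, 0.1, 0.2},
            {0.9, 1.0, 0.2, 0.1},
            {0.1, 0.2, 1.0, 0.8},
            {0.2, 0.1, 0.8, 1.0}
        };
        System.out.println(cluster(sim, 2)); // prints [[0, 1], [2, 3]]
    }
}
```

Swapping the linkage only means changing `averageLinkage`: use the maximum pairwise similarity for single linkage or the minimum for complete linkage. The naive pair search is roughly cubic in the number of sentences, which is acceptable for a relatively small dataset like the one described in this thread.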

Answers

  • land
    land New Altair Community Member
    Hi Cris,
    I don't see how you could assign new examples to an agglomerative clustering after it has been built. How would you do that?

    Greetings,
      Sebastian
  • mscrissy
    mscrissy New Altair Community Member
    Hi Sebastian,

    Instead of using cosine similarity to calculate the distances between sentences, I would like to use a similarity measure from WordNet. Since my dataset is relatively small, I thought I would precompute the similarity matrix between the sentences and use this matrix for agglomerative clustering. Sorry if I'm talking nonsense.

    I would appreciate any suggestions on how to achieve this.

    Thanks a lot!

    Cheers,
    Cris
  • land
    land New Altair Community Member
    Hi Cris,
    now it's perfectly clear to me :) You have a valid point, and we have thought of that: RapidMiner has a built-in way to add your own similarity measure.
    There is a class called com.rapidminer.tools.math.similarity.DistanceMeasures. It offers a method for registering your own distance measure implementation so that it becomes available in every operator that lets you select a distance measure. The most convenient way to call this method is from your own extension. How this works, and where the hook for adding the measure during initialization is, is described in the tutorial "How to Extend RapidMiner 5.0", available in our shop.
    And by the way: I personally would be very interested to see such an extension. Are you going to publish it?

    Greetings,
      Sebastian
  • mscrissy
    mscrissy New Altair Community Member
    Hi Sebastian!

    Thanks a lot for your answer.

    Actually, I'm writing my master's thesis, and I use RapidMiner to cluster semantically similar sentences. I finally managed to add my own similarity measure and use it with agglomerative clustering. It seems to work correctly, but I still have to do more tests. For example, the two sentences "Supervise client in CV writing" and "Help client writing a curriculum vitae" end up in the same cluster. The similarity matrix contains the similarity scores between all pairs of sentences, and each score is calculated from the WordNet similarity measures.

    I'm still using RM 4.4, because I could not find RM 4.6. Could you please tell me where I can check it out, or would it be better to use RM 5.0? Does the text mining plug-in work there?

    Best Regards,
    Cristina
  • land
    land New Altair Community Member
    Hi Cristina,
    RapidMiner 4.6 is available in our SVN repository on SourceForge; alternatively, you can download the source code directly from sf.net.

    Converting to RapidMiner 5.0 is always worth the effort, but if you are under time pressure, I wouldn't do it unless necessary; it is somewhat time-consuming.
    But I'm very interested in your work and the results. In which language are you writing your thesis?

    Greetings,
      Sebastian
  • dynera
    dynera New Altair Community Member
    Hi Cristina,

    Would you consider posting the steps you used to extend the agglomerative clustering with your distance measure?

    Thanks,

    Paul
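Cristina's actual implementation is not posted in the thread. As a hedged illustration of the kind of sentence-to-sentence score she describes, the sketch below builds a precomputed similarity matrix using one common scheme: match each word of one sentence to its most similar word in the other, average those best scores, and take the mean of both directions. The `WordSimilarity` interface is hypothetical. A real version would call a WordNet library, which is stubbed here with exact string matching so the example runs on its own. The conversion to a distance as 1 - similarity assumes scores in [0, 1]. Registering the measure with RapidMiner is not shown, since the thread only names the `DistanceMeasures` class and not its method signatures.

```java
import java.util.Arrays;
import java.util.Locale;

public class SentenceSimilarity {

    /** Hypothetical word-level similarity in [0, 1], e.g. backed by a WordNet library. */
    public interface WordSimilarity {
        double similarity(String w1, String w2);
    }

    /** Average, over the words of a, of each word's best match among the words of b. */
    private static double directional(String[] a, String[] b, WordSimilarity ws) {
        if (a.length == 0 || b.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (String wa : a) {
            double best = 0.0;
            for (String wb : b) {
                best = Math.max(best, ws.similarity(wa, wb));
            }
            total += best;
        }
        return total / a.length;
    }

    /** Symmetric sentence similarity: the mean of both matching directions. */
    public static double sentenceSimilarity(String s1, String s2, WordSimilarity ws) {
        String[] a = tokenize(s1);
        String[] b = tokenize(s2);
        return 0.5 * (directional(a, b, ws) + directional(b, a, ws));
    }

    /** Precomputes the full, symmetric pairwise similarity matrix. */
    public static double[][] similarityMatrix(String[] sentences, WordSimilarity ws) {
        int n = sentences.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = sentenceSimilarity(sentences[i], sentences[j], ws);
                m[i][j] = s;
                m[j][i] = s;
            }
        }
        return m;
    }

    /** Turns a similarity in [0, 1] into a distance for distance-based operators. */
    public static double toDistance(double similarity) {
        return 1.0 - similarity;
    }

    /** Lower-cases and splits on non-word characters, dropping empty tokens. */
    private static String[] tokenize(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Stub standing in for WordNet: 1.0 for identical words, 0.0 otherwise.
        WordSimilarity exact = (x, y) -> x.equals(y) ? 1.0 : 0.0;
        String[] sentences = {
            "Supervise client in CV writing",
            "Help client writing a curriculum vitae",
            "Book a flight to Paris"
        };
        for (double[] row : similarityMatrix(sentences, exact)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With the exact-match stub, only "client" and "writing" link the first two sentences. A WordNet-backed `WordSimilarity` would also score related pairs such as "supervise"/"help", which is what lets sentences like the two in Cristina's example cluster together.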