"Agglomerative Clustering"

Question

Hi everyone,

I was clustering sentences with RapidMiner's k-means algorithm, and that worked fine. My question now is: is it possible to pass a precomputed sentence-to-sentence similarity matrix to Agglomerative Clustering? I want to calculate the similarities between sentences using the WordNet similarity functions and then build the clusters with agglomerative clustering.
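To make the question concrete, here is a minimal sketch in plain Python (not RapidMiner code) of what agglomerative clustering over a precomputed similarity matrix amounts to. The function name `agglomerative`, the average-linkage rule, the stopping `threshold`, and the matrix values are all assumptions chosen for illustration.

```python
def agglomerative(sim, threshold):
    """Average-linkage clustering on a precomputed similarity matrix.

    sim is a symmetric list of lists with values in [0, 1]. Merging stops
    once no pair of clusters is at least `threshold` similar.
    """
    # Start with every item in its own cluster.
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical similarity scores for four sentences (illustrative values only).
similarity = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
result = agglomerative(similarity, threshold=0.5)
print(result)
```

The only input the algorithm needs is the matrix itself, so any similarity measure can be plugged in as long as the scores are computed beforehand.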

Thanks a lot!

Best Regards,
Cris

dynera · Answer

Hi Cristina,

Would you consider posting the steps you used to extend agglomerative clustering with your own distance measure?

Thanks,

Paul

land · Answer

Hi Cristina,
RapidMiner 4.6 is available in our SVN repository on SourceForge; alternatively, you can download the source code directly from sf.net.

Converting to RapidMiner 5.0 is always worth the effort, but if you are under time pressure, I wouldn't do it unless it is necessary. It is somewhat time-consuming.
But I'm very interested in your work and the results. In which language are you writing your thesis?

Greetings,
  Sebastian

mscrissy · Answer

Hi Sebastian!

Thanks a lot for your answer.

Actually, I'm writing my master's thesis and I use RapidMiner to cluster semantically similar sentences. I finally managed to add my own similarity measure and use it with agglomerative clustering. It seems to work correctly, but I still have to run some more tests. What I have achieved so far is that, for example, the two sentences "Supervise client in CV writing" and "Help client writing a curriculum vitae" end up in the same cluster. The similarity matrix contains the similarity scores between all pairs of sentences, and each score is calculated from the WordNet similarity measures. I'm still using RM 4.4 because I could not find the RM 4.6 version. Could you please tell me where I can check it out, or would it be better to use RM 5.0? Does the text mining plug-in work there?
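As a rough illustration of how such a sentence-to-sentence matrix can be assembled, here is a small Python sketch. It is not the RapidMiner extension itself: the `WORD_SIM` table is a hypothetical stand-in for a WordNet word similarity measure, its scores are made up, and the best-match averaging scheme is just one plausible way to combine word scores into a sentence score.

```python
# Toy word-level scores standing in for a WordNet measure (hypothetical values).
WORD_SIM = {
    frozenset(("supervise", "help")): 0.6,
    frozenset(("cv", "curriculum")): 0.9,
    frozenset(("cv", "vitae")): 0.9,
}


def word_similarity(a, b):
    # Identical words are maximally similar; unknown pairs score 0.
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset((a, b)), 0.0)


def tokenize(sentence):
    return sentence.lower().split()


def directed(src, dst):
    # Average, over the words in src, of the best match found in dst.
    return sum(max(word_similarity(w, v) for v in dst) for w in src) / len(src)


def sentence_similarity(s1, s2):
    t1, t2 = tokenize(s1), tokenize(s2)
    # Average both directions so that the resulting matrix is symmetric.
    return (directed(t1, t2) + directed(t2, t1)) / 2


def similarity_matrix(sentences):
    return [[sentence_similarity(a, b) for b in sentences] for a in sentences]


sentences = [
    "Supervise client in CV writing",
    "Help client writing a curriculum vitae",
    "Order new office chairs",
]
matrix = similarity_matrix(sentences)
for row in matrix:
    print([round(score, 2) for score in row])
```

With these toy scores the first two sentences come out much more similar to each other than either is to the third, which is the behaviour the clustering step relies on.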

Best Regards,
Cristina