Text Clustering using K-Medoids Algorithm

puteri_prameswa · April 2017

Hi All!

I'm new to RapidMiner. I have 1000+ online reviews collected from Tripadvisor.com. I want to apply the K-Medoids algorithm to cluster those reviews into k clusters. I chose K-Medoids because I want to find the "medoid" of each cluster, an actual review that I believe can represent the contents of the entire cluster. I have already applied some nodes such as:

- Read Excel

- Select Attributes

- Nominal to Text

- Process Documents from Data (Tokenization, Stemming, Stopwords Removal)

- and the Clustering node itself
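For anyone who wants to see what those preprocessing steps do outside RapidMiner, here is a minimal Python sketch under stated assumptions: it lowercases and tokenizes each review, drops stopwords, stems, and builds one term-count vector per review. The stopword list and the `crude_stem` suffix stripper are hypothetical toy stand-ins; RapidMiner's Process Documents operators use their own tokenizers, stopword lists, and stemmers (such as Porter or Snowball).

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real stopword lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "was", "were", "is", "to", "of", "in", "very", "we", "it"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Hypothetical suffix stripper standing in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, then stem each remaining token."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]

def term_count_vectors(docs):
    """Build a shared sorted vocabulary and one raw term-count vector per document."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

reviews = [
    "The room was clean and the staff were friendly.",
    "Friendly staff, clean rooms, great breakfast.",
    "Noisy room and the breakfast was cold.",
]
vocab, vectors = term_count_vectors(reviews)
print(vocab)
for v in vectors:
    print(v)
```

Raw counts like these are only one possible representation; how the terms are weighted changes which reviews the clustering sees as similar.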

But I can't seem to get proportional clusters. With 1000+ reviews and k = 2, the ratio of members in clusters 1 and 2 is 99 : 1.

Please help me!

MartinLiebig · April 2017

Hi,

have you tried to use

a) TF-IDF

b) cosine similarity as the distance measure
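A minimal Python sketch of both suggestions, assuming one common TF-IDF variant (term frequency times log(N / document frequency)); RapidMiner's exact TF-IDF normalization may differ. TF-IDF gives zero weight to a term that appears in every review, and cosine distance compares only the direction of two vectors, not their length. The toy count vectors at the end are made up for illustration: a long review that repeats the same words looks far from a short one under Euclidean distance, but is at distance of roughly zero under cosine distance.

```python
import math
from collections import Counter

def tfidf_vectors(token_lists):
    """Weight each term by (term count / review length) * log(N / document frequency).

    One common TF-IDF variant; other tools may normalize differently.
    """
    n_docs = len(token_lists)
    vocab = sorted({t for tokens in token_lists for t in tokens})
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for an empty review
        vectors.append([counts[t] / total * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

def cosine_distance(u, v):
    """1 - cosine similarity: depends on the angle between vectors, not their length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention: an all-zero vector is maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical raw counts: the long review uses the same words as the short one, 5x over.
short_review, long_review, other_review = [1, 1, 0], [5, 5, 0], [0, 1, 1]
# Euclidean: the long review looks farther from the short one than an unrelated review does.
print(math.dist(short_review, long_review), math.dist(short_review, other_review))
# Cosine: the long review is (numerically) at distance ~0, the unrelated one at ~0.5.
print(cosine_distance(short_review, long_review), cosine_distance(short_review, other_review))
```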

Best,

Martin

Telcontar120 · April 2017

I agree with @mschmitz's suggestions. However, with any algorithm in the k-means family there is no guarantee that the clusters will be of equal size. The algorithm doesn't look at cluster sizes directly; it looks at intra-cluster similarity vs. inter-cluster similarity. You may want to try X-Means, which tests a range of possible k values and suggests the best one based on BIC.
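To illustrate the point about cluster sizes, here is a small Python sketch of K-Medoids on made-up 1-D data. It uses the simple alternating (Voronoi-iteration) variant rather than PAM, and it is not RapidMiner's implementation. With nine close points and one outlier, giving the outlier its own medoid (a 9 : 1 split) has a much lower total distance than a more balanced 4 : 6 split, so the objective itself favors the lopsided result. The run started from indices [0, 1] gets stuck in the balanced but worse local solution, which also shows that the outcome depends on initialization.

```python
import math
from collections import Counter

def assign(points, medoids, dist):
    """Label each point with the index of its nearest medoid (ties go to the earlier medoid)."""
    return [min(medoids, key=lambda m: dist(p, points[m])) for p in points]

def total_cost(points, medoids, labels, dist):
    """K-Medoids objective: sum of distances from each point to its assigned medoid."""
    return sum(dist(p, points[lab]) for p, lab in zip(points, labels))

def k_medoids(points, init_medoids, dist, max_iter=100):
    """Alternating (Voronoi-iteration) K-Medoids; a simpler cousin of PAM, not PAM itself."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:  # only possible if two medoids sit on identical points
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest summed distance to its cluster.
            best = min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = assign(points, medoids, dist)
    return medoids, labels

# Toy data: nine tightly packed points plus one far-away outlier.
points = [(float(v),) for v in range(9)] + [(100.0,)]

for init in ([0, 9], [0, 1]):
    medoids, labels = k_medoids(points, init, math.dist)
    sizes = sorted(Counter(labels).values())
    print(init, medoids, sizes, total_cost(points, medoids, labels, math.dist))
# prints:
# [0, 9] [4, 9] [1, 9] 20.0
# [0, 1] [1, 6] [4, 6] 104.0
```

The distance function is a parameter, so the same loop works with a cosine distance on TF-IDF vectors instead of `math.dist`.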
