Text Clustering using K-Medoids Algorithm

puteri_prameswa New Altair Community Member
edited November 2024 in Community Q&A

Hi All!

 

I'm new to RapidMiner. I have 1000+ online reviews gathered from Tripadvisor.com, and I want to apply the K-Medoids algorithm to group those reviews into clusters. I chose K-Medoids because I want to find the "medoid" of each cluster, which I believe can represent the contents of the entire cluster. I have already applied some nodes, such as:

- Read Excel

- Select Attributes

- Nominal to Text

- Process Documents from Data (Tokenization, Stemming, Stopwords Removal)

- and the Clustering node itself

 

But I can't seem to get proportional clusters. With 1000+ records and k = 2, the ratio of members in clusters 1 and 2 is 99 : 1.

 

 

Please help me!

Answers

  • MartinLiebig
    Altair Employee

    Hi,

     

    Have you tried using

    a) TF-IDF

    b) cosine similarity as the distance measure?

     

    Best,

    Martin
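To make the two suggestions above concrete, here is a minimal sketch in plain Python (not RapidMiner) of TF-IDF weighting, cosine distance, and a basic k-medoids loop. The six tokenized "reviews" are made-up toy data, the function names are hypothetical, and the farthest-first initialization is an assumption chosen for determinism. It is not a description of what the RapidMiner operators do internally.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (na * nb)

def k_medoids(dist, k, max_iter=100):
    """Alternate assignment and medoid update on a precomputed distance matrix."""
    n = len(dist)
    medoids = [0]  # assumption: farthest-first init starting from document 0
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(max_iter):
        # Assign every document to its nearest medoid.
        labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
        new_medoids = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[j])
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Toy data (made up): already tokenized, stopwords removed.
docs = [
    "room clean bed comfortable".split(),
    "room spacious bed soft".split(),
    "clean room comfortable bed quiet".split(),
    "breakfast tasty coffee fresh".split(),
    "coffee weak breakfast cold".split(),
    "breakfast buffet fresh coffee".split(),
]
vectors = tfidf_vectors(docs)
dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
medoids, labels = k_medoids(dist, k=2)
print("medoid documents:", [" ".join(docs[m]) for m in medoids])
print("labels:", labels)
```

Cosine distance ignores document length, which is often why it behaves better than Euclidean distance on sparse text vectors: a long review and a short one about the same topic still end up close together.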

  • Telcontar120
    Telcontar120 New Altair Community Member

    I agree with @mschmitz's suggestions. However, there is no guarantee when using any of the k-means family of clustering algorithms that the clusters will be of equal sizes. The algorithm isn't looking directly at the cluster sizes, but rather at intra-cluster similarity vs. inter-cluster similarity. You may want to try X-Means, which will test a large range of possible k values and suggest the best one based on BIC (the Bayesian Information Criterion).
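As a small illustration of why cluster sizes can be lopsided, the sketch below finds the best pair of medoids for ten one-dimensional points by exhaustive search over the k-medoids objective (total distance of each point to its nearest medoid). The points are invented for illustration, and exhaustive search is only feasible for tiny inputs; it is not how RapidMiner's operator or X-Means works.

```python
from itertools import combinations

# Made-up data: nine similar points and one far-away outlier.
points = [1.0, 1.1, 1.2, 0.9, 1.05, 0.95, 1.15, 0.85, 1.0, 50.0]

def total_cost(medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(abs(p - points[m]) for m in medoids) for p in points)

# Try every pair of points as medoids and keep the cheapest pair.
best = min(combinations(range(len(points)), 2), key=total_cost)

# Assign each point to its nearest medoid and count the cluster sizes.
labels = [min(best, key=lambda m: abs(p - points[m])) for p in points]
sizes = {points[m]: labels.count(m) for m in best}
print("medoid values and cluster sizes:", sizes)
```

The cheapest solution gives the outlier a cluster of its own, because that reduces the total distance far more than splitting the nine similar points would. A 99 : 1 split on review data can arise the same way when a few documents are very unlike the rest under the chosen distance measure.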
