How to reuse preprocessing results in a range of k-means clustering

User: "albertoarenal"
New Altair Community Member
Updated by Jocelyn

Hi all,

 

I am running a k-means clustering analysis on several groups of documents, and I would like to evaluate the clustering performance for different values of K (K = 4 to 20) by comparing their respective Davies-Bouldin indexes.

 

Before the clustering algorithm, I apply a series of preprocessing tasks (transform cases, tokenize, filter stopwords, stemming, ...) to create a TF-IDF vector. The output of these preprocessing tasks is always the same for a given group of texts (a general view of the process is attached).

 

At the moment I run the whole process once for each value of K. I would like to avoid repeating the preprocessing tasks, whose output is the same for a given group of texts, every time I run the clustering and calculate the Davies-Bouldin index for a new K, mainly to save a lot of time.

 

Thank you very much in advance

Alberto

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer

    Just add a loop after the preprocessing steps that runs k-means and saves the output you want, then cycle through the k-values you would like using a loop macro.
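RapidMiner processes are assembled in the GUI, so the following is only a Python analogy of the loop idea, not the actual operator setup. It is a minimal sketch using the standard library alone: `preprocess`, `kmeans` and `davies_bouldin` are hypothetical stand-ins for the corresponding operators, the six short documents are made-up sample data, and the simplified TF-IDF skips stopword filtering and stemming. The point is the structure: the expensive preprocessing runs once, and only the clustering and scoring sit inside the loop over k.

```python
import math
import random
from collections import Counter

def preprocess(docs):
    """Expensive step, run once: lowercase, tokenize, build TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] / len(toks) * math.log(n_docs / doc_freq[t])
                        for t in vocab])
    return vectors

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    centers = random.Random(seed).sample(vectors, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = centroid(members)
    return labels

def davies_bouldin(vectors, labels):
    """Mean over clusters of the worst (s_i + s_j) / d_ij ratio; lower is better."""
    groups = {}
    for v, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(v)
    centers = {lab: centroid(pts) for lab, pts in groups.items()}
    scatter = {lab: sum(dist(p, centers[lab]) for p in pts) / len(pts)
               for lab, pts in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in groups if j != i)
             for i in groups]
    return sum(worst) / len(worst)

# Made-up sample documents (assumption, for illustration only).
docs = [
    "cats purr and cats sleep",
    "kittens and cats purr",
    "stocks rise as markets rally",
    "markets fall and stocks drop",
    "rain and wind hit the coast",
    "storm rain floods the coast",
]

vectors = preprocess(docs)      # computed once, outside the loop
scores = {}
for k in range(2, 5):           # the question uses K = 4 to 20 on real data
    labels = kmeans(vectors, k)
    scores[k] = davies_bouldin(vectors, labels)

best_k = min(scores, key=scores.get)  # lowest Davies-Bouldin index
```

In practice a clustering library would replace the hand-written helpers; what carries over to the RapidMiner process is that nothing upstream of the loop is recomputed when k changes.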

     

    An alternative would be to Store the results after preprocessing and then create a separate process that starts by Retrieving that dataset before each run of the clustering (also within a loop).  Either approach should work.
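The Store/Retrieve pattern can be sketched the same way outside RapidMiner. This is only an illustration under stated assumptions: `store` and `retrieve` are hypothetical helpers built on Python's `pickle` module rather than the RapidMiner operators, the file name `tfidf_vectors.pkl` is invented, and the small `vectors` list is a placeholder for real preprocessing output.

```python
import pickle
import tempfile
from pathlib import Path

def store(obj, path):
    """Persist a preprocessing result so later runs can skip recomputing it."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)

def retrieve(path):
    """Load a previously stored preprocessing result."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# "Process 1": preprocess once and store the result.
# Placeholder values standing in for real TF-IDF vectors (assumption).
vectors = [[0.0, 1.2, 0.4], [0.9, 0.0, 0.1], [0.0, 1.1, 0.5]]
repository = Path(tempfile.mkdtemp())
stored_path = repository / "tfidf_vectors.pkl"
store(vectors, stored_path)

# "Process 2": start from the stored result, then loop over k.
loaded = retrieve(stored_path)
k_values = list(range(4, 21))   # K = 4 to 20, as in the question
for k in k_values:
    # The clustering and Davies-Bouldin scoring for this k would go here,
    # working on `loaded` instead of re-running the preprocessing.
    pass
```

Compared with a single process containing a loop, storing the result also lets the preprocessing be reused across sessions and across several experiment processes.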

    User: "nmahesh"
    New Altair Community Member
    Accepted Answer

    Hi Alberto,

     

    Have you tried using the Store operator for the preprocessing results? I would then create separate processes to try out different parameter settings for your clustering and performance evaluation.

     

    Best,

    Nithin Mahesh