How to reuse preprocessing results across a range of k-means clustering runs

User: "albertoarenal"
New Altair Community Member
Updated by Jocelyn

Hi all,

 

I am running a K-Means clustering analysis on several groups of documents, and I would like to evaluate the clustering performance for different values of K (K = 4 to 20) by comparing their respective Davies-Bouldin indexes.

 

Before the clustering algorithm, I apply a series of preprocessing tasks (transform cases, tokenize, filter stopwords, stemming, ...) to create a TF-IDF vector. The output of these preprocessing tasks is always the same for a given group of texts (a general view of the process is attached).

 

At the moment I run the whole process once for each value of K. Since the preprocessing is identical for a given group of texts, I would like to avoid repeating it every time I run the clustering and calculate the Davies-Bouldin index, mainly to save a lot of time.

 

Thank you very much in advance

Alberto

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer

    Just add a loop after the preprocessing steps. Inside the loop, run k-means and save the output you want, and use a loop macro to cycle through the different k values you would like to test.
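For anyone who wants to see the same idea outside the process designer, here is a rough Python analogy, not a RapidMiner process. It is only a sketch under several assumptions: all function names (`build_tfidf`, `kmeans_labels`, `davies_bouldin`, `evaluate_k_range`) are made up for illustration, the toy documents are invented, the preprocessing is simplified to lowercasing and whitespace tokenizing (no stopword filtering or stemming), and the k-means is a plain Lloyd's iteration with a fixed seed. The point is only the structure: the TF-IDF step runs once, and just the clustering and scoring are repeated per k.

```python
import math
import random
from collections import Counter


def build_tfidf(docs):
    """Stand-in for the expensive preprocessing: lowercase, tokenize, TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            counts[tok] / len(toks) * (math.log(n_docs / doc_freq[tok]) + 1.0)
            for tok in vocab
        ])
    return vectors


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans_labels(points, k, iterations=25, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = centroid(members)
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better) over the non-empty clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centers = {lab: centroid(members) for lab, members in clusters.items()}
    scatter = {
        lab: sum(math.dist(p, centers[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    worst_ratios = []
    for i in clusters:
        ratios = [
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i and math.dist(centers[i], centers[j]) > 0
        ]
        worst_ratios.append(max(ratios, default=0.0))
    return sum(worst_ratios) / len(worst_ratios)


def evaluate_k_range(docs, k_values):
    vectors = build_tfidf(docs)  # preprocessing runs exactly once
    scores = {}
    for k in k_values:  # only clustering and scoring are repeated
        labels = kmeans_labels(vectors, k)
        scores[k] = davies_bouldin(vectors, labels)
    return scores


# Invented toy corpus, only to make the sketch runnable.
docs = [
    "cats purr and sleep", "cats chase mice", "dogs bark loudly",
    "dogs chase cats", "stocks rise today", "stocks fall sharply",
    "markets rise and fall", "mice sleep quietly",
]
scores = evaluate_k_range(docs, range(2, 5))
for k, score in scores.items():
    print(k, round(score, 3))
```

In practice a library such as scikit-learn would likely replace the hand-written helpers; the loop structure stays the same.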

     

    An alternative would be to Store the results after preprocessing and then create a separate process that starts by Retrieving that dataset before each run of the clustering (also within a loop). Either approach should work.
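The store-then-retrieve pattern can be sketched in Python as well, again only as an analogy to the Store and Retrieve operators rather than anything RapidMiner-specific. The names `preprocess`, `store` and `retrieve`, the file name, and the two sample strings are all hypothetical; `pickle` is used simply because it ships with Python, and it should only be used to load files you wrote yourself.

```python
import pickle
import tempfile
from pathlib import Path


def preprocess(docs):
    """Placeholder for the expensive text preprocessing (hypothetical)."""
    return [doc.lower().split() for doc in docs]


def store(obj, path):
    """Write the preprocessed result to disk once."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def retrieve(path):
    """Load the previously stored result instead of recomputing it."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical location for the stored result.
cache_path = Path(tempfile.mkdtemp()) / "preprocessed_docs.pkl"

# "Process 1": run the preprocessing once and store its output.
tokens = preprocess(["K-Means on documents", "Davies-Bouldin index per K"])
store(tokens, cache_path)

# "Process 2": start from the stored result; the clustering loop over
# K = 4..20 would work on `restored` without touching the raw documents.
restored = retrieve(cache_path)
k_values = list(range(4, 21))
print(len(restored), "documents restored;", len(k_values), "values of K to try")
```

Keeping the two stages separate means the second stage can be rerun with different clustering settings as often as needed, at the cost of keeping the stored file in sync whenever the documents or preprocessing change.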

    User: "nmahesh"
    New Altair Community Member
    Accepted Answer

    Hi Alberto,

     

    Have you tried using the Store operator to save the output of the preprocessing? I would then create separate processes that start from that stored output to try out different parameter settings for your clustering and performance evaluation.

     

    Best,

    Nithin Mahesh