Hi all,
I am running a K-Means clustering analysis on several groups of documents, and I would like to evaluate the clustering performance for different values of K (K = 4 to 20) by comparing their respective Davies-Bouldin indexes.
Before the clustering algorithm, I apply a set of preprocessing tasks (transform cases, tokenize, filter stopwords, stemming, ... creating a TF-IDF vector). The output of these preprocessing tasks is always the same for each group of texts (I have attached a general view of the process).
At the moment I run the whole process once for each value of K. I would like to avoid repeating the preprocessing tasks, which give the same result for a given group of texts, every time I run the clustering for a new K and calculate its Davies-Bouldin index, mainly to save a lot of time. Is there a way to run the preprocessing only once and reuse its output for every K?
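To make clear what I am after, here is a rough sketch in plain Python rather than my actual process. Everything in it is an assumption for illustration only: the function names (preprocess, kmeans, davies_bouldin), the toy stopword list, the six example documents, and the small K range (2 to 3 instead of 4 to 20). Stemming is left out to keep it short. The point is only that the TF-IDF vectors are computed once, outside the loop over K.

```python
import math
import random
from collections import Counter

STOPWORDS = {"the", "a", "and", "on", "as"}  # toy list, an assumption

def preprocess(docs):
    """Lowercase, tokenize, drop stopwords, build TF-IDF vectors (run once)."""
    tokenized = [[w for w in d.lower().split() if w not in STOPWORDS] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(len(docs) / df[w]) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] / max(len(toks), 1) * idf[w] for w in vocab])
    return vectors

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def assign(vectors, centers):
    """Label each vector with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist(v, centers[c])) for v in vectors]

def kmeans(vectors, k, iters=20, seed=0):
    """Very small K-Means: random initial centers, fixed number of iterations."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = assign(vectors, centers)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = mean(members)
    return assign(vectors, centers)

def davies_bouldin(vectors, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    clusters = {}
    for v, lab in zip(vectors, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return float("inf")  # index is undefined for a single cluster
    cents = {c: mean(pts) for c, pts in clusters.items()}
    scatter = {c: sum(dist(p, cents[c]) for p in pts) / len(pts)
               for c, pts in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)

docs = [
    "the cat sat on the mat",
    "a cat and a kitten",
    "dogs chase the cat",
    "stock markets fell today",
    "markets rally as stock prices rise",
    "investors watch the stock markets",
]

vectors = preprocess(docs)  # preprocessing happens exactly once

# Only the clustering and the index are recomputed for each K.
scores = {k: davies_bouldin(vectors, kmeans(vectors, k)) for k in range(2, 4)}
for k, score in scores.items():
    print(k, round(score, 3))
```

A lower Davies-Bouldin index means more compact, better separated clusters, so in this sketch I would pick the K with the smallest score. What I am looking for is the equivalent structure in my process: store the preprocessed vectors once, then loop only over the clustering and evaluation steps.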
Thank you very much in advance,
Alberto