The role of dimensionality reduction with regard to Clustering approaches
Muhammed_Fatih_
New Altair Community Member
Hello Community,
I plan to evaluate several clustering techniques on a TF-IDF bag-of-words representation where I've previously executed feature selection to efficiently reduce the number of dimensions of my vector space. In this context, I've read that feature extraction/transformation approaches achieve better dimensionality-reduction results than feature selection approaches when clustering algorithms are applied afterwards. First of all, what do you think of this opinion from a theoretical standpoint?
Secondly, as explained, I've already executed feature selection. Would it be correct to additionally execute feature extraction on the remaining dimensions derived from feature selection? Or should the feature extraction for efficient clustering be applied to the initial raw dataset?
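For concreteness, here is a minimal sketch of the kind of pipeline I mean (I'm using scikit-learn purely for illustration; the corpus and thresholds are placeholders, and my actual operators may differ):

```python
# Sketch: TF-IDF bag of words, then a simple unsupervised feature selection
# (variance threshold) before clustering. Data and thresholds are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import VarianceThreshold

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

vectorizer = TfidfVectorizer(min_df=2, max_df=0.8)  # drop very rare/common terms
X = vectorizer.fit_transform(docs)                  # sparse (n_docs, n_terms)

selector = VarianceThreshold(threshold=1e-4)        # keep higher-variance terms
X_reduced = selector.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```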
Thank you all in advance for participating and for your answers!
Best regards!
Answers
"I've read that feature extraction/transformation approaches achieve better dimensionality-reduction results than feature selection approaches when clustering algorithms are applied afterwards."
Based on your question, I assume you are talking about techniques like PCA, ICA, or other transformations of your data (n-grams etc.). One of the major drawbacks of dimensionality reduction via techniques like PCA is the loss of interpretability. If you want to explain or interpret the results, then feature selection is the way to go, as it preserves the original features. If your focus is purely on reducing dimensionality and interpretation is not highly important, then feature extraction can be used.
I think both of them (extraction/selection) seem similar, but they have different purposes. I am not sure it is always correct to say that feature extraction works better than selection.

"Secondly, as explained, I've already executed feature selection. Would it be correct to additionally execute feature extraction on the remaining dimensions derived from feature selection?"

Yes, you can do both. I generally apply feature extraction first and then do feature selection. There is nothing wrong with that as far as I know.
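For example, the chaining you describe could look like this (a minimal sketch with scikit-learn; the operators and parameter values are just illustrative assumptions):

```python
# Sketch: feature selection and feature extraction chained in one pipeline.
# TruncatedSVD (LSA) stands in for PCA here because TF-IDF matrices are sparse.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import TruncatedSVD

reduce_dims = Pipeline([
    ("select", VarianceThreshold(threshold=1e-4)),  # feature selection first
    ("extract", TruncatedSVD(n_components=100)),    # then feature extraction
])
# X_reduced = reduce_dims.fit_transform(X_tfidf)    # X_tfidf: sparse TF-IDF matrix
# n_components must be smaller than the number of features surviving selection.
```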
Hi,

Be careful with feature selection for clustering though: if you simply optimize for things like the Davies-Bouldin index without multi-objective optimization, you will end up with trivial solutions where the data space collapses and the clusters no longer have any meaning (a toy illustration of the two competing objectives follows below). I recommend checking out some of the papers I wrote about this ages ago. They are still relevant, and I am sure you can find them online somewhere:

Mierswa, Ingo and Wurst, Michael. Information Preserving Multi-Objective Feature Selection for Unsupervised Learning. In Maarten Keijzer et al. (editors), GECCO '06: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pages 1545-1552, New York, NY, USA: ACM Press, 2006.

Or you can just go with the full PhD thesis, which covers a lot of related topics, too. There is a PDF of it as well...

Cheers,
Ingo
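To make the collapse problem concrete, here is a toy sketch of the two competing objectives (this is not the actual GECCO algorithm; it uses scikit-learn's Davies-Bouldin index on synthetic placeholder data, and a real implementation would evolve a Pareto front with a multi-objective GA):

```python
# Toy sketch: evaluate feature subsets on two objectives at once --
# cluster quality (Davies-Bouldin index, lower is better) and the number
# of retained features. Optimizing quality alone tends to collapse the space.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))     # placeholder data

def objectives(mask):
    """Return (DB index, feature count) for a boolean feature mask."""
    X_sub = X[:, mask]
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_sub)
    return davies_bouldin_score(X_sub, labels), int(mask.sum())

# Random search over subsets; a multi-objective GA would evolve these instead.
for _ in range(5):
    mask = rng.random(X.shape[1]) > 0.5
    if mask.sum() >= 2:
        db, k = objectives(mask)
        print(f"features={k:2d}  DB index={db:.3f}")
```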
Hi @IngoRM,
thank you for the literature recommendation!
However, you wrote that one should be careful when combining feature selection and clustering. Do you have other alternatives for efficient dimensionality reduction with subsequent clustering if you want to interpret the clustering results afterwards, as @varunm1 mentioned? I don't see any other way besides topic-modeling approaches like LDA.
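For reference, the topic-modeling route I have in mind looks roughly like this (scikit-learn's LDA; the corpus and parameters are placeholders):

```python
# Sketch: LDA topic modeling as an interpretable alternative. Note that LDA
# expects raw term counts rather than TF-IDF weights.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["first example document", "second example document"]
counts = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)   # per-document topic proportions
print(doc_topics)
```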
Thank you in advance for your answer!
Hi,

Another way which I really like is the combination of PCA and k-Means. This makes a lot of sense in many scenarios because both algorithms have similar assumptions (Euclidean distances and variances are often the same concept). Afterwards you can use a technique like this: https://towardsdatascience.com/understanding-clustering-cf0117148ef4 to understand what is going on (a sketch follows below).

Cheers,
Martin
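A minimal sketch of this PCA, k-Means, then decision-tree idea (scikit-learn on synthetic dense data; for a sparse TF-IDF matrix one would typically swap PCA for TruncatedSVD, and all parameter choices here are placeholders):

```python
# Sketch: cluster on the PCA scores, then explain the clusters with a
# decision tree trained on the ORIGINAL features, as Martin describes.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))           # stand-in for a dense feature matrix

X_pca = PCA(n_components=5).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)

# Interpretation step: which original features separate the clusters?
tree = DecisionTreeClassifier(max_depth=3).fit(X, labels)
print(export_text(tree))
```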
Hi @mschmitz,
Interesting approach. So you start clustering based on the PCA values and then try to make sense of the detected clusters afterwards by using a decision tree, right?
Best regards!
Hi,

Pretty much, yes. The trick is that you do the interpretation on the original feature space, not the PCA-ed one.

Best,
Martin
Hi,

Martin's approach works.

"But do you have other alternatives for efficient dimensionality reduction with subsequent clustering if you want to interpret the clustering results?"

The other alternative is to use multi-objective optimization for feature selection in the original space. HOWEVER, you need to maximize the number of features, not minimize it. More details can be found in the paper I mentioned above.

Cheers,
Ingo