"Text Mining - Clustering Task - DISCOVER THE CONTENT OF EACH CLUSTER"

Updated by Jocelyn
Hi,
My problem is an unsupervised learning one: as I said, my bag of words (BOW) has exactly 2290 attributes and 1572 examples. It doesn't have any real label, just descriptors extracted from the texts plus one attribute holding the name of each document, which I set as the label.
I need to find the optimal number of clusters first, and I built that model to discover it. I didn't know that the RapidMiner KMeans already had a built-in way of dealing with the local minimum problem.
As an aside, what kind of algorithm or theory do you use for this? I only need a reference that I can cite and explain in my thesis.
So, I leave "ParameterIteration" to run over an interval of candidate cluster counts and exclude "RandomOptimizer", because it's not necessary. Do you have any other suggestion?
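To make the idea concrete outside RapidMiner, here is a minimal sketch in plain Python of sweeping the number of clusters while repeating k-means from several random starts and keeping the best run, which is the usual remedy for local minima. The function names (kmeans_once, kmeans_restarts, sweep_k) and the parameters runs and seed are my own illustration, not RapidMiner operators, and I am only assuming that the built-in handling works along similar lines.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (sse, centroids, labels)."""
    # Start from k distinct examples chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return sse, centroids, labels

def kmeans_restarts(points, k, runs=10, seed=0):
    """Repeat k-means `runs` times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(runs)),
               key=lambda result: result[0])

def sweep_k(points, k_values, runs=10, seed=0):
    """Best within-cluster sum of squares found for each candidate k."""
    return {k: kmeans_restarts(points, k, runs, seed)[0] for k in k_values}
```

Plotting the values returned by sweep_k against k and looking for the point where the curve flattens (the "elbow") is one common, if informal, way to pick the number of clusters.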
Finally, I want to measure the quality of my clusters. Using "ParameterIteration" I can generate a scatter plot from "ClusterCentroidEvaluator" and see how the average (AVG) and Davies-Bouldin (DB) distances behave for each cluster. Do you have any other option?
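For reference, this is a small sketch of the two measures as I understand them, using the textbook definitions: the average distance of the members of each cluster to their centroid, and the Davies-Bouldin index built on top of it. The function names are hypothetical, and I am not claiming that RapidMiner computes or signs the values in exactly this way.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(rows):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def davies_bouldin(points, labels):
    """Return (db_index, scatter), where scatter[c] is the average distance
    of the members of cluster c to its centroid.

    Needs at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    ids = sorted(clusters)
    cents = {i: centroid(clusters[i]) for i in ids}
    scatter = {i: sum(euclid(p, cents[i]) for p in clusters[i]) / len(clusters[i])
               for i in ids}
    # For each cluster, take its worst (largest) similarity ratio to any other
    # cluster, then average those worst cases over all clusters.
    worst = [max((scatter[i] + scatter[j]) / euclid(cents[i], cents[j])
                 for j in ids if j != i)
             for i in ids]
    return sum(worst) / len(ids), scatter
```

With this definition a lower index means more compact and better separated clusters, so it can be compared across different numbers of clusters.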
The problem, in this case, is that there are a lot of attributes, i.e., a lot of descriptors.
I want to label or characterize each cluster.
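One simple way to do this with so many attributes, sketched here in plain Python, is to rank the terms of each cluster by how much their mean weight inside the cluster exceeds their mean weight over the whole collection, and to use the top few terms as a description. The function top_terms, its parameter n, and the toy vocabulary in the usage example are hypothetical and only illustrate the idea.

```python
def top_terms(rows, labels, vocab, n=3):
    """Map each cluster label to its n most distinctive terms.

    rows   : BOW vectors, one list of term weights per document
    labels : cluster label of each document, aligned with rows
    vocab  : term name of each column, aligned with the vectors
    """
    # Mean weight of every term over the whole collection.
    overall = [sum(col) / len(rows) for col in zip(*rows)]
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    result = {}
    for label, members in groups.items():
        mean = [sum(col) / len(members) for col in zip(*members)]
        # How far the cluster mean sits above the collection mean, per term.
        lift = [m - o for m, o in zip(mean, overall)]
        ranked = sorted(range(len(vocab)), key=lambda j: lift[j], reverse=True)
        result[label] = [vocab[j] for j in ranked[:n]]
    return result

# Toy usage with made-up data: two sports documents, two finance documents.
vocab = ["goal", "match", "stock", "market", "the"]
rows = [[2, 1, 0, 0, 1], [1, 2, 0, 0, 1], [0, 0, 2, 1, 1], [0, 0, 1, 2, 1]]
print(top_terms(rows, [0, 0, 1, 1], vocab, n=2))
```

Subtracting the collection mean keeps terms that are frequent everywhere (like "the" in the toy data) from dominating every cluster description.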
I would be very grateful and happy for any help.
Marcello Sandi