"Text Mining - Clustering Task - DISCOVER THE CONTENT OF EACH CLUSTER"

User: "Marcello_Sandi"
Updated by Jocelyn
Hi,

My problem is one of unsupervised learning. As I said, my bag of words (BOW) has exactly 2290 attributes and 1572 examples, and it doesn't have any label: just descriptors extracted from the texts and one attribute holding the document names, which I set as the label.

First, I need to find the optimal number of clusters, and I built that model to discover it. I didn't know that RapidMiner's KMeans operator already dealt with the local minimum problem.

As an aside on that point: what algorithm or theory does it use here? I only need to cite a reference in my thesis and explain it.
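RapidMiner's exact internal procedure isn't stated in this thread, so the following is only a sketch of the standard remedy for k-means getting stuck in a local minimum: run Lloyd's algorithm several times from different random starting centroids and keep the run with the lowest sum of squared errors (SSE). Smarter seeding such as k-means++ is another common answer. The functions `kmeans_once` and `kmeans_restarts` are hypothetical, pure-Python illustrations, not RapidMiner's API.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]
Result = Tuple[List[int], List[Vector], float]  # (assignment, centroids, SSE)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans_once(data: List[Vector], k: int, rng: random.Random,
                max_steps: int = 100) -> Result:
    """One run of Lloyd's algorithm, seeded with k randomly chosen data points."""
    centroids = [list(p) for p in rng.sample(data, k)]
    assignment: List[int] = []
    for _ in range(max_steps):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        if new_assignment == assignment:
            break  # converged: nothing moved since the last update
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(data, assignment))
    return assignment, centroids, sse


def kmeans_restarts(data: List[Vector], k: int, runs: int = 10,
                    seed: int = 0) -> Result:
    """Repeat k-means from `runs` random starts and keep the lowest-SSE result."""
    rng = random.Random(seed)
    best: Optional[Result] = None
    for _ in range(runs):
        result = kmeans_once(data, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    assert best is not None
    return best


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    labels, centroids, sse = kmeans_restarts(data, k=2)
    print(labels, centroids, sse)
```

Each single run only reaches a local minimum of the SSE; restarting trades extra computation for a better chance that the best of the runs is close to the global one.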

So I will leave "ParameterIteration" running over an interval of candidate cluster counts and drop "RandomOptimizer", since it isn't necessary. Do you have any other suggestions?

Finally, I want to measure the quality of my clusters. Using "ParameterIteration", I can generate a scatter plot of the "ClusterCentroidEvaluator" results and see how the AVG (average within-centroid distance) and DB (Davies-Bouldin) values relate for each number of clusters. Do you have any other options?
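To make explicit what the two criteria measure, here is a minimal sketch, assuming Euclidean distance, that computes both for a given partition. It may differ in detail from RapidMiner's own AVG and DB values (for example squared vs. plain distances, or sign conventions). For AVG and DB alike, lower is better. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroids_of(data: Sequence[Vector], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean vector of each cluster, keyed by cluster label."""
    groups: Dict[int, List[Vector]] = {}
    for point, label in zip(data, labels):
        groups.setdefault(label, []).append(point)
    return {c: [sum(col) / len(pts) for col in zip(*pts)] for c, pts in groups.items()}


def avg_within_centroid_distance(data: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean distance from every point to the centroid of its own cluster."""
    cents = centroids_of(data, labels)
    return sum(euclidean(p, cents[l]) for p, l in zip(data, labels)) / len(data)


def davies_bouldin(data: Sequence[Vector], labels: Sequence[int]) -> float:
    """DB = (1/k) * sum_i max_{j != i} (s_i + s_j) / d(c_i, c_j).

    Needs at least two clusters with distinct centroids.
    """
    cents = centroids_of(data, labels)
    # Scatter s_i: mean distance of cluster i's members to its centroid.
    scatter: Dict[int, float] = {}
    for c, centroid in cents.items():
        dists = [euclidean(p, centroid) for p, l in zip(data, labels) if l == c]
        scatter[c] = sum(dists) / len(dists)
    clusters = list(cents)
    total = 0.0
    for i in clusters:
        # Worst-case (most similar) neighbour of cluster i.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
            for j in clusters if j != i
        )
    return total / len(clusters)


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    for name, labels in {"good": [0, 0, 1, 1], "bad": [0, 1, 0, 1]}.items():
        print(name, avg_within_centroid_distance(data, labels), davies_bouldin(data, labels))
```

In a sweep over the number of clusters, one would compute both values for each clustering and plot them against k. Note that AVG almost always keeps falling as k grows, so it is usually read for an "elbow", while DB can have a genuine minimum.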

The problem in this case is that there are many attributes, i.e., many descriptors.

I want to label or characterize each cluster.
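With 2290 attributes, reading centroids by eye is impractical. One common way to characterize a cluster (offered as a suggestion, not as something RapidMiner is stated to do) is to rank the terms by how much their average weight inside the cluster exceeds their average over the whole corpus, and use the top few as a descriptive label. `describe_clusters` and the toy vocabulary below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def describe_clusters(terms: Sequence[str],
                      data: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      top_n: int = 5) -> Dict[int, List[Tuple[str, float]]]:
    """Return, per cluster, the top_n terms whose mean weight most exceeds the corpus mean."""
    n_terms = len(terms)
    # Corpus-wide mean weight of every term.
    global_mean = [sum(row[t] for row in data) / len(data) for t in range(n_terms)]
    result: Dict[int, List[Tuple[str, float]]] = {}
    for c in sorted(set(labels)):
        members = [row for row, l in zip(data, labels) if l == c]
        centroid = [sum(row[t] for row in members) / len(members) for t in range(n_terms)]
        # Excess weight: how much heavier the term is in this cluster than overall.
        excess = [(terms[t], centroid[t] - global_mean[t]) for t in range(n_terms)]
        excess.sort(key=lambda pair: pair[1], reverse=True)
        result[c] = excess[:top_n]
    return result


if __name__ == "__main__":
    terms = ["goal", "match", "vote", "election"]
    data = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 4, 2], [0, 1, 3, 3]]
    labels = [0, 0, 1, 1]
    for cluster, top in describe_clusters(terms, data, labels, top_n=2).items():
        print(cluster, top)
```

Subtracting the corpus mean keeps terms that are frequent everywhere from dominating every cluster's description; ranking by the raw centroid weights alone would be the simpler alternative.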

I would be very grateful and happy for any help.

Marcello Sandi
