"Text Clustering Example"

Legacy User
Legacy User New Altair Community Member
edited November 5 in Community Q&A
Folks,

All of my questions about text clustering come up because this "text clustering example" outputs "no results produced". The newsgroup data is present in the directories noted.

Why does it not generate the output identified in the description?

<operator name="Root" class="Process" expanded="yes">
      <description text="#ylt#h3#ygt#Clustering text documents#ylt#/h3#ygt##ylt#p#ygt#In this experiment, texts from two newsgroups are read and clustered. To make the clusters better comprehensible, three keywords are extracted for each cluster and added to the cluster description.#ylt#/p#ygt#"/>
      <parameter key="logverbosity" value="status"/>
      <operator name="TextInput" class="TextInput" expanded="yes">
          <parameter key="default_content_language" value="english"/>
          <list key="namespaces">
          </list>
          <parameter key="prune_above" value="10"/>
          <parameter key="prune_below" value="5"/>
          <list key="texts">
            <parameter key="graphics" value="../data/newsgroup/graphics"/>
            <parameter key="hardware" value="../data/newsgroup/hardware"/>
          </list>
          <operator name="StringTokenizer" class="StringTokenizer">
          </operator>
          <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
          </operator>
          <operator name="TokenLengthFilter" class="TokenLengthFilter">
              <parameter key="min_chars" value="5"/>
          </operator>
          <operator name="PorterStemmer" class="PorterStemmer">
          </operator>
      </operator>
      <operator name="KMeans" class="KMeans">
      </operator>
      <operator name="AttributeSumClusterCharacterizer" class="AttributeSumClusterCharacterizer">
      </operator>
</operator>

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hello,

    this sample (from a plugin!) was never updated to reflect that the automatic cluster characterization was removed some time ago. I can hardly believe that this process worked at all (did you really run it on a fresh RM 4.4 installation?), since I would expect the operator "AttributeSumClusterCharacterizer" to be deprecated, if not removed entirely. But I may be mistaken.

    Before you ask: the characterization took a long time even if you were not interested in it, and it did not work well enough. Much better characterizations can be found with the approaches I sketched in the other thread, which is why it was removed.
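
    If the operator is indeed gone from your installation, one thing to try is the same process with the characterizer step dropped. This is an untested sketch: it assumes the remaining operators (TextInput, the token filters and KMeans) still exist under these names in your version, and it would only yield the clustering itself, not the three keywords per cluster mentioned in the description.

    ```xml
    <operator name="Root" class="Process" expanded="yes">
        <parameter key="logverbosity" value="status"/>
        <!-- Same text input as in the question: two newsgroup directories -->
        <operator name="TextInput" class="TextInput" expanded="yes">
            <parameter key="default_content_language" value="english"/>
            <list key="namespaces">
            </list>
            <parameter key="prune_above" value="10"/>
            <parameter key="prune_below" value="5"/>
            <list key="texts">
                <parameter key="graphics" value="../data/newsgroup/graphics"/>
                <parameter key="hardware" value="../data/newsgroup/hardware"/>
            </list>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
                <parameter key="min_chars" value="5"/>
            </operator>
            <operator name="PorterStemmer" class="PorterStemmer">
            </operator>
        </operator>
        <!-- Clustering stays as the last step -->
        <operator name="KMeans" class="KMeans">
        </operator>
        <!-- Assumption: AttributeSumClusterCharacterizer is omitted here because it may be deprecated or removed -->
    </operator>
    ```

    If this variant still reports "no results produced", the characterizer was probably not the only problem, and the log output at a higher logverbosity would be the next thing to look at.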

    Cheers,
    Ingo