"Text Clustering"

User: "Legacy User"
Updated by Jocelyn
Ingo, I've taken this as far as I can and now I'm stuck! I've created the following experiment, which attempts to cluster text extracted from a sample Excel file containing 14 examples, 0 special attributes and 8 regular attributes. Here's the process XML so far:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="datamanagement" value="long_array"/>
        <parameter key="excel_file" value="C:\feedback.xls"/>
        <parameter key="first_row_as_names" value="true"/>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="comments"/>
    </operator>
    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="default_content_language" value="english"/>
        <parameter key="vector_creation" value="TermOccurrences"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="target_function" value="gaussian mixture clusters"/>
    </operator>
    <operator name="KMeans" class="KMeans">
        <parameter key="k" value="3"/>
    </operator>
</operator>

This process produces 3 clusters: Cluster 0 has 33 items, Cluster 1 has 55 items, and Cluster 2 has 12, for a total of 100 examples. At this point, I want to give each cluster a meaningful, user-friendly label that captures its key theme. How can I figure out the key theme for each cluster? What are the next steps?

Please help!
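
To make the question concrete outside RapidMiner, here is a minimal Python sketch of one common way to approach cluster labelling: sum the term occurrences of the documents in each cluster and read off the most frequent terms. It is not RapidMiner's API. The comments, the cluster labels, the tiny stopword list and the function names are all made up for illustration, and stemming is left out.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "it", "in", "for"}


def tokenize(text, min_chars=3):
    """Lowercase, split on non-letters, drop stopwords and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_chars and t not in STOPWORDS]


def top_terms_per_cluster(texts, labels, n_terms=3):
    """Sum term occurrences per cluster and return the most frequent terms."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    return {
        label: [
            term
            for term, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
        ]
        for label, c in counts.items()
    }


# Hypothetical feedback comments with cluster ids as a clusterer might assign them.
texts = [
    "Shipping was slow and the shipping cost was high",
    "Slow shipping, package arrived late",
    "Great support team, very helpful support",
    "Support was helpful and friendly",
]
labels = [0, 0, 1, 1]

for cluster, terms in sorted(top_terms_per_cluster(texts, labels, n_terms=2).items()):
    print(cluster, terms)
```

Raw counts favour words that are frequent everywhere, so in practice a weighting such as TF-IDF, or a comparison of each cluster's term frequencies against the rest of the corpus, tends to give more distinctive labels.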
