ID column in K-means clustering

molsen
molsen New Altair Community Member
edited November 5 in Community Q&A

Hello,

I am clustering text with K-means and writing the output to an Excel file.

Everything is working fine, but I can't seem to get the original ID column into the new spreadsheet.

Instead, a new column is created with ascending numbers.

 

This ID column: [image: id column.JPG, original example set] should go into this column: [image: excel.JPG]

 

This is my workflow:

workflow.JPG

Best Answer

  • molsen
    molsen New Altair Community Member
    Answer ✓

    I found a way to pass the ID column from the "Process Documents from Data" operator to the "K-means clustering" operator.

    It turned out that the only thing I had missed was a small checkbox called "Add meta information":

    add meta info.JPG

    After that, the ID data came all the way through to the Excel file at the end!
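
    For anyone who prefers to check this in the process XML rather than the GUI, here is a minimal sketch of how the operator might look with the checkbox enabled. The parameter key add_meta_information is an assumption based on the GUI label, and the inner Tokenize subprocess is simply copied from the sample process posted in this thread; compare it against the XML your own RapidMiner version exports.

    ```xml
    <operator activated="true" class="text:process_document_from_data" compatibility="7.2.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="514" y="34">
      <!-- Assumed key for the "Add meta information" checkbox: keeps the non-text attributes (such as the ID) in the output example set -->
      <parameter key="add_meta_information" value="true"/>
      <list key="specify_weights"/>
      <process expanded="true">
        <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
        <connect from_port="document" to_op="Tokenize" to_port="document"/>
        <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
        <portSpacing port="source_document" spacing="0"/>
        <portSpacing port="sink_document 1" spacing="0"/>
        <portSpacing port="sink_document 2" spacing="0"/>
      </process>
    </operator>
    ```

    If the parameter is missing from your exported XML, it may simply be at its default value, so toggling the checkbox in the GUI and re-exporting is the safest way to confirm the exact key.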

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Use a Set Role operator to set your ID column to the ID role. Then it should pass through to the clusters.
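
    As a rough sketch, a Set Role operator configured this way could look like the following in the process XML. The attribute name "Id" is an assumption taken from the sample process later in this thread, and the layout attributes (x, y, width, height) are placeholders; substitute the name of your own ID column and place the operator before Process Documents from Data.

    ```xml
    <operator activated="true" class="set_role" compatibility="7.3.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="34">
      <!-- "Id" is an assumed attribute name; use the name of your own ID column -->
      <parameter key="attribute_name" value="Id"/>
      <!-- Give the attribute the special role "id" so it is not treated as a regular attribute -->
      <parameter key="target_role" value="id"/>
      <list key="set_additional_roles"/>
    </operator>
    ```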

  • molsen
    molsen New Altair Community Member

    Thank you for the reply. Like this?

    set role.JPG

     

    It seems like the Id gets lost in Process Documents from Data. Is there anything I have to do there?

    process docs.JPG

     

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Double-check your process. I'm able to pass an attribute that's set with an ID role through Process Documents from Data.

    ID.png

     

    It comes from this sample Search Twitter process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="Twitter Connection"/>
    <parameter key="query" value="rapidminer"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Id|Text"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="7.3.000" expanded="true" height="82" name="Nominal to Text" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.2.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="514" y="34">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • molsen
    molsen New Altair Community Member

    Damn, I'm not able to make it work, Mr. T-Bone!

    Can you share your workflow?

    Or do you know of a guide that shows how to pass the ID through?

    I have made mine based on this tutorial:

    k-means clustering tutorial
