ID column in K-means clustering

molsen · November 2016

Hello,

I am clustering text with K-means, and the output goes to an Excel file.

Everything works fine, but I can't seem to get the original ID column into the new spreadsheet.

Instead a new column is created with ascending numbers.

This ID column (id column.JPG, original example set) turns into this column:

This is my workflow:

molsen · December 2016

I found a way to pass the ID column from the "Process Documents from Data" operator through to the "K-means clustering" operator.

It turned out that the only thing I had missed was a small checkbox called "Add meta information":

add meta info.JPG

After that, the ID data came all the way through to the Excel file at the end!
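
For anyone editing the process XML rather than the GUI, here is a rough sketch of what that setting might look like on the operator. The parameter key `add_meta_information` is an assumption based on how RapidMiner usually derives keys from parameter labels, so check it against your own exported process. The inner Tokenize subprocess is copied from the sample process in this thread.

```xml
<operator activated="true" class="text:process_document_from_data" compatibility="7.2.001" expanded="true" name="Process Documents from Data">
  <!-- Assumed key for the "Add meta information" checkbox; verify in your own export -->
  <parameter key="add_meta_information" value="true"/>
  <list key="specify_weights"/>
  <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" name="Tokenize"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
  </process>
</operator>
```

With the setting enabled, the non-text attributes of the input (here the ID) should be carried along next to the generated word vector, which matches what the post above describes.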

Thomas_Ott · November 2016

Use a Set Role operator to set your ID column to the ID role. Then it should pass through to the clusters.
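
As a rough sketch, a Set Role operator in the process XML could look like the fragment below. The attribute name `Id` is borrowed from the sample process in this thread and is only an assumption about your data. The parameter keys `attribute_name`, `target_role` and `set_additional_roles` are what a RapidMiner 7.x export typically contains, so compare against your own process.

```xml
<operator activated="true" class="set_role" compatibility="7.3.000" expanded="true" name="Set Role">
  <!-- "Id" is an assumed attribute name; use the name of your own ID column -->
  <parameter key="attribute_name" value="Id"/>
  <!-- Assign the special role "id" so the attribute is not used as a regular input -->
  <parameter key="target_role" value="id"/>
  <list key="set_additional_roles"/>
</operator>
```

Place it before the text processing and clustering operators, so the column already carries the ID role when it reaches them.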

molsen · December 2016

Thank you for the reply, like this?

set role.JPG

It seems like the ID gets lost in Process Documents from Data. Is there anything I have to do there?

process docs.JPG

Thomas_Ott · December 2016

Double-check your process. I'm able to pass an attribute that's set with an ID role through Process Documents from Data.

This comes from the following sample Search Twitter process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
        <parameter key="connection" value="Twitter Connection"/>
        <parameter key="query" value="rapidminer"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Id|Text"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.3.000" expanded="true" height="82" name="Nominal to Text" width="90" x="380" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.2.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="514" y="34">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

molsen · December 2016

Damn, I'm not able to make it work, Mr. T-Bone!

Can you share your workflow?

Or do you know of a guide that shows how to pass the ID through?

I have made mine based on this tutorial:

k-means clustering tutorial
