Using SOM as a clustering Operator

siamak_want New Altair Community Member
edited November 5 in Community Q&A
Hi all,

I have recently found that the SOM (Self-Organizing Map) network is very efficient for text clustering (according to several reputable publications), so I am about to use it for document clustering with RM. But I came across a strange fact:

"The SOM operator is treated as a visualization operator in RM, not as a clustering operator."

Now the question is: can I use the current visualization SOM algorithm to develop my own clustering SOM operator? (I have bought the manual, so I am familiar with extending RapidMiner 5.0.)

Any ideas would be greatly appreciated.

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Of course you could do that, but did you see that there is a Self-Organizing Map operator? Maybe it does exactly what you are planning to implement.

    Best, Marius
  • siamak_want
    siamak_want New Altair Community Member
    Hi Marius,
    Thanks for your straightforward guidance, but the Self-Organizing Map operator (the one you mentioned) does not deliver a cluster model. Any idea about this?

    Thanks
  • MariusHelf
    MariusHelf New Altair Community Member
    You can either apply a clustering algorithm to the SOMified data, or you could set the dimensionality of the SOM to 1 and the net size to the desired number of clusters, so that each SOM node acts as one cluster. The SOM operator outputs a preprocessing model, which you can then apply to new data. See the attached process for a simple example.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="500" width="950">
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="246" y="75"/>
          <operator activated="true" class="self_organizing_map" compatibility="5.2.003" expanded="true" height="94" name="SOM" width="90" x="447" y="75">
            <parameter key="number_of_dimensions" value="1"/>
          </operator>
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data (2)" width="90" x="447" y="210"/>
          <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model" width="90" x="648" y="120">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="SOM" to_port="example set input"/>
          <connect from_op="SOM" from_port="example set output" to_port="result 1"/>
          <connect from_op="SOM" from_port="preprocessing model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
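
To make the idea behind the 1-D trick concrete, here is a minimal pure-Python sketch. It is not RapidMiner's implementation: the function names `train_som_1d` and `best_matching_unit`, the decay schedules, the defaults and the toy data are all assumptions chosen for illustration. It only shows that a SOM with a single row of nodes, one node per desired cluster, can be read as a clustering: the index of the closest node is the cluster id.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit closest to x (squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)),
    )


def train_som_1d(data, n_units, rounds=30, lr_start=0.8, lr_end=0.01, seed=0):
    """Train a 1-D SOM; each of the n_units nodes acts as one cluster prototype.

    All parameter names and defaults are hypothetical, not RapidMiner's.
    """
    rng = random.Random(seed)
    # Initialise every unit from a randomly chosen training example.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    radius_start = max(n_units / 2.0, 1.0)
    radius_end = 0.5
    total_steps = rounds * len(data)
    step = 0
    for _ in range(rounds):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            # Learning rate and neighbourhood radius both decay geometrically.
            lr = lr_start * (lr_end / lr_start) ** frac
            radius = radius_start * (radius_end / radius_start) ** frac
            bmu = best_matching_unit(weights, x)
            for u in range(n_units):
                # Gaussian neighbourhood over the 1-D grid of unit indices.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                for d, value in enumerate(x):
                    weights[u][d] += lr * h * (value - weights[u][d])
            step += 1
    return weights


# Toy data: two well-separated groups of 2-D points.
blob_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
blob_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

# "Net size" 2 on a 1-D map: two nodes, hence two clusters.
prototypes = train_som_1d(blob_a + blob_b, n_units=2)

# The cluster id of an example is the index of its best matching unit.
labels_a = [best_matching_unit(prototypes, p) for p in blob_a]
labels_b = [best_matching_unit(prototypes, p) for p in blob_b]

# Unseen data is assigned the same way, using the trained prototypes only.
new_label = best_matching_unit(prototypes, (9.0, 9.5))
print(labels_a, labels_b, new_label)
```

Looking up the best matching unit for an unseen point is the sketch's rough counterpart to feeding the preprocessing model into Apply Model in the process above. Which node index ends up representing which group depends on the random initialisation.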
  • siamak_want
    siamak_want New Altair Community Member
    Thanks for your fantastic and clever method, Marius.
    I think you have solved the problem. Now I can use the "preprocessing model" exactly as a "cluster model".

    Thanks again, Marius.
  • MariusHelf
    MariusHelf New Altair Community Member
    Yeah, sometimes RapidMiner is not just about data mining, but also about creativity ;)