"Rocchio Algorithm"

dali
dali New Altair Community Member
edited November 5 in Community Q&A
Hi,

Is there an implementation of the Rocchio algorithm in RapidMiner? If not, how could I turn k-Nearest-Neighbor into a Rocchio classifier by calculating the average word vector for each class and using only these averages for classification?

Thanks in advance.

Answers

  • dali
    dali New Altair Community Member
    Hello again,

    It's pretty sad that there is no Rocchio in RapidMiner. I'm now trying to set up my own, but I'm already having problems getting the mean of all word vectors of a class.

    Is there a function that averages all given word vectors so I get one centroid vector? I can't find it.

    Thanks for any help.
  • land
    land New Altair Community Member
    Hi,
    this is possible if you somehow misuse K-Medoids. See the following process for details:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
        <process expanded="true" height="190" width="614">
          <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="246" y="75">
            <parameter key="name" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="k_medoids" compatibility="5.1.001" expanded="true" height="76" name="Clustering" width="90" x="514" y="75">
            <parameter key="k" value="3"/>
            <parameter key="measure_types" value="NominalMeasures"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Unfortunately, this won't work in the current version because of a bug: the nominal distance measure also uses the numerical attributes. This will be fixed in the coming update at the end of next week.

    Greetings,
      Sebastian
  • dali
    dali New Altair Community Member
    Thanks for the reply. I'm really looking forward to trying it by the end of the week. I'll let you know whether it worked.
  • dali
    dali New Altair Community Member
    Well, it looked like a good idea to "misuse K-Medoids", but it is taking far too long to calculate: I stopped it after half an hour. I think the problem is that RM is trying to find my classes on its own, whereas using the given classes might speed up the whole process.

    Isn't there another operator that just calculates the mean of some word vectors? There must be something that averages all given vectors and returns the mean vector, but I just can't find it.

    Thanks a lot for any advice.
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    I have just uploaded a process that calculates the average values of all attributes, grouped by class, and uses the resulting prototypes as input for the k-NN learner. You may need a recent RapidMiner version, since the process relies on a relatively new feature of the "Aggregate" operator: aggregating a set of attributes directly with the same default function. Otherwise you would have to define the aggregation for every attribute manually, which is of course not really feasible for word vectors.

    The description of the process on myExperiment can be found at

    http://www.myexperiment.org/workflows/1917.html

    You can directly download the process from myExperiment within RapidMiner (which I strongly recommend) by using the Community Extension of RapidMiner. Just install the extension and activate the "MyExperiment Browser" view. Then you can easily search for processes and download them. The process is called "Rocchio".

    Cheers,
    Ingo
  • land
    land New Altair Community Member
    Hi,
    let me mention that this is only possible with version 5.1.002 or later, which was released a week ago.

    Some problems become outdated really fast...

    Greetings,
      Sebastian
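
For readers who want to see the idea from this thread outside RapidMiner, here is a minimal Python sketch: average the attribute values per class, then assign a new vector to the class of its nearest prototype. It is not the process uploaded to myExperiment. The function names, the toy data, and the choice of cosine similarity as the nearness measure are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def class_centroids(vectors, labels):
    """Average all vectors that share a label, giving one centroid per class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(vector, centroids):
    """Return the label whose centroid is most similar to the given vector."""
    return max(
        centroids,
        key=lambda label: cosine_similarity(vector, centroids[label]),
    )


# Hypothetical toy word vectors (three terms) with made-up class labels.
train_vectors = [
    [2.0, 0.0, 1.0],
    [4.0, 0.0, 1.0],
    [0.0, 3.0, 1.0],
    [0.0, 5.0, 1.0],
]
train_labels = ["sports", "sports", "politics", "politics"]

centroids = class_centroids(train_vectors, train_labels)
prediction = classify([1.0, 0.0, 0.0], centroids)
print(centroids)
print(prediction)
```

Because only one prototype per class is kept, classification cost grows with the number of classes rather than the number of training documents, which is the speed-up over plain k-NN that the original question was after.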