"Rocchio Algorithm"
dali
New Altair Community Member
Hi,
Is there an implementation of the Rocchio algorithm in RapidMiner? If not, how could I turn k-Nearest-Neighbor into a Rocchio classifier by calculating the average word vector for each class and using only these averages for classification?
Thanks in advance.
Answers
-
Hello again,
It's pretty sad that there is no Rocchio in RapidMiner. I'm now trying to set up my own, but I'm already having problems getting the mean of all word vectors of a class.
Is there a function that averages all given word vectors so that I get one centroid vector? I can't find it.
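For illustration only, the averaging asked about here can be sketched outside RapidMiner in Python with NumPy. The function name `class_centroids` and the toy term-frequency vectors are assumptions made up for this example; they are not part of RapidMiner or of the original posts.

```python
import numpy as np

def class_centroids(vectors, labels):
    """Return {label: mean vector} for the rows of `vectors` grouped by `labels`."""
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    # One centroid per class: the element-wise mean of that class's word vectors.
    return {
        label.item(): vectors[labels == label].mean(axis=0)
        for label in np.unique(labels)
    }

# Hypothetical toy data: 4 documents described by 3 term counts each.
X = [[2, 0, 1], [4, 0, 1], [0, 3, 1], [0, 1, 3]]
y = ["sports", "sports", "politics", "politics"]

centroids = class_centroids(X, y)
for label, centroid in centroids.items():
    print(label, centroid)
```

Each class ends up with exactly one vector, which is the prototype a Rocchio-style classifier compares new documents against.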
Thanks for any help.
-
Hi,
this is possible if you somewhat misuse K-Medoids. See the following process for details.
Unfortunately, this won't work in the current version because of a bug in the nominal distance measure, which uses the numerical attributes, too. This will be resolved with the coming update at the end of next week.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
<process expanded="true" height="190" width="614">
<operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="112" y="75">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="246" y="75">
<parameter key="name" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="k_medoids" compatibility="5.1.001" expanded="true" height="76" name="Clustering" width="90" x="514" y="75">
<parameter key="k" value="3"/>
<parameter key="measure_types" value="NominalMeasures"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
<connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Greetings,
Sebastian
-
Thanks for the reply. I'm really looking forward to trying it at the end of the week. I'll let you know whether it worked.
-
Well, it looked like a good idea to "misuse K-Medoids", but it seems to take hours to calculate; I stopped it after half an hour. I think the problem is that RM is trying to find my classes on its own, whereas using the given classes might speed up the whole process.
Isn't there another operator that just calculates the mean of some word vectors? There must be something that averages all given vectors and returns the mean vector, but I just can't find it.
Thanks a lot for any advice.
-
Hi,
I have just uploaded a process which calculates the average values of all attributes grouped by class and uses the resulting prototypes as input for the k-NN learner. You may need a recent RapidMiner version, since this process makes use of a relatively new feature of the "Aggregate" operator, namely directly aggregating a set of attributes with the same default function. Otherwise you will have to define the aggregations for all attributes manually, which is of course not really feasible for word vectors...
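As a rough sketch of the same idea outside RapidMiner (average per class, then nearest-neighbour lookup against the prototypes only), the following Python/NumPy snippet may help. The function names `fit_rocchio` and `predict_rocchio`, the toy data, and the choice of cosine similarity are assumptions for illustration; the uploaded process may use a different distance measure.

```python
import numpy as np

def fit_rocchio(X, y):
    """Compute one prototype (mean vector) per class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    prototypes = np.vstack([X[y == c].mean(axis=0) for c in classes])
    return classes, prototypes

def predict_rocchio(classes, prototypes, X_new):
    """Assign each row of X_new to the class of its most similar prototype.

    Equivalent to 1-NN over the prototypes; cosine similarity is an
    assumption here, chosen because it is common for word vectors.
    """
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))

    def unit(M):
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        return M / np.where(norms == 0, 1.0, norms)  # avoid division by zero

    similarities = unit(X_new) @ unit(prototypes).T
    return classes[similarities.argmax(axis=1)]

# Hypothetical toy data: 4 training documents, 3 term counts each.
X_train = [[2, 0, 1], [4, 0, 1], [0, 3, 1], [0, 1, 3]]
y_train = ["sports", "sports", "politics", "politics"]

classes, prototypes = fit_rocchio(X_train, y_train)
predictions = predict_rocchio(classes, prototypes, [[3, 0, 0], [0, 2, 2]])
print(predictions)
```

Because only one vector per class is compared at prediction time, the lookup cost grows with the number of classes rather than with the number of training documents.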
The description of the process on myExperiment can be found at
http://www.myexperiment.org/workflows/1917.html
You can directly download the process from myExperiment within RapidMiner (which I strongly recommend) by using the Community Extension of RapidMiner. Just install the extension and activate the "MyExperiment Browser" view. Then you can easily search for processes and download them. The process is called "Rocchio".
Cheers,
Ingo
-
Hi,
let me mention that this is possible only with version 5.1.002 or later, which was released a week ago.
Some problems become outdated really fast...
Greetings,
Sebastian