WordNet in RM 5

simon_knoll · June 2010

Hello all,
Short question: RM 4.x had the WordNetSynonymStemmer operator. Is it gone in version 5, so that one has to use Groovy scripting instead?

thx
simon knoll

Wanttoknow · June 2010

Hi,

I was asking myself the same thing: Where is the Wordnet stemmer in RM5?

TobiasMalbrecht · June 2010

Hi,

I think the WordNet stemmer was removed because it did not work that well. We may revive it at some point, but that is only speculation.

Kind regards,
Tobias

simon_knoll · September 2010

hi,
I coded a WordNet operator myself; if anyone is interested, I can share code snippets.
What I can say is that on my test dataset I got some good results by adding hyponyms for k-means clustering.

all the best,
simon

B_ · September 2010

Simon

would appreciate seeing how you set this up.
thanks

b.

simon_knoll · September 2010

hi,
1st, you'll have to install WordNet.
2nd, you need a Java WordNet API. I took this one: http://projects.csail.mit.edu/jwi/ (not for commercial purposes, but the fastest I know).
3rd, you'll have to implement an Operator (I added a new class in the "com.rapidminer.operator.text.io.wordfilter" package).
For this I just copied an operator from the text plugin, deleted everything I did not need, and added the code for WordNet (here I add hypernyms).

i hope this was more helpful than confusing

package com.rapidminer.operator.text.io.wordfilter;

import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import com.rapidminer.operator.OperatorDescription;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.text.Document;
import com.rapidminer.operator.text.Token;
import com.rapidminer.operator.text.io.AbstractTokenProcessor;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.ISynset;
import edu.mit.jwi.item.ISynsetID;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.item.Pointer;
import edu.mit.jwi.morph.WordnetStemmer;

// Note: despite the class name, this operator adds hypernyms (Pointer.HYPERNYM), not hyponyms.
public class WordnetHyponymOperator extends AbstractTokenProcessor {
	private WordnetStemmer stemmer;
	private IDictionary dict;

	public WordnetHyponymOperator(OperatorDescription description) {
		super(description);
		// WordNet installation directory; adjust to your own installation
		String wnhome = "/usr/local/WordNet-3.0";
		String path = wnhome + File.separator + "dict";
		URL url;
		try {
			url = new URL("file", null, path);
		} catch (MalformedURLException e) {
			// fail early instead of continuing with a null URL
			throw new IllegalStateException("Invalid WordNet dictionary path: " + path, e);
		}

		// construct the dictionary object and open it
		this.dict = new Dictionary(url);
		this.dict.open();
		this.stemmer = new WordnetStemmer(this.dict);
	}

	@Override
	protected Document doWork(Document textObject) throws OperatorException {

		List<Token> newSequence = new ArrayList<Token>(textObject
				.getTokenSequence().size());
		for (Token token : textObject.getTokenSequence()) {
			// stem the token as a noun; tokens without a noun stem are passed through unchanged
			List<String> stems = stemmer.findStems(token.getToken(), POS.NOUN);
			if (stems != null && stems.size() > 0) {
				String stem = stems.get(0);
				IIndexWord idxWord = dict.getIndexWord(stem, POS.NOUN);
				if (idxWord != null && idxWord.getWordIDs().size() > 0) {
					// only the first listed noun sense is used
					IWordID wordID = idxWord.getWordIDs().get(0);
					IWord word = dict.getWord(wordID);
					ISynset synset = word.getSynset();
					List<ISynsetID> hypernymIDs = synset.getRelatedMap().get(
							Pointer.HYPERNYM);

					// the map has no entry when the synset has no hypernyms
					if (hypernymIDs != null) {
						for (ISynsetID hypernymID : hypernymIDs) {
							ISynset hypernym = dict.getSynset(hypernymID);
							for (IWord hypernymWord : hypernym.getWords()) {
								// add each hypernym lemma with the weight of the original token
								newSequence.add(new Token(hypernymWord.getLemma(),
										token.getWeight()));
							}
						}
					}
				}
			}
			newSequence.add(token);
		}
		textObject.setTokenSequence(newSequence);
		return textObject;
	}

}
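
To see what the loop in doWork does to a token sequence without installing RapidMiner, WordNet or JWI, here is a minimal stand-alone sketch. It is not part of the operator above: the class name HypernymExpansionSketch, the method names and the small in-memory map are made up for illustration, and the map only stands in for the WordNet lookup. It assumes tokens are already stemmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HypernymExpansionSketch {

    // Hypothetical toy data standing in for the WordNet lookup: lemma -> hypernym lemmas.
    static Map<String, List<String>> toyHypernyms() {
        Map<String, List<String>> m = new HashMap<String, List<String>>();
        m.put("dog", Arrays.asList("canine", "domestic_animal"));
        m.put("cat", Arrays.asList("feline"));
        return m;
    }

    // Same order as in doWork: the hypernyms are added first, then the original token.
    static List<String> expand(List<String> tokens, Map<String, List<String>> hypernyms) {
        List<String> out = new ArrayList<String>(tokens.size());
        for (String token : tokens) {
            List<String> related = hypernyms.get(token);
            if (related != null) {
                out.addAll(related);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [canine, domestic_animal, dog, chases, feline, cat]
        System.out.println(expand(Arrays.asList("dog", "chases", "cat"), toyHypernyms()));
    }
}
```

Tokens with no entry in the map ("chases" here) pass through unchanged, just like tokens for which the operator finds no noun stem or index word.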

TobiasMalbrecht · September 2010

Hi Simon,

Thank you very much for sharing your work. At the moment, our work on the text processing extension is almost idle because of other projects. But maybe we will have a look at it sometime.

Best regards,
Tobias

simon_knoll · September 2010

Yes, it would be cool if this kind of feature were added back to the text plugin.

B_ · September 2010

thanks for the example Simon
