"create exampleset with text plugin"
mw
Hi all,
in Java code I would like to create an ExampleSet with the text plugin. I tried WVToolRapidMinerExample.java with my input and it works fine. I copied the exact code of this example into my method and I get an "access denied" error. Debugging showed that WVToolRapidMinerExample.java treats my input as a directory containing training documents, as it should, but when I use my method the directories are treated as files, which of course results in an exception.
Here is my code (RapidMiner is already initialised when this code is reached):
private ExampleSet buildTrainExampleSetNieuw(Category category)
        throws OperatorCreationException, OperatorException {
    // Note: category is not used below; the training paths are hard-coded.
    // Create the text plugin's TextInput operator chain
    OperatorChain wvtoolOperator = (OperatorChain) OperatorService
            .createOperator("TextInput");
    wvtoolOperator.setParameter(TextInput.PARAMETER_DEFAULT_CONTENT_TYPE,
            "application/xml");
    wvtoolOperator.setParameter(
            TextInput.PARAMETER_DEFAULT_CONTENT_LANGUAGE, "dutch");
    wvtoolOperator.setParameter(
            TextInput.PARAMETER_DEFAULT_CONTENT_ENCODING, "iso-8859-1");
    wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_BELOW, "3");
    wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_ABOVE, "10");
    // One {class label, directory of training documents} pair per class
    List<Object[]> textList = new LinkedList<Object[]>();
    textList.add(new Object[] { "Ambtenarenrecht",
            "c:/workspace/documentclassification/trainset/Ambtenarenrecht/" });
    textList.add(new Object[] { "non-Ambtenarenrecht",
            "c:/workspace/documentclassification/trainsetnon/Ambtenarenrecht/" });
    // Add a SimpleTokenizer as inner operator of the chain
    wvtoolOperator.addOperator(OperatorService
            .createOperator(SimpleTokenizer.class));
    wvtoolOperator.setListParameter("texts", textList);
    IOContainer out = wvtoolOperator.apply(new IOContainer());
    // Explicit cast so this compiles whether or not get() is generic
    return (ExampleSet) out.get(ExampleSet.class);
}
This is the code in WVToolRapidMinerExample.java that does a good job:
public static void main(String[] args) throws Exception {
    // Initialise RapidMiner, passing the operators.xml from the textplugin project
    FileInputStream inputStream = new FileInputStream(
            "C:\\workspace\\textplugin\\resources\\operators.xml");
    RapidMiner.init(inputStream, new File("rm_plugins"), true, false,
            false, true);
    inputStream.close();
    OperatorChain wvtoolOperator = (OperatorChain) OperatorService
            .createOperator("TextInput");
    wvtoolOperator.setParameter(TextInput.PARAMETER_DEFAULT_CONTENT_TYPE,
            "application/xml");
    wvtoolOperator.setParameter(
            TextInput.PARAMETER_DEFAULT_CONTENT_LANGUAGE, "dutch");
    wvtoolOperator.setParameter(
            TextInput.PARAMETER_DEFAULT_CONTENT_ENCODING, "iso-8859-1");
    wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_BELOW, "3");
    wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_ABOVE, "10");
    List<Object[]> textList = new LinkedList<Object[]>();
    // adjust data input
    textList.add(new Object[] { "Ambtenarenrecht",
            "c:/workspace/documentclassification/trainset/Ambtenarenrecht/" });
    textList.add(new Object[] { "non-Ambtenarenrecht",
            "c:/workspace/documentclassification/trainsetnon/Ambtenarenrecht/" });
    wvtoolOperator.addOperator(OperatorService
            .createOperator(SimpleTokenizer.class));
    wvtoolOperator.setListParameter("texts", textList);
    IOContainer out = wvtoolOperator.apply(new IOContainer());
    System.out.println("klaar"); // Dutch for "done"
}
Any ideas on what I am doing wrong in my code? I use RapidMiner/text plugin 4.2. Any suggestions that help solve this problem will be much appreciated.
Martine
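Since debugging showed the training directories being handled as files, it may help to rule out path problems first. On Windows, opening a directory with new FileInputStream(dir) fails with a FileNotFoundException whose message ends in "(Access is denied)", so a folder that gets opened as a document would produce exactly this kind of error. The following is a minimal sketch using only java.io; the class and method names (TrainDirCheck, findInvalidDirs) are hypothetical and not part of RapidMiner 4.2 or the text plugin. It also assumes, without confirmation from this thread, that every entry inside a class directory is read as a document file, which is why nested sub-directories are reported.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper, not part of RapidMiner or the text plugin.
public class TrainDirCheck {

    // Returns one message per problem found in the {label, path} pairs
    // that would be passed to the TextInput "texts" list parameter.
    public static List<String> findInvalidDirs(List<Object[]> textList) {
        List<String> problems = new ArrayList<String>();
        for (Object[] entry : textList) {
            String label = (String) entry[0];
            File dir = new File((String) entry[1]);
            if (!dir.exists()) {
                problems.add(label + ": " + dir + " does not exist");
                continue;
            }
            if (!dir.isDirectory()) {
                problems.add(label + ": " + dir + " is a file, not a directory");
                continue;
            }
            File[] children = dir.listFiles();
            if (!dir.canRead() || children == null) {
                problems.add(label + ": " + dir + " cannot be listed");
                continue;
            }
            // Assumption: each child is opened as a document, so a nested
            // folder would fail on Windows with "(Access is denied)".
            for (File child : children) {
                if (child.isDirectory()) {
                    problems.add(label + ": " + child + " is a sub-directory");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Same label/path pairs as in the question
        List<Object[]> textList = new LinkedList<Object[]>();
        textList.add(new Object[] { "Ambtenarenrecht",
                "c:/workspace/documentclassification/trainset/Ambtenarenrecht/" });
        textList.add(new Object[] { "non-Ambtenarenrecht",
                "c:/workspace/documentclassification/trainsetnon/Ambtenarenrecht/" });
        List<String> problems = findInvalidDirs(textList);
        if (problems.isEmpty()) {
            System.out.println("all training directories look usable");
        }
        for (String p : problems) {
            System.out.println(p);
        }
    }
}
```

Run it on the machine where the error occurs. If nothing is reported, the paths are probably not the cause, and the remaining visible difference between the two listings is how RapidMiner and the text plugin are initialised (the example's main() calls RapidMiner.init with the textplugin's operators.xml).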
Answers
Hi,
to be honest, I'm really surprised that anybody still uses such an ancient version of RapidMiner. Version 4.2 is outdated and has not been maintained for two years, so I can't really help you. Anyway, I would suggest updating to 5.0 and the new Text Processing Extension, since the code quality, especially in the completely revised Text Extension, is much better now.
After that, the white paper "How to extend RapidMiner" from our shop might help you construct new example sets.
Greetings,
Sebastian