Create an ExampleSet with the Text Plugin

mw New Altair Community Member
edited November 5 in Community Q&A
Hi all,

In Java code, I would like to create an ExampleSet with the Text Plugin. I tried WVToolRapidMinerExample.java with my input and it works fine. However, when I copy the exact code of this example into my own method, I get an "access denied" error. Debugging showed that WVToolRapidMinerExample.java treats each input path as a directory containing training documents, as it should, but in my method the directories are treated as files, which of course results in an exception.
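On Windows, opening a directory with `java.io.FileInputStream` fails with a `FileNotFoundException` whose message ends in "(Access is denied)", which matches the symptom described above. A minimal, RapidMiner-independent sketch that checks each configured training path before it is handed to the operator might look like this; the class name `TrainPathCheck` and the method `findProblems` are hypothetical helpers, not part of RapidMiner or the Text Plugin.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical helper: validates {label, directory} pairs before they reach the text operator. */
public class TrainPathCheck {

    /** Returns one human-readable problem per entry whose path is not a readable directory. */
    public static List<String> findProblems(List<Object[]> textList) {
        List<String> problems = new ArrayList<String>();
        for (Object[] entry : textList) {
            String label = (String) entry[0];
            File dir = new File((String) entry[1]);
            if (!dir.exists()) {
                problems.add(label + ": " + dir + " does not exist");
            } else if (!dir.isDirectory()) {
                problems.add(label + ": " + dir + " is not a directory");
            } else if (!dir.canRead()) {
                problems.add(label + ": " + dir + " is not readable");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Same entries as in the question: { class label, training directory }
        List<Object[]> textList = new LinkedList<Object[]>();
        textList.add(new Object[] { "Ambtenarenrecht",
                "c:/workspace/documentclassification/trainset/Ambtenarenrecht/" });
        textList.add(new Object[] { "non-Ambtenarenrecht",
                "c:/workspace/documentclassification/trainsetnon/Ambtenarenrecht/" });
        for (String problem : findProblems(textList)) {
            System.out.println(problem);
        }
    }
}
```

If this prints nothing, the paths themselves are fine and the problem lies in how they are interpreted, not in the file system.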

Here is my code (RapidMiner has already been initialised when this code is reached):
private ExampleSet buildTrainExampleSetNieuw(Category category)
            throws OperatorCreationException, OperatorException {
        OperatorChain wvtoolOperator = (OperatorChain) OperatorService
                .createOperator("TextInput");
        wvtoolOperator.setParameter(TextInput.PARAMETER_DEFAULT_CONTENT_TYPE,
                "application/xml");
        wvtoolOperator.setParameter(
                TextInput.PARAMETER_DEFAULT_CONTENT_LANGUAGE, "dutch");
        wvtoolOperator.setParameter(
                TextInput.PARAMETER_DEFAULT_CONTENT_ENCODING, "iso-8859-1");
        wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_BELOW, "3");
        wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_ABOVE, "10");

        // Each entry: { class label, directory containing the training documents of that class }
        List<Object[]> textList = new LinkedList<Object[]>();

        textList
                .add(new Object[] { "Ambtenarenrecht",
                        "c:/workspace/documentclassification/trainset/Ambtenarenrecht/" });
        textList
                .add(new Object[] { "non-Ambtenarenrecht",
                        "c:/workspace/documentclassification/trainsetnon/Ambtenarenrecht/" });

        wvtoolOperator.addOperator(OperatorService
                .createOperator(SimpleTokenizer.class));

        wvtoolOperator.setListParameter("texts", textList);
        IOContainer out = wvtoolOperator.apply(new IOContainer());
        return out.get(ExampleSet.class);
    }
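The behaviour expected from the `texts` list is that every `{label, directory}` entry stands for all documents inside that directory, each labelled with the entry's class. The following plain-Java sketch only illustrates those intended semantics (it is an assumption about the desired behaviour, not the Text Plugin's actual implementation); `TextListExpansion` and `LabeledFile` are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical illustration: expands {label, directory} pairs into labelled documents. */
public class TextListExpansion {

    /** One training document together with the class label of its directory entry. */
    public static class LabeledFile {
        public final String label;
        public final File file;

        LabeledFile(String label, File file) {
            this.label = label;
            this.file = file;
        }
    }

    /** Returns one LabeledFile per regular file directly inside each configured directory. */
    public static List<LabeledFile> expand(List<Object[]> textList) {
        List<LabeledFile> result = new ArrayList<LabeledFile>();
        for (Object[] entry : textList) {
            String label = (String) entry[0];
            File dir = new File((String) entry[1]);
            File[] children = dir.listFiles(); // null if not a directory or not readable
            if (children == null) {
                throw new IllegalArgumentException(label + ": not a readable directory: " + dir);
            }
            Arrays.sort(children); // deterministic order
            for (File child : children) {
                if (child.isFile()) { // skip subdirectories instead of opening them as files
                    result.add(new LabeledFile(label, child));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> textList = new LinkedList<Object[]>();
        textList.add(new Object[] { "Ambtenarenrecht",
                "c:/workspace/documentclassification/trainset/Ambtenarenrecht/" });
        textList.add(new Object[] { "non-Ambtenarenrecht",
                "c:/workspace/documentclassification/trainsetnon/Ambtenarenrecht/" });
        try {
            for (LabeledFile doc : expand(textList)) {
                System.out.println(doc.label + "\t" + doc.file);
            }
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Listing the files this way shows exactly which documents each class would receive, which makes it easier to compare against what the operator actually opens.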
This is the code in WVToolRapidMinerExample.java, which works correctly:
public static void main(String[] args) throws Exception {
        FileInputStream inputStream = new FileInputStream(
                "C:\\workspace\\textplugin\\resources\\operators.xml");
        RapidMiner.init(inputStream, new File("rm_plugins"), true, false,
                false, true);
        inputStream.close();
        OperatorChain wvtoolOperator = (OperatorChain) OperatorService
                .createOperator("TextInput");
        wvtoolOperator.setParameter(TextInput.PARAMETER_DEFAULT_CONTENT_TYPE,
                "application/xml");
        wvtoolOperator.setParameter(
                TextInput.PARAMETER_DEFAULT_CONTENT_LANGUAGE, "dutch");
        wvtoolOperator.setParameter(
                TextInput.PARAMETER_DEFAULT_CONTENT_ENCODING, "iso-8859-1");
        wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_BELOW, "3");
        wvtoolOperator.setParameter(TextInput.PARAMETER_PRUNE_ABOVE, "10");

        List<Object[]> textList = new LinkedList<Object[]>();

        // adjust data input
        textList
                .add(new Object[] { "Ambtenarenrecht",
                        "c:/workspace/documentclassification/trainset/Ambtenarenrecht/" });
        textList
                .add(new Object[] { "non-Ambtenarenrecht",
                        "c:/workspace/documentclassification/trainsetnon/Ambtenarenrecht/" });
        wvtoolOperator.addOperator(OperatorService
                .createOperator(SimpleTokenizer.class));

        wvtoolOperator.setListParameter("texts", textList);

        IOContainer out = wvtoolOperator.apply(new IOContainer());
        System.out.println("klaar"); // "klaar" is Dutch for "done"
    }
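Apart from returning the ExampleSet, the two snippets differ only in initialisation: WVToolRapidMinerExample.java calls `RapidMiner.init` itself with the Text Plugin's `operators.xml`, while the method relies on an initialisation done elsewhere. One thing worth ruling out (an assumption, not confirmed in this thread) is that the two setups use different classpaths and therefore load different builds of the plugin classes. A small, RapidMiner-independent sketch that reports which JAR or directory a class was loaded from; `ClassOrigin` is a hypothetical name, and the fully qualified name of the plugin's `TextInput` class must be supplied by the reader, since its package is not shown here.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: prints where each named class was loaded from. */
public class ClassOrigin {

    /** Returns the JAR or class directory of cls, or a note for bootstrap/unknown sources. */
    public static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        if (args.length == 0) {
            System.out.println("usage: java ClassOrigin <fully.qualified.ClassName> ...");
            return;
        }
        for (String name : args) {
            System.out.println(name + " -> " + originOf(Class.forName(name)));
        }
    }
}
```

Running it once with the working example's classpath and once with the application's classpath shows whether both resolve the plugin classes from the same location.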
Any ideas on what I am doing wrong in my code? I am using RapidMiner / Text Plugin 4.2. Any suggestions that help solve this problem would be much appreciated.

Martine

Answers

  • land
    land New Altair Community Member
    Hi,
    to be honest, I'm really surprised that anybody still uses such an ancient version of RapidMiner :) Version 4.2 is outdated and has not been maintained for two years, so I can't really help you. Anyway, I would suggest updating to 5.0 and the new Text Processing Extension, since the code quality, especially in the completely revised text extension, is much better now.
    After doing this, the white paper "How to extend RapidMiner" from our shop might help you construct new example sets.

    Greetings,
      Sebastian