"ArrayIndexOutOfBoundsException when loading pdf files"

User: "behrangsa"
New Altair Community Member
Updated by Jocelyn
Hi,

When I load PDF files in my process I get the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at com.rapidminer.operator.TermWeightClusterCharacterizer.apply(Unknown Source)
        at com.rapidminer.operator.Operator.apply(Operator.java:664)
        at com.rapidminer.operator.OperatorChain.apply(OperatorChain.java:377)
        at com.rapidminer.operator.Operator.apply(Operator.java:664)
        at com.rapidminer.Process.run(Process.java:612)
        at com.rapidminer.Process.run(Process.java:582)
        at com.rapidminer.Process.run(Process.java:572)
        at org.behrang.clustering.Main.createProcess(Main.java:77)
        at org.behrang.clustering.Main.main(Main.java:26)
Here's my process:

System.setProperty("rapidminer.home", "C:\\Java\\RapidMiner-4.2");
        RapidMiner.init();
       
        Process p = new Process();
       
        OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
        textInput.setParameter(PARAMETER_DEFAULT_CONTENT_LANGUAGE, "english");
        textInput.setParameter(PARAMETER_PRUNE_ABOVE, "15");
        textInput.setParameter(PARAMETER_PRUNE_BELOW, "5");
        // textInput.setParameter(PARAMETER_DEFAULT_CONTENT_TYPE, "pdf");
       
        List<Object[]> textList = new LinkedList<Object[]>();
        for (File f : new File("fit4005").listFiles()) {
            textList.add(new Object[] {
              f.getAbsolutePath(),
              f.getAbsolutePath()
            });
        }
//        for (File f : new File("newsgroup/graphics").listFiles()) {
//            textList.add(new Object[] {
//              f.getAbsolutePath(),
//              f.getAbsolutePath()
//            });
//        }
//        for (File f : new File("newsgroup/hardware").listFiles()) {
//            textList.add(new Object[] {
//              f.getAbsolutePath(),
//              f.getAbsolutePath()
//            });
//        }
        // textList.add(new Object[] {"graphics","newsgroup/graphics"});
        // textList.add(new Object[] {"hardware","newsgroup/hardware"});       
        textInput.setListParameter("texts", textList);
        textInput.addOperator(OperatorService.createOperator("StringTokenizer"));
        textInput.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));
       
        Operator tlfOperator = OperatorService.createOperator("TokenLengthFilter");
        tlfOperator.setParameter("min_chars", "5");
        textInput.addOperator(tlfOperator);
        textInput.addOperator(OperatorService.createOperator("PorterStemmer"));
       
        p.getRootOperator().addOperator(textInput);
        p.getRootOperator().addOperator(OperatorService.createOperator("KMeans"));
        p.getRootOperator().addOperator(OperatorService.createOperator("AttributeSumClusterCharacterizer"));

        p.save(new File("Process.xml"));
       
        IOContainer io = p.run();
        SimpleExampleSet ses = (SimpleExampleSet) io.get(SimpleExampleSet.class);
        System.out.println(ses.getExample(0));       
        System.exit(0);
The fit4005 directory contains the PDF files. If I load text files, everything works fine. Any ideas why this is happening and how I can fix it?

Thanks in advance,
Behi
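
A note on the loop that builds the "texts" list above: it adds every entry that listFiles() returns, so a sub-directory, an empty file, or a non-PDF file inside fit4005 would be handed to TextInput as well, and listFiles() itself returns null if the directory cannot be read. Whether any of this is related to the exception is not known from the post. The following is only a minimal sketch of a check that could be run first to rule it out. The class name TextListBuilder and the method buildTextList are hypothetical, and the code uses nothing but the standard Java library (no RapidMiner calls).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of RapidMiner.
final class TextListBuilder {

    /**
     * Builds {path, path} pairs in the same shape the question passes to
     * setListParameter("texts", ...), keeping only regular, non-empty files
     * whose name ends with the given extension (case-insensitive).
     */
    static List<Object[]> buildTextList(File dir, String extension) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles() returns null when dir is missing or is not a directory.
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        // The order returned by listFiles() is unspecified; sort for repeatable runs.
        Arrays.sort(entries);
        String suffix = extension.toLowerCase(Locale.ROOT);
        List<Object[]> textList = new ArrayList<Object[]>();
        for (File f : entries) {
            boolean matches = f.getName().toLowerCase(Locale.ROOT).endsWith(suffix);
            if (f.isFile() && f.length() > 0 && matches) {
                // Same two-element entry as in the question: absolute path twice.
                textList.add(new Object[] { f.getAbsolutePath(), f.getAbsolutePath() });
            } else {
                System.err.println("Skipping: " + f);
            }
        }
        return textList;
    }

    public static void main(String[] args) {
        // Defaults to the directory name used in the question.
        File dir = new File(args.length > 0 ? args[0] : "fit4005");
        List<Object[]> texts = buildTextList(dir, ".pdf");
        System.out.println(texts.size() + " PDF file(s) found in " + dir);
    }
}
```

The returned list could then be passed to textInput.setListParameter("texts", ...) in place of the hand-built one. If it comes back empty or shorter than expected, the input directory is worth checking before looking at the operators.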

    User: "IngoRM"
    New Altair Community Member
    Hi,

    Sorry, but I do not have a direct solution. I would suggest that you set up the process in the GUI first and use breakpoints etc. to track down the problem. If everything works fine in the GUI, you can then simply use

    Process process = new Process(xmlFile);
    or

    Process process = new Process(xmlString);
    and

    process.run();
    in order to deploy the process. It is usually much easier to get things right in GUI mode before you embed the complete process in your own application.

    Cheers,
    Ingo