"ArrayIndexOutOfBoundsException when loading pdf files"

behrangsa New Altair Community Member
edited November 5 in Community Q&A
Hi,

When I load PDF files in my process I get the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at com.rapidminer.operator.TermWeightClusterCharacterizer.apply(Unknown Source)
        at com.rapidminer.operator.Operator.apply(Operator.java:664)
        at com.rapidminer.operator.OperatorChain.apply(OperatorChain.java:377)
        at com.rapidminer.operator.Operator.apply(Operator.java:664)
        at com.rapidminer.Process.run(Process.java:612)
        at com.rapidminer.Process.run(Process.java:582)
        at com.rapidminer.Process.run(Process.java:572)
        at org.behrang.clustering.Main.createProcess(Main.java:77)
        at org.behrang.clustering.Main.main(Main.java:26)
Here's my process:

        System.setProperty("rapidminer.home", "C:\\Java\\RapidMiner-4.2");
        RapidMiner.init();
       
        Process p = new Process();
       
        OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
        textInput.setParameter(PARAMETER_DEFAULT_CONTENT_LANGUAGE, "english");
        textInput.setParameter(PARAMETER_PRUNE_ABOVE, "15");
        textInput.setParameter(PARAMETER_PRUNE_BELOW, "5");
        // textInput.setParameter(PARAMETER_DEFAULT_CONTENT_TYPE, "pdf");
       
        List<Object[]> textList = new LinkedList<Object[]>();
        for (File f : new File("fit4005").listFiles()) {
            textList.add(new Object[] {
              f.getAbsolutePath(),
              f.getAbsolutePath()
            });
        }
//        for (File f : new File("newsgroup/graphics").listFiles()) {
//            textList.add(new Object[] {
//              f.getAbsolutePath(),
//              f.getAbsolutePath()
//            });
//        }
//        for (File f : new File("newsgroup/hardware").listFiles()) {
//            textList.add(new Object[] {
//              f.getAbsolutePath(),
//              f.getAbsolutePath()
//            });
//        }
        // textList.add(new Object[] {"graphics","newsgroup/graphics"});
        // textList.add(new Object[] {"hardware","newsgroup/hardware"});       
        textInput.setListParameter("texts", textList);
        textInput.addOperator(OperatorService.createOperator("StringTokenizer"));
        textInput.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));
       
        Operator tlfOperator = OperatorService.createOperator("TokenLengthFilter");
        tlfOperator.setParameter("min_chars", "5");
        textInput.addOperator(tlfOperator);
        textInput.addOperator(OperatorService.createOperator("PorterStemmer"));
       
        p.getRootOperator().addOperator(textInput);
        p.getRootOperator().addOperator(OperatorService.createOperator("KMeans"));
        p.getRootOperator().addOperator(OperatorService.createOperator("AttributeSumClusterCharacterizer"));

        p.save(new File("Process.xml"));
       
        IOContainer io = p.run();
        SimpleExampleSet ses = (SimpleExampleSet) io.get(SimpleExampleSet.class);
        System.out.println(ses.getExample(0));       
        System.exit(0);
The fit4005 directory contains the PDF files. If I load text files instead, everything works fine. Any ideas why this is happening and how I can fix it?

Thanks in advance,
Behi

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    Sorry, but I do not have a direct solution. I would suggest that you set up the process in the GUI first and use breakpoints etc. to track down the problem. If everything works fine in the GUI, you can then simply use

    Process process = new Process(xmlFile);
    or

    Process process = new Process(xmlString);
    and

    process.run();
    in order to deploy the process. It is usually much easier to get things right in GUI mode before you embed the complete process in your own application.

    Cheers,
    Ingo
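
Before tracing the process in the GUI, it may help to rule out the input side first. The sketch below uses only the Java standard library and does not diagnose the exception itself, whose cause is not established in this thread. It simply checks that the directory can be listed and keeps only readable, non-empty files ending in .pdf, then builds the same {path, path} pairs that the question passes to setListParameter("texts", ...). The class name PdfInputCheck and both method names are hypothetical helpers, not part of RapidMiner.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of RapidMiner): sanity-checks the input directory. */
final class PdfInputCheck {

    /**
     * Returns the readable, non-empty *.pdf files directly inside dir, sorted by path.
     * Returns an empty list if dir is missing or is not a directory.
     */
    static List<File> listPdfFiles(File dir) {
        List<File> result = new ArrayList<File>();
        File[] entries = dir.listFiles(); // null if dir cannot be listed
        if (entries == null) {
            return result;
        }
        Arrays.sort(entries); // listFiles() makes no ordering guarantee
        for (File f : entries) {
            boolean isPdf = f.getName().toLowerCase(Locale.ROOT).endsWith(".pdf");
            if (f.isFile() && f.canRead() && f.length() > 0 && isPdf) {
                result.add(f);
            }
        }
        return result;
    }

    /** Builds {path, path} pairs in the shape used for the "texts" list in the question. */
    static List<Object[]> toTextList(List<File> files) {
        List<Object[]> textList = new ArrayList<Object[]>();
        for (File f : files) {
            textList.add(new Object[] { f.getAbsolutePath(), f.getAbsolutePath() });
        }
        return textList;
    }

    public static void main(String[] args) {
        // Directory name taken from the question; pass another path as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "fit4005");
        List<File> pdfs = listPdfFiles(dir);
        System.out.println(pdfs.size() + " usable PDF file(s) in " + dir.getAbsolutePath());
        for (File f : pdfs) {
            System.out.println("  " + f.getName() + " (" + f.length() + " bytes)");
        }
    }
}
```

If the count printed here is zero, or lower than expected, the problem is in the input directory rather than in the process. If the files all look fine, the next step is still the one suggested above: build the process in the GUI and step through it with breakpoints.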