"ArrayIndexOutOfBoundsException when loading pdf files"
behrangsa
New Altair Community Member
Hi,
When I load PDF files in my process I get the following exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at com.rapidminer.operator.TermWeightClusterCharacterizer.apply(Unknown Source)
    at com.rapidminer.operator.Operator.apply(Operator.java:664)
    at com.rapidminer.operator.OperatorChain.apply(OperatorChain.java:377)
    at com.rapidminer.operator.Operator.apply(Operator.java:664)
    at com.rapidminer.Process.run(Process.java:612)
    at com.rapidminer.Process.run(Process.java:582)
    at com.rapidminer.Process.run(Process.java:572)
    at org.behrang.clustering.Main.createProcess(Main.java:77)
    at org.behrang.clustering.Main.main(Main.java:26)
Here's my process:
System.setProperty("rapidminer.home", "C:\\Java\\RapidMiner-4.2");
RapidMiner.init();
Process p = new Process();
OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
textInput.setParameter(PARAMETER_DEFAULT_CONTENT_LANGUAGE, "english");
textInput.setParameter(PARAMETER_PRUNE_ABOVE, "15");
textInput.setParameter(PARAMETER_PRUNE_BELOW, "5");
// textInput.setParameter(PARAMETER_DEFAULT_CONTENT_TYPE, "pdf");
List<Object[]> textList = new LinkedList<Object[]>();
for (File f : new File("fit4005").listFiles()) {
    textList.add(new Object[] {
        f.getAbsolutePath(),
        f.getAbsolutePath()
    });
}
// for (File f : new File("newsgroup/graphics").listFiles()) {
//     textList.add(new Object[] {
//         f.getAbsolutePath(),
//         f.getAbsolutePath()
//     });
// }
// for (File f : new File("newsgroup/hardware").listFiles()) {
//     textList.add(new Object[] {
//         f.getAbsolutePath(),
//         f.getAbsolutePath()
//     });
// }
// textList.add(new Object[] {"graphics","newsgroup/graphics"});
// textList.add(new Object[] {"hardware","newsgroup/hardware"});
textInput.setListParameter("texts", textList);
textInput.addOperator(OperatorService.createOperator("StringTokenizer"));
textInput.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));
Operator tlfOperator = OperatorService.createOperator("TokenLengthFilter");
tlfOperator.setParameter("min_chars", "5");
textInput.addOperator(tlfOperator);
textInput.addOperator(OperatorService.createOperator("PorterStemmer"));
p.getRootOperator().addOperator(textInput);
p.getRootOperator().addOperator(OperatorService.createOperator("KMeans"));
p.getRootOperator().addOperator(OperatorService.createOperator("AttributeSumClusterCharacterizer"));
p.save(new File("Process.xml"));
IOContainer io = p.run();
SimpleExampleSet ses = (SimpleExampleSet) io.get(SimpleExampleSet.class);
System.out.println(ses.getExample(0));
System.exit(0);
The fit4005 directory contains the PDF files. If I load text files, everything works fine. Any ideas why this is happening and how I can fix it?
Thanks in advance,
Behi
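One thing that may be worth ruling out before running the process is the input list itself. The loop in the question adds every entry that `listFiles()` returns, and `listFiles()` returns null if the directory is missing. The following is only a minimal sketch using the standard `java.io` classes. The class name `TextListBuilder` and the method `collectTexts` are hypothetical, not part of RapidMiner, and filtering by the `.pdf` extension is an assumption about what the directory should contain. It does not explain the exception by itself. It only confirms which files are actually handed to the "texts" list parameter.

```java
import java.io.File;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

public class TextListBuilder {

    /**
     * Builds the (name, path) pairs used for the "texts" list parameter from
     * the regular files in dir whose names end with the given extension
     * (compared case-insensitively).
     */
    public static List<Object[]> collectTexts(File dir, String extension) {
        File[] files = dir.listFiles();
        // listFiles() returns null when dir does not exist or is not a directory.
        if (files == null) {
            throw new IllegalArgumentException(
                    "Not a readable directory: " + dir.getAbsolutePath());
        }
        // listFiles() makes no ordering guarantee, so sort for reproducible runs.
        Arrays.sort(files);
        String suffix = extension.toLowerCase(Locale.ROOT);
        List<Object[]> texts = new LinkedList<Object[]>();
        for (File f : files) {
            // Skip subdirectories and files with other extensions.
            if (f.isFile() && f.getName().toLowerCase(Locale.ROOT).endsWith(suffix)) {
                texts.add(new Object[] { f.getAbsolutePath(), f.getAbsolutePath() });
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        try {
            List<Object[]> texts = collectTexts(new File("fit4005"), ".pdf");
            System.out.println(texts.size() + " PDF file(s) found");
            for (Object[] entry : texts) {
                File f = new File((String) entry[1]);
                System.out.println(f.getPath() + " (" + f.length() + " bytes)");
            }
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The returned list has the same shape as `textList` in the question, so it could be passed to `setListParameter("texts", ...)` unchanged. An empty list or zero-byte files would point to a problem with the input rather than with the operators.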
Answers
Hi,
Sorry, but I do not have a direct solution. I would suggest that you set up the process in the GUI first and use breakpoints etc. to track down the problem. If everything works fine in the GUI, you can then simply use
Process process = new Process(xmlFile);
or
Process process = new Process(xmlString);
and
process.run();
in order to deploy the process. It is usually much easier to get things right in GUI mode before you embed the complete process in your own application.
Cheers,
Ingo