Hi,
When I load PDF files in my process I get the following exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at com.rapidminer.operator.TermWeightClusterCharacterizer.apply(Unknown Source)
    at com.rapidminer.operator.Operator.apply(Operator.java:664)
    at com.rapidminer.operator.OperatorChain.apply(OperatorChain.java:377)
    at com.rapidminer.operator.Operator.apply(Operator.java:664)
    at com.rapidminer.Process.run(Process.java:612)
    at com.rapidminer.Process.run(Process.java:582)
    at com.rapidminer.Process.run(Process.java:572)
    at org.behrang.clustering.Main.createProcess(Main.java:77)
    at org.behrang.clustering.Main.main(Main.java:26)
Here's my process:
System.setProperty("rapidminer.home", "C:\\Java\\RapidMiner-4.2");
RapidMiner.init();
Process p = new Process();
OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
textInput.setParameter(PARAMETER_DEFAULT_CONTENT_LANGUAGE, "english");
textInput.setParameter(PARAMETER_PRUNE_ABOVE, "15");
textInput.setParameter(PARAMETER_PRUNE_BELOW, "5");
// textInput.setParameter(PARAMETER_DEFAULT_CONTENT_TYPE, "pdf");
List<Object[]> textList = new LinkedList<Object[]>();
for (File f : new File("fit4005").listFiles()) {
    textList.add(new Object[] {
        f.getAbsolutePath(),
        f.getAbsolutePath()
    });
}
// for (File f : new File("newsgroup/graphics").listFiles()) {
// textList.add(new Object[] {
// f.getAbsolutePath(),
// f.getAbsolutePath()
// });
// }
// for (File f : new File("newsgroup/hardware").listFiles()) {
// textList.add(new Object[] {
// f.getAbsolutePath(),
// f.getAbsolutePath()
// });
// }
// textList.add(new Object[] {"graphics","newsgroup/graphics"});
// textList.add(new Object[] {"hardware","newsgroup/hardware"});
textInput.setListParameter("texts", textList);
textInput.addOperator(OperatorService.createOperator("StringTokenizer"));
textInput.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));
Operator tlfOperator = OperatorService.createOperator("TokenLengthFilter");
tlfOperator.setParameter("min_chars", "5");
textInput.addOperator(tlfOperator);
textInput.addOperator(OperatorService.createOperator("PorterStemmer"));
p.getRootOperator().addOperator(textInput);
p.getRootOperator().addOperator(OperatorService.createOperator("KMeans"));
p.getRootOperator().addOperator(OperatorService.createOperator("AttributeSumClusterCharacterizer"));
p.save(new File("Process.xml"));
IOContainer io = p.run();
SimpleExampleSet ses = (SimpleExampleSet) io.get(SimpleExampleSet.class);
System.out.println(ses.getExample(0));
System.exit(0);
The fit4005 directory contains the PDF files. If I load text files instead, everything works fine. Any ideas why this is happening and how I can fix it?
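One thing that could be ruled out first is whether every file in fit4005 really is a PDF. Below is a minimal sketch of such a check, assuming plain Java with no RapidMiner classes involved; PdfSignatureCheck, hasPdfSignature and isPdf are hypothetical names made up for illustration. It only looks for the %PDF- signature that a well-formed PDF starts with, so it cannot explain the exception by itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of RapidMiner.
class PdfSignatureCheck {

    // A well-formed PDF file begins with the ASCII bytes "%PDF-".
    private static final byte[] SIGNATURE = {'%', 'P', 'D', 'F', '-'};

    // Returns true if the given bytes start with the PDF signature.
    static boolean hasPdfSignature(byte[] head) {
        if (head == null || head.length < SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (head[i] != SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first few bytes of a file and checks them for the signature.
    static boolean isPdf(File file) throws IOException {
        byte[] head = new byte[SIGNATURE.length];
        InputStream in = new FileInputStream(file);
        try {
            int total = 0;
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                total += n;
            }
            return total == head.length && hasPdfSignature(head);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory name taken from the process above unless one is passed in.
        File dir = new File(args.length > 0 ? args[0] : "fit4005");
        File[] files = dir.listFiles(); // null if dir is missing or not a directory
        if (files == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : files) {
            if (f.isFile()) {
                System.out.println(f.getName() + " -> " + (isPdf(f) ? "PDF" : "not a PDF"));
            }
        }
    }
}
```

Any file reported as "not a PDF" would be a candidate to remove from the directory before running the process again.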
Thanks in advance,
Behi