Hello,
I'm completely new to RapidMiner, so I'm sorry if I'm asking about something obvious.
I need to download and process HTML pages in a Java application. Following part of the R4.6 tutorials, I managed to put together some of the operators I need (although I'm not sure I did it the right way), but I can't figure out how to connect them.
Here is the code; I used the Text and Web plugins:
public Miner(List<Vulnerability> datasourcelist) {
    RapidMiner.init();
    Process process = new Process();
    process.getRootOperator().setParameter(ProcessRootOperator.PARAMETER_LOGFILE, "log");
    Operator op;
    ExecutionUnit u;
    int counter = 0;
    try {
        for (Vulnerability vuln : datasourcelist) {
            for (String ref : vuln.getRefs()) {
                process.getRootOperator().addSubprocess(counter);
                u = process.getRootOperator().getSubprocess(counter);

                op = OperatorService.createOperator("get_webpage");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("random_user_agent", "true");
                op.setParameter("url", ref);
                u.addOperator(op);

                op = OperatorService.createOperator("extract_html_text_content");
                op.setEnabled(true);
                op.setExpanded(true);
                u.addOperator(op);

                op = OperatorService.createOperator("tokenize");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("mode", "specify characters");
                op.setParameter("characters", ".:");
                u.addOperator(op);

                op = OperatorService.createOperator("filter_tokens_by_content");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("condition", "matches");
                op.setParameter("string", "[a-z]");
                op.setParameter("regular_expression", "[a-zA-Z]");
                u.addOperator(op);

                op = OperatorService.createOperator("write_csv");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("csv_file", "test_csv.csv");
                u.addOperator(op);

                counter++;
            }
        }
        System.out.println(process.getRootOperator().createProcessTree(0));
        process.run();
    } catch (OperatorCreationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (OperatorException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
And this is the error I get:
com.rapidminer.operator.UserError: No data was deliverd at port extract_html_text_content.document.
at com.rapidminer.operator.ports.impl.AbstractPort.getData(AbstractPort.java:78)
I haven't given the process any input, since all the data should come from the get_webpage operators.
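My guess is that adding operators to the same subprocess doesn't link them, and that data only flows along explicit port connections, so extract_html_text_content never receives the page. Here is a toy model of that guess in plain Java. Every class in it is hypothetical and only mimics the idea (it is not RapidMiner's API), and the hard-coded HTML stands in for a downloaded page:

```java
import java.util.function.UnaryOperator;

// Toy model of port-based data flow (hypothetical classes, NOT RapidMiner's API).
public class PortWiringDemo {

    // An input port only holds data if a connected output port delivered some.
    static final class InputPort {
        final String name;
        String data;

        InputPort(String name) {
            this.name = name;
        }

        String getData() {
            if (data == null) {
                throw new IllegalStateException("No data was delivered at port " + name + ".");
            }
            return data;
        }
    }

    // An output port forwards data only along an explicit connection.
    static final class OutputPort {
        private InputPort target;

        void connectTo(InputPort in) {
            target = in;
        }

        void deliver(String value) {
            if (target != null) {
                target.data = value;
            }
        }
    }

    // An operator reads its input port (unless it is a source), transforms the data, and delivers it.
    static final class ToyOperator {
        final InputPort in;
        final OutputPort out = new OutputPort();
        private final UnaryOperator<String> fn;

        ToyOperator(String name, boolean isSource, UnaryOperator<String> fn) {
            this.in = isSource ? null : new InputPort(name + ".document");
            this.fn = fn;
        }

        void execute() {
            String input = (in == null) ? null : in.getData();
            out.deliver(fn.apply(input));
        }
    }

    // Runs a two-operator chain, optionally wiring the fetcher's output to the extractor's input.
    static String run(boolean wired) {
        ToyOperator fetch = new ToyOperator("get_webpage", true, ignored -> "<p>Hello World</p>");
        ToyOperator extract = new ToyOperator("extract_html_text_content", false,
                html -> html.replaceAll("<[^>]*>", ""));
        InputPort result = new InputPort("result");
        extract.out.connectTo(result);
        if (wired) {
            fetch.out.connectTo(extract.in);
        }
        try {
            fetch.execute();
            extract.execute();
            return result.getData();
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // No data was delivered at port extract_html_text_content.document.
        System.out.println(run(true));  // Hello World
    }
}
```

If that guess is right, the RapidMiner 5 equivalent would presumably be something like `op1.getOutputPorts().getPortByName("output").connectTo(op2.getInputPorts().getPortByName("document"))` for each consecutive pair of operators, but I haven't verified the port names; only `document` appears in the error message.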
Thanks
Andrea