"Problems connecting operators in R5 (Java Application)"
Karahedra
New Altair Community Member
Hello,
I'm completely new to RapidMiner, so I'm sorry if I'm asking about something obvious.
I need to download and process HTML pages in a Java application. Following part of the R4.6 tutorials, I managed to put together some of the operators I need (although I'm not sure I did it the right way), but I can't figure out how to connect them.
Here is the code; I used the text and web plugins.
The error I get is shown after the code.
public Miner(List<Vulnerability> datasourcelist) {
    RapidMiner.init();
    Process process = new Process();
    process.getRootOperator().setParameter(ProcessRootOperator.PARAMETER_LOGFILE, "log");
    Operator op;
    ExecutionUnit u;
    int counter = 0;
    try {
        // One subprocess per reference URL of each vulnerability
        for (Vulnerability vuln : datasourcelist) {
            for (String ref : vuln.getRefs()) {
                process.getRootOperator().addSubprocess(counter);
                u = process.getRootOperator().getSubprocess(counter);

                // Download the page
                op = OperatorService.createOperator("get_webpage");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("random_user_agent", "true");
                op.setParameter("url", ref);
                u.addOperator(op);

                // Strip the HTML markup
                op = OperatorService.createOperator("extract_html_text_content");
                op.setEnabled(true);
                op.setExpanded(true);
                u.addOperator(op);

                // Split the text on '.' and ':'
                op = OperatorService.createOperator("tokenize");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("mode", "specify characters");
                op.setParameter("characters", ".:");
                u.addOperator(op);

                // Keep only the tokens matching the expression
                op = OperatorService.createOperator("filter_tokens_by_content");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("condition", "matches");
                op.setParameter("string", "[a-z]");
                op.setParameter("regular_expression", "[a-zA-Z]");
                u.addOperator(op);

                // Write the result to a CSV file
                op = OperatorService.createOperator("write_csv");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("csv_file", "test_csv.csv");
                u.addOperator(op);

                counter++;
            }
        }
        System.out.println(process.getRootOperator().createProcessTree(0));
        process.run();
    } catch (OperatorCreationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (OperatorException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
I haven't given any input to the process, since all the data should come from the Get Webpage operators.
com.rapidminer.operator.UserError: No data was deliverd at port extract_html_text_content.document.
at com.rapidminer.operator.ports.impl.AbstractPort.getData(AbstractPort.java:78)
Thanks
Andrea
Answers
-
Hi there,
4.6 was much loved and has now retired, which is a mixed blessing for you, as the Web Mining and Text crunching plugins have also been updated and are now called Extensions. There are non-trivial architectural differences which you should look into. Time to upgrade I fear!
-
I probably wasn't very clear in my explanation: I already use R5 (or at least I try to :P). I tried to follow the 4.6 tutorials only because I can't read German, but that left me unable to understand how some things should be done, and quite doubtful about the correctness of the ones I managed to put together.
-
Cool, I'd take one of the RM 5.00 plugins apart to see how it can be done, and invest in Sebastian's paper on the subject of extensions; but there are many ways to ...
-
Hi,
if you want to use the RapidMiner API, you should be aware that there have been many changes between 4.x and 5.0! We dropped the implicit data pass-through and replaced it with the explicit flow layout, and this has some impact on the API as well. Operators now have to be supplied with each data object individually: you get the relevant port and set the data there.
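To make the idea of explicit wiring concrete, here is a minimal, self-contained sketch. It is a toy model only and does not use the RapidMiner API: the classes `InputPort`, `OutputPort` and `Step`, the port names, and the methods `connectTo`, `receive` and `deliver` are all assumptions made up for illustration. It shows that a step only sees data if something was connected to, or set on, its input port; otherwise reading the port fails, much like the "No data was delivered" error above.

```java
import java.util.function.Function;

/** Toy model of explicit port wiring; hypothetical classes, NOT the RapidMiner API. */
final class PortWiringSketch {

    /** Input side of a connection: holds whatever was delivered to it. */
    static final class InputPort {
        private final String name;
        private Object data;

        InputPort(String name) { this.name = name; }

        /** Sets the data on this port directly. */
        void receive(Object object) { this.data = object; }

        /** Returns the data, or fails if nothing was ever delivered. */
        Object getData() {
            if (data == null) {
                throw new IllegalStateException("No data was delivered at port " + name + ".");
            }
            return data;
        }
    }

    /** Output side: forwards data to the connected input port, if there is one. */
    static final class OutputPort {
        private InputPort destination;

        void connectTo(InputPort input) { this.destination = input; }

        void deliver(Object object) {
            if (destination != null) {
                destination.receive(object);
            }
            // With no connection the data is simply dropped: nothing is passed on implicitly.
        }
    }

    /** A processing step with one input port, one output port and a transformation. */
    static final class Step {
        final InputPort in;
        final OutputPort out = new OutputPort();
        private final Function<Object, Object> work;

        Step(String name, Function<Object, Object> work) {
            this.in = new InputPort(name + ".in");
            this.work = work;
        }

        void execute() { out.deliver(work.apply(in.getData())); }
    }

    /** Runs a two-step chain, with or without the explicit connections. */
    static String runChain(boolean connect) {
        // Stand-in for a download step: returns a fixed page instead of fetching the URL.
        Step fetch = new Step("fetch", url -> "<p>Hello World</p>");
        Step strip = new Step("strip", html -> ((String) html).replaceAll("<[^>]+>", ""));
        InputPort result = new InputPort("result");
        if (connect) {
            fetch.out.connectTo(strip.in);
            strip.out.connectTo(result);
        }
        fetch.in.receive("http://example.com/"); // data set directly on the first port
        fetch.execute();
        strip.execute();
        return (String) result.getData();
    }

    public static void main(String[] args) {
        System.out.println(runChain(true));
        try {
            runChain(false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, adding the steps to a list would not be enough; the two `connectTo` calls are what make the data flow from one step to the next.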
After the great success of the Extension White Paper (it even outperforms the Free Webinar in terms of profit), I'm going to write an Integration White Paper. But I wouldn't wait for it... Just look here in the forum at how long it took me to write the first one...
Greetings,
Sebastian
-
Hello,
this is what i needed, thanks.
Apparently I was more shortsighted than usual, since I didn't notice the connection and receive methods of the ports. Now my test code seems to be working just fine (I had missed your paper as well).
Thanks again for your help and for the quick answers.
-
Hi Sebastian,
I bought the white paper and found it very useful, but my interest is more in integration. I just want to voice my support for this Integration White Paper. I will definitely buy it.
Thanks for the great job!