"Problems connecting operators in R5 (Java Application)"

Karahedra New Altair Community Member
edited November 5 in Community Q&A
Hello,

I'm completely new to RapidMiner, so I'm sorry if I'm asking about something obvious.
I need to download and process HTML pages in a Java application. Following part of the R4.6 tutorials, I managed to put together some of the operators I need (although I'm not sure I did it the right way), but I can't figure out how to connect them.

Here is the code; I used the Text and Web plugins:

public Miner(List<Vulnerability> datasourcelist) {
    RapidMiner.init();
    Process process = new Process();
    process.getRootOperator().setParameter(ProcessRootOperator.PARAMETER_LOGFILE, "log");
    Operator op;
    ExecutionUnit u;
    int counter = 0;
    try {
        for (Vulnerability vuln : datasourcelist) {
            for (String ref : vuln.getRefs()) {
                // One subprocess per reference URL
                process.getRootOperator().addSubprocess(counter);
                u = process.getRootOperator().getSubprocess(counter);
                // Download the page
                op = OperatorService.createOperator("get_webpage");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("random_user_agent", "true");
                op.setParameter("url", ref);
                u.addOperator(op);
                // Strip the HTML markup
                op = OperatorService.createOperator("extract_html_text_content");
                op.setEnabled(true);
                op.setExpanded(true);
                u.addOperator(op);
                // Split the text into tokens at '.' and ':'
                op = OperatorService.createOperator("tokenize");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("mode", "specify characters");
                op.setParameter("characters", ".:");
                u.addOperator(op);
                // Keep only the tokens that match the pattern
                op = OperatorService.createOperator("filter_tokens_by_content");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("condition", "matches");
                op.setParameter("string", "[a-z]");
                op.setParameter("regular_expression", "[a-zA-Z]");
                u.addOperator(op);
                // Write the result to a CSV file
                op = OperatorService.createOperator("write_csv");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("csv_file", "test_csv.csv");
                u.addOperator(op);
                counter++;
            }
        }
        System.out.println(process.getRootOperator().createProcessTree(0));
        process.run();
    } catch (OperatorCreationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (OperatorException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
And this is the error I get:

com.rapidminer.operator.UserError: No data was deliverd at port extract_html_text_content.document.
at com.rapidminer.operator.ports.impl.AbstractPort.getData(AbstractPort.java:78)
I haven't given the process any input, since all the data should come from the get_webpage operators.


Thanks
Andrea

Answers

  • haddock
    haddock New Altair Community Member
    Hi there,

    4.6 was much loved and has now retired, which is a mixed blessing for you, as the Web Mining and Text crunching plugins have also been updated and are now called Extensions. There are non-trivial architectural differences which you should look into. Time to upgrade I fear!

  • Karahedra
    Karahedra New Altair Community Member
    I probably wasn't very clear in my explanation: I already use R5 (or at least I try to :P). I went with the 4.6 tutorials only because I can't read German, but that left me unable to understand how some things should be done, and quite doubtful about the correctness of the ones I managed to put together.
  • haddock
    haddock New Altair Community Member
    Cool, I'd take one of the RM 5.00 plugins apart to see how it can be done, and invest in Sebastian's paper on the subject of extensions; but there are many ways to ...

  • land
    land New Altair Community Member
    Hi,
    If you want to use the RapidMiner API, you should be aware that there have been many changes between 4.x and 5.0! We dropped the implicit data pass-through and replaced it with the explicit flow layout, and this has some impact on the API as well. Operators now need to be supplied with their individual data objects by getting the port and setting the data there.
    After the great success of the Extension White Paper (it even outperforms the Free Webinar in terms of profit), I'm going to write an Integration White Paper. But I wouldn't wait for it... just take a look here in the forum at how long it took me to write the first one...

    Greetings,
      Sebastian
  • Karahedra
    Karahedra New Altair Community Member
    Hello,
    This is what I needed, thanks.
    Apparently I've become more shortsighted than usual, since I didn't notice the connection and receive methods of the ports. Now my test code seems to be working just fine (I had missed your paper as well).

    Thanks again for your help and for the quick answers.
  • pop
    pop New Altair Community Member
    Hi Sebastian,

    I bought the white paper and found it very useful, but my interest is more in integration. I just want to voice my support for this Integration White Paper. I will definitely buy it.
    Thanks for the great job!
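
To make the explicit-flow point from land's reply concrete, here is a minimal, self-contained sketch in plain Java. It is a toy model, not RapidMiner code: ToyOperator, ToyProcess, InputPort, OutputPort, connectTo, receive and deliver are all hypothetical names invented for illustration, and the real RapidMiner 5 port classes may look different. It only shows the idea that adding operators to a process in order is not enough; data reaches an operator only through a port that has been explicitly connected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model only: every class here is hypothetical and is NOT part of the RapidMiner API.

class InputPort {
    final String name;
    private String data;                 // stays null until something is delivered

    InputPort(String name) { this.name = name; }

    void receive(String data) { this.data = data; }

    String getData() {
        if (data == null) {
            throw new IllegalStateException("No data was delivered at port " + name + ".");
        }
        return data;
    }
}

class OutputPort {
    private InputPort target;            // null until explicitly connected

    void connectTo(InputPort in) { this.target = in; }

    void deliver(String data) {
        // Without an explicit connection the result goes nowhere.
        if (target != null) {
            target.receive(data);
        }
    }
}

class ToyOperator {
    final InputPort input;               // null for a source operator
    final OutputPort output = new OutputPort();
    private final Function<String, String> work;

    ToyOperator(String name, boolean hasInput, Function<String, String> work) {
        this.input = hasInput ? new InputPort(name + ".input") : null;
        this.work = work;
    }

    void execute() {
        String in = (input == null) ? null : input.getData();
        output.deliver(work.apply(in));
    }
}

class ToyProcess {
    private final List<ToyOperator> operators = new ArrayList<>();

    void add(ToyOperator op) { operators.add(op); }

    // Runs the operators in insertion order; the order alone moves no data.
    void run() {
        for (ToyOperator op : operators) {
            op.execute();
        }
    }
}

class PortDemo {
    // Builds fetch -> extract and wires the ports only when 'connect' is true.
    static String runChain(boolean connect) {
        ToyOperator fetch = new ToyOperator("fetch", false, in -> "<p>Hello. World</p>");
        ToyOperator extract = new ToyOperator("extract", true, in -> in.replaceAll("<[^>]+>", ""));
        InputPort result = new InputPort("result");

        if (connect) {
            fetch.output.connectTo(extract.input);
            extract.output.connectTo(result);
        }

        ToyProcess process = new ToyProcess();
        process.add(fetch);
        process.add(extract);
        process.run();
        return result.getData();
    }

    public static void main(String[] args) {
        try {
            runChain(false);
        } catch (IllegalStateException e) {
            System.out.println("Unconnected: " + e.getMessage());
        }
        System.out.println("Connected: " + runChain(true));
    }
}
```

With the connections left out, the second operator fails as soon as it asks its input port for data, which mirrors the error in the original question. With the two connectTo calls in place, the same operators run cleanly. In the actual RapidMiner 5 API the equivalent step would go through the operators' own port objects, as described in the replies above.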