Integration of Process Documents operator

Karahedra
Karahedra New Altair Community Member
edited November 5 in Community Q&A
Hello,
I'm trying to use a Process Documents operator in a Java application. The previous operators deliver data to its input ports correctly, but I can't obtain any processed data from it.

Here is the code I used to initialize and run it:

processer = OperatorService.createOperator("process_documents");
processer.setEnabled(true);
processer.setParameter("vector_creation", "Term Occurrences");
processer.setParameter("create_word_vector", "true");
processer.setParameter("add_meta_information", "true");
processer.setParameter("keep_text", "false");

// this section is repeated several times in the actual code
filter.getOutputPorts().getPortByIndex(0).connectTo(processer.getInputPorts().getPortByIndex(incounter));
filter.doWork();
processer.getInputPorts().getPortByIndex(incounter).receive(filter.getOutputPorts().getPortByIndex(0).getAnyDataOrNull());
// end of loop

processer.getOutputPorts().getPortByIndex(1).connectTo(transformer.getInputPorts().getPortByIndex(0));
processer.doWork();
       
I'd appreciate any help that reduces my enormous newbieness. Thanks to everyone.
Andrea

Answers

  • haddock
    haddock New Altair Community Member
    Greets Andrea,

    The answer depends on what you are trying to do. If you want to make an operator that you can integrate with the RM IDE, then you need to either take a look at the source of an existing extension or buy the white paper. If that is what you want to do, you will see that you need to explicitly deliver output to the ports, like this:
    @Override
    public void doWork() throws OperatorException {
        H_data input = hDataInput.getData();
        decode(input);
        hDataOutput.deliver(hDataInput.getData());
    }
    If on the other hand you want to embed RM in another application then you can barbarise the code as you see fit.

    Good luck!
  • Karahedra
    Karahedra New Altair Community Member
    Hello,
    I'm trying to embed RM in another application, but the operator I'm using (Process Documents, from the Text Processing extension) doesn't seem to be working correctly. Instead of delivering to the output port a word vector obtained from the documents I feed it, it delivers only an empty vector.
    I think I'm missing some initialization step, so it doesn't actually process anything, but I can't figure out which one...
    Barbaric code is something I produce with a decent bit of enthusiasm, but when my abuses stop giving acceptable results I tend to go back to the experts begging for some advice :)
  • haddock
    haddock New Altair Community Member
    Hi again,

    As I remember it, the 'process documents' operator needs inner operators to do the dirty work, like tokenizing and stemming. If you make a process in RM where the inner process is empty and unconnected, you guessed it... zippo comes back. For example, if I run the following I get some data back (just by connecting the inner input to the inner output directly, so the documents simply pass through):
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="399" width="886">
          <operator activated="true" class="read_database" compatibility="5.0.8" expanded="true" height="60" name="Read Database" width="90" x="63" y="24">
            <parameter key="connection" value="DellBoy"/>
            <parameter key="query" value="SELECT &quot;Content&quot;, &quot;Link&quot;&#13;&#10;FROM &quot;RSS&quot; where &quot;Content&quot; is not NULL"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.8" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
            <parameter key="name" value="Link"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" compatibility="5.0.5" expanded="true" height="76" name="Process Documents from Data" width="90" x="447" y="30">
            <parameter key="vector_creation" value="Term Occurrences"/>
            <list key="specify_weights"/>
            <process expanded="true" height="399" width="886">
              <connect from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Database" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Whereas when the inner input is not connected to the inner output for pass-through, nothing comes back, i.e. when it is like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="399" width="886">
          <operator activated="true" class="read_database" compatibility="5.0.8" expanded="true" height="60" name="Read Database" width="90" x="63" y="24">
            <parameter key="connection" value="DellBoy"/>
            <parameter key="query" value="SELECT &quot;Content&quot;, &quot;Link&quot;&#13;&#10;FROM &quot;RSS&quot; where &quot;Content&quot; is not NULL"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.8" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
            <parameter key="name" value="Link"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" compatibility="5.0.5" expanded="true" height="76" name="Process Documents from Data" width="90" x="447" y="30">
            <parameter key="vector_creation" value="Term Occurrences"/>
            <list key="specify_weights"/>
            <process expanded="true" height="399" width="886">
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Database" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Which I guess means you might be better off calling the inner operator directly?
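
    For comparison, here is a minimal sketch of an inner process that does some actual work rather than just passing documents through. It is not taken from the thread: it assumes the Text Processing extension exposes a Tokenize operator under the key `text:tokenize` with an input and an output port both named `document`, so check these names against your installed version before relying on them.

    ```xml
    <operator activated="true" class="text:process_document_from_data" compatibility="5.0.5" expanded="true" name="Process Documents from Data">
      <parameter key="vector_creation" value="Term Occurrences"/>
      <list key="specify_weights"/>
      <process expanded="true">
        <!-- Assumed operator key and port names for Tokenize; verify against your extension version -->
        <operator activated="true" class="text:tokenize" compatibility="5.0.5" expanded="true" name="Tokenize"/>
        <!-- Inner source "document" feeds the tokenizer -->
        <connect from_port="document" to_op="Tokenize" to_port="document"/>
        <!-- Tokenizer output goes to the inner sink "document 1" -->
        <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
      </process>
    </operator>
    ```

    The point is the same as in the two processes above: whatever reaches the inner sink `document 1` is what the word vector is built from, so the inner source has to be wired to it, either directly or through operators such as this one.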

  • Karahedra
    Karahedra New Altair Community Member
    Yes, I think you're right, and that must be the step I was missing. But I haven't found a way to access the inner operators or to do any kind of wiring inside Process Documents through Java code...
    Again, thanks for the assistance and for the quick answers.
  • land
    land New Altair Community Member
    Hi,
    To add operators to so-called super operators, use the following code fragment:
    // superOperator is the operator that contains a subprocess, e.g. Process Documents
    OperatorChain superOperator;
    superOperator.getSubprocess(0).addOperator(operator);
    In any case, I would suggest taking a look at the API documentation, where this is described.


    Greetings,
      Sebastian