Integration of Process Documents operator
Karahedra
New Altair Community Member
Hello
I'm trying to use a Process Documents operator in a Java application. The previous operators deliver data to its input ports correctly, but I can't obtain any processed data from it.
I'd appreciate any intervention able to reduce my enormous newbieness; thanks to everyone.
Here is the code I used to initialize and operate it:
processer = OperatorService.createOperator("process_documents");
processer.setEnabled(true);
processer.setParameter("vector_creation", "Term Occurrences");
processer.setParameter("create_word_vector", "true");
processer.setParameter("add_meta_information", "true");
processer.setParameter("keep_text", "false");
// this section is repeated several times in the actual code
filter.getOutputPorts().getPortByIndex(0).connectTo(processer.getInputPorts().getPortByIndex(incounter));
filter.doWork();
processer.getInputPorts().getPortByIndex(incounter).receive(filter.getOutputPorts().getPortByIndex(0).getAnyDataOrNull());
//end of loop
processer.getOutputPorts().getPortByIndex(1).connectTo(transformer.getInputPorts().getPortByIndex(0));
processer.doWork();
Andrea
Answers
Greets Andrea,
The answer depends on what you are trying to do. If you want to make an operator that you can integrate with the RM IDE, then you need to either take a look at the source of an existing extension or buy the white paper. If that is what you want to do, then you will see that you need to explicitly deliver output to the ports, like this...
@Override
public void doWork() throws OperatorException {
    H_data input = hDataInput.getData();
    decode(input);
    hDataOutput.deliver(hDataInput.getData());
}
If, on the other hand, you want to embed RM in another application, then you can barbarise the code as you see fit.
Good luck!
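To illustrate why the explicit deliver call matters, here is a minimal toy sketch. It assumes nothing about RapidMiner itself: ToyPort, ToyOperator and runOnce are hypothetical names made up for this example, and the real port classes may well behave differently. The only point is that an operator which does its work but never delivers leaves its output port empty.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Toy model only: none of these types are RapidMiner classes. */
final class PortDeliverySketch {

    /** Hypothetical port that holds at most one value. */
    static final class ToyPort<T> {
        private T data;

        void deliver(T value) {
            data = value;
        }

        T getDataOrNull() {
            return data;
        }
    }

    /** Hypothetical operator with one input port and one output port. */
    static final class ToyOperator {
        final ToyPort<String> input = new ToyPort<>();
        final ToyPort<String> output = new ToyPort<>();
        private final UnaryOperator<String> work;
        private final boolean deliversResult;

        ToyOperator(UnaryOperator<String> work, boolean deliversResult) {
            this.work = work;
            this.deliversResult = deliversResult;
        }

        /** Reads the input, does the work, and only optionally delivers it. */
        void doWork() {
            String result = work.apply(input.getDataOrNull());
            if (deliversResult) {
                output.deliver(result);
            }
        }
    }

    /** Feeds one value through a toy operator and returns what sits at its output port. */
    static String runOnce(boolean deliversResult, String text) {
        ToyOperator op = new ToyOperator(s -> s.toUpperCase(Locale.ROOT), deliversResult);
        op.input.deliver(text);
        op.doWork();
        return op.output.getDataOrNull();
    }

    public static void main(String[] args) {
        System.out.println(runOnce(true, "hello"));  // prints HELLO
        System.out.println(runOnce(false, "hello")); // prints null
    }
}
```

In the second call the work is still done, but nothing downstream can see it, because the result never reaches the output port.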
Hello
I'm trying to embed RM in another application, but the operator I'm using (Process Documents, from the Text Processing extension) doesn't seem to be working correctly. Instead of delivering to the output port a word vector built from the documents I feed it, it delivers only an empty vector.
I think I'm missing some initialization step, so it doesn't actually process anything, but I can't figure out which one...
Barbaric code is something I produce with a decent bit of enthusiasm, but when my abuses stop giving acceptable results I tend to go back to the experts begging for some advice.
Hi again,
As I remember it, the 'process documents' operator needs inner operators to do the dirty work, like tokenizing and stemming; if you make a process in RM where there are no inner operators, you guessed it... zippo comes back. For example, if I run the following I get some data back (just by connecting the inner input to the inner output directly, so just passing through).
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
    <process expanded="true" height="399" width="886">
      <operator activated="true" class="read_database" compatibility="5.0.8" expanded="true" height="60" name="Read Database" width="90" x="63" y="24">
        <parameter key="connection" value="DellBoy"/>
        <parameter key="query" value="SELECT &quot;Content&quot;, &quot;Link&quot; FROM &quot;RSS&quot; where &quot;Content&quot; is not NULL"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.0.8" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
        <parameter key="name" value="Link"/>
        <parameter key="target_role" value="id"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.0.5" expanded="true" height="76" name="Process Documents from Data" width="90" x="447" y="30">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <list key="specify_weights"/>
        <process expanded="true" height="399" width="886">
          <connect from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Whereas when that inner operator is not connected for pass-through nothing comes back, i.e. when it is like this...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
    <process expanded="true" height="399" width="886">
      <operator activated="true" class="read_database" compatibility="5.0.8" expanded="true" height="60" name="Read Database" width="90" x="63" y="24">
        <parameter key="connection" value="DellBoy"/>
        <parameter key="query" value="SELECT &quot;Content&quot;, &quot;Link&quot; FROM &quot;RSS&quot; where &quot;Content&quot; is not NULL"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.0.8" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
        <parameter key="name" value="Link"/>
        <parameter key="target_role" value="id"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.0.5" expanded="true" height="76" name="Process Documents from Data" width="90" x="447" y="30">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <list key="specify_weights"/>
        <process expanded="true" height="399" width="886">
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Which I guess means you might be better off calling the inner operator directly?
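The difference between the two processes above can be mimicked with a small toy model. This is only a sketch under stated assumptions: ToyProcessDocuments, addInner, connectInnerSink and tokenize are hypothetical names invented for this example, not RapidMiner classes or methods, and the real operator may behave differently in detail. It shows the idea that a super operator returns nothing unless something is wired to the sink of its inner process.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model only: none of these types are RapidMiner classes. */
final class InnerProcessSketch {

    /** Hypothetical stand-in for a super operator that owns an inner process. */
    static final class ToyProcessDocuments {
        private final List<UnaryOperator<List<String>>> innerOperators = new ArrayList<>();
        private boolean innerSinkConnected = false;

        /** Adds an inner step, for example a tokenizer. */
        ToyProcessDocuments addInner(UnaryOperator<List<String>> op) {
            innerOperators.add(op);
            return this;
        }

        /** Models wiring the end of the inner chain (or the inner source) to the inner sink. */
        ToyProcessDocuments connectInnerSink() {
            innerSinkConnected = true;
            return this;
        }

        /** Returns term occurrences; the map is empty when nothing reaches the inner sink. */
        Map<String, Integer> run(String document) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            if (!innerSinkConnected) {
                return counts; // nothing was delivered to the sink
            }
            List<String> tokens = new ArrayList<>();
            tokens.add(document);
            for (UnaryOperator<List<String>> op : innerOperators) {
                tokens = op.apply(tokens);
            }
            for (String token : tokens) {
                counts.merge(token, 1, Integer::sum);
            }
            return counts;
        }
    }

    /** Hypothetical tokenizer: lower-cases and splits on anything that is not a-z. */
    static List<String> tokenize(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String s : in) {
            for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (!t.isEmpty()) {
                    out.add(t);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "Data mining, text mining";
        // Inner process left unwired: prints {}
        System.out.println(new ToyProcessDocuments().run(doc));
        // Pass-through wiring only: prints {Data mining, text mining=1}
        System.out.println(new ToyProcessDocuments().connectInnerSink().run(doc));
        // Tokenizer inside, wired to the sink: prints {data=1, mining=2, text=1}
        System.out.println(new ToyProcessDocuments()
                .addInner(InnerProcessSketch::tokenize)
                .connectInnerSink()
                .run(doc));
    }
}
```

The first call corresponds to the second XML process (no inner connection), the second call to the pass-through process, and the third to what you would normally want, with a tokenizer inside.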
Yes, I think you're right, and that must be the step I was missing, but I haven't found a way to access the inner operators or do any kind of wiring inside Process Documents through Java code...
Again, thanks for the assistance and for the quick answers.
Hi,
To add operators to so-called super operators, use the following code fragment:
OperatorChain superOperator;
superOperator.getSubprocess(0).addOperator(operator);
Anyway, I would suggest taking a look at the API documentation, where this is described.
Greetings,
Sebastian