Error on regexp
sennierer
New Altair Community Member
I have a problem with the Extract Information operator. I have a crawler that loads webpages, and I then use the Process Documents from Data operator to process these pages. Inside it I use Keep Document Parts followed by Extract Information to get a number, but the regexp of the Extract Information operator always fails with "Process Failed. No group 1".
These are the two operators:
<operator activated="true" class="text:keep_document_parts" compatibility="5.1.003" expanded="true" height="60" name="Keep Document Parts" width="90" x="84" y="30">
<parameter key="extraction_regex" value="von\s+\d+\s+&lt;span\sclass=&quot;text1&quot;&gt;"/>
</operator>
<operator activated="true" class="text:extract_information" compatibility="5.1.003" expanded="true" height="60" name="Extract Information" width="90" x="447" y="75">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Numerical"/>
<list key="regular_expression_queries">
<parameter key="treffer" value="\s\d+\s"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
</operator>
The text that arrives at the second operator looks something like: "von 17 <span class="text1"> "
I tested the regexp and normally it should work.
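To narrow it down, here is a minimal Java sketch of what I suspect is happening. It assumes that the operator uses java.util.regex and reads capture group 1 of the match; both are guesses on my part, not something I have confirmed for RapidMiner. The class name GroupDemo and the helper firstGroup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null if the pattern does not match at all.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "von 17 <span class=\"text1\"> ";

        // With parentheses around \d+, group 1 is the number itself.
        System.out.println(firstGroup("\\s(\\d+)\\s", text));

        try {
            // My current pattern: it matches, but defines no group 1.
            firstGroup("\\s\\d+\\s", text);
        } catch (IndexOutOfBoundsException e) {
            // java.util.regex reports the missing group here.
            System.out.println(e.getMessage());
        }
    }
}
```

In plain Java the second call fails with the message "No group 1" even though the pattern itself matches the text. If the operator behaves the same way, the query value would probably need to be \s(\d+)\s instead of \s\d+\s.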
I would be thankful for any help!