"Difficulties using Filter Tokens (by Region) operator"

User: "walther_krefeld"
I am using the Text Processing extension to extract information from patent files. If I use Tokenize and some other filters (such as the Stopword Filter), it works fine.
When I use the Filter Tokens (by Region) operator, however, I get zero results. The condition is: contains "Klebstoff", not case sensitive. This expression appears many times in the documents being read. Interestingly, when I select the "contains" option, the program complains that the regular expression must be specified. I thought this regular expression was only needed when selecting the "matches" condition. Am I wrong here?

My goal is the automatic extraction of content around a given subject from patent files. Any help is welcome; I am working on my master's thesis. :)
For the test I have entered the same expression for both the regular expression and the search string conditions. Without defining the regular expression, the filter does not work.
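One thing worth noting: the process below applies Transform Cases (upper case) before the filter, so the tokens contain "KLEBSTOFF", and a case-sensitive pattern like "Klebstoff" would never match them. A minimal Java sketch of this casing behaviour, assuming the operator's regular-expression parameter is evaluated with Java's standard `java.util.regex` engine:

```java
import java.util.regex.Pattern;

public class RegexCaseCheck {
    public static void main(String[] args) {
        // Token text as it looks after the Transform Cases (upper case) step
        String token = "KLEBSTOFF";

        // Case-sensitive pattern: does NOT match the uppercased token
        boolean sensitive = Pattern.compile("Klebstoff").matcher(token).find();
        System.out.println(sensitive);  // false

        // Inline (?i) flag makes the Java regex case-insensitive: matches
        boolean insensitive = Pattern.compile("(?i)Klebstoff").matcher(token).find();
        System.out.println(insensitive);  // true
    }
}
```

So when a case-insensitive match is wanted against uppercased tokens, either the pattern should be written in upper case (as in the process below) or prefixed with `(?i)`.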

<process expanded="true" height="251" width="614">
     <operator activated="true" class="text:process_document_from_file" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="30">
       <list key="text_directories">
         <parameter key="B24" value="D:\Test_Information_Extraktion2\Deutsch"/>
       </list>
       <parameter key="file_pattern" value="*.pdf"/>
       <process expanded="true" height="466" width="882">
         <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30">
           <parameter key="language" value="German"/>
         </operator>
         <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="179" y="30">
           <parameter key="transform_to" value="upper case"/>
         </operator>
         <operator activated="true" class="text:filter_tokens_by_regions" compatibility="5.2.004" expanded="true" height="60" name="Filter Tokens (by Region)" width="90" x="332" y="30">
           <parameter key="string" value="KLEBSTOFF"/>
           <parameter key="regular_expression" value="KLEBSTOFF"/>
         </operator>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
         <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Region)" to_port="document"/>
         <connect from_op="Filter Tokens (by Region)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="90"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <connect from_port="input 1" to_op="Process Documents from Files" to_port="word list"/>
     <connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="source_input 2" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
