Here's a short example that tries to extract nouns and proper names from a given document (the sample text is German, so the POS filter is set to the German tagset):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="224" width="279">
      <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
        <parameter key="text" value="Nach dem Foulspiel von Roman Weidenfeller im Strafraum der Dortmunder schnappte sich Arjen Robben sofort den Ball. Ohne zu überlegen marschierte er schnellen Schrittes auf den Elfmeterpunkt zu und legte sich den Ball zurecht. Was dann folgte, ist bekannt. (DIASHOW: Der 30. Spieltag). Robben wurde zur tragischen Figur des Spitzenspiels zwischen Dortmund und Bayern, das die Borussen mit 1:0 für sich entscheiden konnten (Bericht). Sein verschossener Elfmeter war aber nur der Höhepunkt der 14 albtraumhaften Minuten des Niederländers, der in der Kabine &quot;total niedergeschlagen&quot; war, wie Bayern-Manager Christian Nerlinger bestätigte. "/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="5.2.001" expanded="true" height="94" name="Process Documents" width="90" x="45" y="120">
        <process expanded="true" height="355" width="334">
          <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:filter_tokens_by_pos" compatibility="5.2.001" expanded="true" height="60" name="Filter Tokens (by POS Tags)" width="90" x="188" y="30">
            <parameter key="language" value="German"/>
            <parameter key="expression" value="NN.*|NE.*"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by POS Tags)" to_port="document"/>
          <connect from_op="Filter Tokens (by POS Tags)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
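When you run this, the word list delivered at the result port contains only the tokens whose POS tag matches NN.* or NE.*. For German the tagger uses the STTS tagset, in which NN marks common nouns and NE proper names, so that expression pulls out exactly the nouns and names.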
To read in your chat logs, you would probably use the "Process Documents from Files" operator (instead of "Create Document" + "Process Documents") and nest the tokenizer and the tagger in there; see the sketch below.
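A minimal, untested sketch of that variant, replacing the "Create Document" + "Process Documents" pair in the process above (the "chatlogs" entry and the C:/data/chatlogs path are placeholders for your own setup, and the text_directories key is the operator's directory-list parameter as far as I remember; the inner subprocess and the word list output stay exactly as before):

<operator activated="true" class="text:process_documents_from_files" compatibility="5.2.001" expanded="true" name="Process Documents from Files">
  <!-- placeholder: point this entry at the directory holding your chat logs -->
  <list key="text_directories">
    <parameter key="chatlogs" value="C:/data/chatlogs"/>
  </list>
  <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" name="Tokenize"/>
    <operator activated="true" class="text:filter_tokens_by_pos" compatibility="5.2.001" expanded="true" name="Filter Tokens (by POS Tags)">
      <parameter key="language" value="German"/>
      <parameter key="expression" value="NN.*|NE.*"/>
    </operator>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by POS Tags)" to_port="document"/>
    <connect from_op="Filter Tokens (by POS Tags)" from_port="document" to_port="document 1"/>
  </process>
</operator>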
Greets from Berlin,
René