POS-Tagger (STTS)
D_Zed
New Altair Community Member
Hello all,
I am a new RapidMiner user and want to run text analysis on chat dialogues in German.
For this problem I want to use a POS tagger with the Stuttgart-Tübingen-Tagset (STTS).
Can somebody explain how I can use this in RapidMiner?
Thank you
D_Zed
Answers
Hi and welcome to RM,
Here's a short example that tries to extract nouns and proper names from a given document (process XML below). The expression NN.*|NE.* keeps tokens whose STTS tag is NN (common noun) or NE (proper noun).
To read in your chat logs, you would probably use the "Process Documents from Files" operator instead of "Create Document" + "Process Documents" and nest the tokenizer and the tagger inside it.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="224" width="279">
      <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
        <parameter key="text" value="Nach dem Foulspiel von Roman Weidenfeller im Strafraum der Dortmunder schnappte sich Arjen Robben sofort den Ball. Ohne zu überlegen marschierte er schnellen Schrittes auf den Elfmeterpunkt zu und legte sich den Ball zurecht. Was dann folgte, ist bekannt. (DIASHOW: Der 30. Spieltag). Robben wurde zur tragischen Figur des Spitzenspiels zwischen Dortmund und Bayern, das die Borussen mit 1:0 für sich entscheiden konnten (Bericht). Sein verschossener Elfmeter war aber nur der Höhepunkt der 14 albtraumhaften Minuten des Niederländers, der in der Kabine &quot;total niedergeschlagen&quot; war, wie Bayern-Manager Christian Nerlinger bestätigte. "/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="5.2.001" expanded="true" height="94" name="Process Documents" width="90" x="45" y="120">
        <process expanded="true" height="355" width="334">
          <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:filter_tokens_by_pos" compatibility="5.2.001" expanded="true" height="60" name="Filter Tokens (by POS Tags)" width="90" x="188" y="30">
            <parameter key="language" value="German"/>
            <parameter key="expression" value="NN.*|NE.*"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by POS Tags)" to_port="document"/>
          <connect from_op="Filter Tokens (by POS Tags)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
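For chat logs stored as files in a folder, the process above might be adapted roughly as follows. This is an untested sketch, not a verified process: the operator key text:process_document_from_file, the text_directories list, and the "word list" output port are assumptions based on the Text Processing extension and should be checked against your installed version. The class label "chat" and the folder /path/to/chat_logs are placeholders. Layout attributes (height, width, x, y) and portSpacing entries are left out for brevity.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true">
      <!-- Assumed operator key for "Process Documents from Files"; verify in your installation. -->
      <operator activated="true" class="text:process_document_from_file" compatibility="5.2.001" expanded="true" name="Process Documents from Files">
        <list key="text_directories">
          <!-- key = class label, value = folder with the chat logs; both are placeholders. -->
          <parameter key="chat" value="/path/to/chat_logs"/>
        </list>
        <process expanded="true">
          <!-- Same inner steps as in the example above: tokenize, then keep only NN/NE tokens. -->
          <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" name="Tokenize"/>
          <operator activated="true" class="text:filter_tokens_by_pos" compatibility="5.2.001" expanded="true" name="Filter Tokens (by POS Tags)">
            <parameter key="language" value="German"/>
            <parameter key="expression" value="NN.*|NE.*"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by POS Tags)" to_port="document"/>
          <connect from_op="Filter Tokens (by POS Tags)" from_port="document" to_port="document 1"/>
        </process>
      </operator>
      <connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
    </process>
  </operator>
</process>
```

To keep other word classes, only the expression needs to change. For example, ADJA|ADJD would select attributive and predicative adjectives in STTS.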
Greetings from Berlin,
René
Thanks for the example!
Roland