Here's a short example that tries to extract nouns and proper names from a given document (the sample text is German, so the POS filter is set to the German tagset):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="224" width="279">
      <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
        <parameter key="text" value="Nach dem Foulspiel von Roman Weidenfeller im Strafraum der Dortmunder schnappte sich Arjen Robben sofort den Ball. Ohne zu überlegen marschierte er schnellen Schrittes auf den Elfmeterpunkt zu und legte sich den Ball zurecht. Was dann folgte, ist bekannt. (DIASHOW: Der 30. Spieltag). Robben wurde zur tragischen Figur des Spitzenspiels zwischen Dortmund und Bayern, das die Borussen mit 1:0 für sich entscheiden konnten (Bericht). Sein verschossener Elfmeter war aber nur der Höhepunkt der 14 albtraumhaften Minuten des Niederländers, der in der Kabine &quot;total niedergeschlagen&quot; war, wie Bayern-Manager Christian Nerlinger bestätigte. "/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="5.2.001" expanded="true" height="94" name="Process Documents" width="90" x="45" y="120">
        <process expanded="true" height="355" width="334">
          <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:filter_tokens_by_pos" compatibility="5.2.001" expanded="true" height="60" name="Filter Tokens (by POS Tags)" width="90" x="188" y="30">
            <parameter key="language" value="German"/>
            <parameter key="expression" value="NN.*|NE.*"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by POS Tags)" to_port="document"/>
          <connect from_op="Filter Tokens (by POS Tags)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
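When you run this, the word list delivered at the result port contains only the tokens whose POS tag matches NN.* or NE.*. For German the tagger uses the STTS tagset, in which NN marks common nouns and NE proper names, so that expression pulls out exactly the nouns and names.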
To read in your chat logs, you would probably use the "Process Documents from Files" operator (instead of "Create Document" + "Process Documents") and nest the tokenizer and the tagger in there; see the sketch below.
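A minimal, untested sketch of that variant, replacing the "Create Document" + "Process Documents" pair in the process above (the "chatlogs" entry and the C:/data/chatlogs path are placeholders for your own setup, and the text_directories key is the operator's directory-list parameter as far as I remember; the inner subprocess and the word list output stay exactly as before):

<operator activated="true" class="text:process_documents_from_files" compatibility="5.2.001" expanded="true" name="Process Documents from Files">
  <!-- placeholder: point this entry at the directory holding your chat logs -->
  <list key="text_directories">
    <parameter key="chatlogs" value="C:/data/chatlogs"/>
  </list>
  <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" name="Tokenize"/>
    <operator activated="true" class="text:filter_tokens_by_pos" compatibility="5.2.001" expanded="true" name="Filter Tokens (by POS Tags)">
      <parameter key="language" value="German"/>
      <parameter key="expression" value="NN.*|NE.*"/>
    </operator>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by POS Tags)" to_port="document"/>
    <connect from_op="Filter Tokens (by POS Tags)" from_port="document" to_port="document 1"/>
  </process>
</operator>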
Greets from Berlin,
René