Problem with StopwordFilterFile

nguyenxuanhau New Altair Community Member
edited November 5 in Community Q&A
My XML process file is as follows:

<process version="4.6">

  <operator name="Root" class="Process" expanded="yes">
      <description text="Text Hau"/>
      <parameter key="logverbosity" value="init"/>
      <parameter key="random_seed" value="2001"/>
      <parameter key="send_mail" value="never"/>
      <parameter key="process_duration_for_mail" value="30"/>
      <parameter key="encoding" value="UTF-8"/>
      <operator name="TextInput" class="TextInput" expanded="yes">
          <list key="texts">
            <parameter key="graphics" value="dulieu"/>
          </list>
          <parameter key="default_content_type" value=""/>
          <parameter key="default_content_encoding" value="utf-8"/>
          <parameter key="default_content_language" value=""/>
          <parameter key="prune_below" value="-1"/>
          <parameter key="prune_above" value="-1"/>
          <parameter key="vector_creation" value="TermOccurrences"/>
          <parameter key="use_content_attributes" value="false"/>
          <parameter key="use_given_word_list" value="false"/>
          <parameter key="return_word_list" value="false"/>
          <parameter key="id_attribute_type" value="short"/>
          <list key="namespaces">
          </list>
          <parameter key="create_text_visualizer" value="false"/>
          <parameter key="on_the_fly_pruning" value="-1"/>
          <parameter key="extend_exampleset" value="false"/>
          <operator name="StringTokenizer" class="StringTokenizer">
          </operator>
          <operator name="StopwordFilterFile" class="StopwordFilterFile">
              <parameter key="file" value="dulieu/stopword.txt"/>
              <parameter key="case_sensitive" value="true"/>
          </operator>
      </operator>
  </operator>

</process>

When I run this process, it does not filter out stopwords that are encoded in UTF-8.

Answers

  • land
    land New Altair Community Member
    Hi,
    If you switch to expert mode in the parameters view of RapidMiner, you will see that there is an encoding parameter. If you set this parameter to UTF-8, the process will work.

    Greetings,
    Sebastian
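
    For reference, here is a rough sketch of how the operator from the question might look once the setting is applied. It assumes the expert-mode parameter belongs to the StopwordFilterFile operator and is stored in the process file under the key "encoding"; the key name is a guess based on the parameter label, so please check it against what RapidMiner writes when you save the process.

    ```xml
    <operator name="StopwordFilterFile" class="StopwordFilterFile">
        <parameter key="file" value="dulieu/stopword.txt"/>
        <parameter key="case_sensitive" value="true"/>
        <!-- Assumption: the expert-mode encoding parameter is saved under the key "encoding" -->
        <parameter key="encoding" value="UTF-8"/>
    </operator>
    ```

    The stopword file itself (dulieu/stopword.txt) would presumably also need to be saved as UTF-8 for the entries to match the tokens.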