"Help with Word List Operator"

ronmac New Altair Community Member
edited November 2024 in Community Q&A
I am trying to add the WordList operator to the end of this word vector process so that I get a list of each word with its count, but I cannot get it to work properly. I would appreciate any suggestions on how to implement it.

Thanks,
Ron McEwan
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
    <process expanded="true" height="431" width="413">
      <operator activated="true" class="web:get_webpage" compatibility="5.0.4" expanded="true" height="60" name="Get Page" width="90" x="55" y="46">
        <parameter key="url" value="http://seekingalpha.com/news/market_currents?source=refreshed"/>
        <list key="query_parameters"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="5.0.7" expanded="true" height="60" name="Tokenize" width="90" x="202" y="41"/>
      <operator activated="true" class="text:extract_length" compatibility="5.0.7" expanded="true" height="60" name="Extract Length" width="90" x="112" y="165"/>
      <operator activated="true" class="text:extract_token_number" compatibility="5.0.7" expanded="true" height="60" name="Extract Token Number" width="90" x="246" y="165"/>
      <connect from_op="Get Page" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Extract Length" to_port="document"/>
      <connect from_op="Extract Length" from_port="document" to_op="Extract Token Number" to_port="document"/>
      <connect from_op="Extract Token Number" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • el_chief (New Altair Community Member)
    Check out my blog this week. I've got five videos all about text mining, and they should answer your question.
  • ronmac (New Altair Community Member)
    Thanks, looking forward to it.
  • colo (New Altair Community Member)
    Hi Ron,

    It seems you haven't created a word vector yet. The "Process Documents" operator does this for you. If you only need the term occurrences, it's a very simple extension of your example code:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="431" width="681">
          <operator activated="true" class="web:get_webpage" compatibility="5.0.4" expanded="true" height="60" name="Get Page" width="90" x="45" y="30">
            <parameter key="url" value="http://seekingalpha.com/news/market_currents?source=refreshed"/>
            <list key="query_parameters"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="5.0.6" expanded="true" height="94" name="Process Documents" width="90" x="313" y="30">
            <parameter key="vector_creation" value="Term Occurrences"/>
            <process expanded="true" height="607" width="786">
              <operator activated="true" class="text:tokenize" compatibility="5.0.6" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
              <operator activated="true" class="text:extract_length" compatibility="5.0.6" expanded="true" height="60" name="Extract Length" width="90" x="246" y="30"/>
              <operator activated="true" class="text:extract_token_number" compatibility="5.0.6" expanded="true" height="60" name="Extract Token Number" width="90" x="380" y="30"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Extract Length" to_port="document"/>
              <connect from_op="Extract Length" from_port="document" to_op="Extract Token Number" to_port="document"/>
              <connect from_op="Extract Token Number" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Get Page" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    Matthias
  • ronmac (New Altair Community Member)
    Thanks. The example was very helpful.
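For the original question (getting each word with its count from the WordList operator): in the Text Processing extension, Process Documents also has a "word list" output port, and the WordList to Data operator should turn that word list into an example set with one row per word and its occurrence counts. The fragment below is a sketch that has not been tested against version 5.0. It assumes the operator class is `text:wordlist_to_data` and that its ports are named `word list` and `example set`, so check these against the operator as it appears in your Studio before pasting the fragment into the outer `<process>` of Matthias's example (next to his existing `<connect>` lines).

```xml
<!-- Sketch only: class and port names are assumptions, verify them in your Studio version. -->
<!-- Converts the word list produced by Process Documents into a table of words and counts. -->
<operator activated="true" class="text:wordlist_to_data" expanded="true" name="WordList to Data"/>
<!-- Route the "word list" output of Process Documents into WordList to Data. -->
<connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
<!-- Deliver the word/count table as a second result next to the word vector. -->
<connect from_op="WordList to Data" from_port="example set" to_port="result 2"/>
```

Since Matthias's example already routes the word vector to `result 1`, the word/count table should show up as a second result tab. Connecting the "word list" port straight to a result port should also display the word list with its counts, without converting it to an example set.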
