"Help with Word List Operator"

User: "ronmac"
Updated by Jocelyn
I am trying to add the Word List operator to this word vector process I am working on, but I cannot get it to work properly. I would appreciate any suggestions on how to use it. I want to add it at the end so I get a list of every word with its count.

Thanks,
Ron McEwan
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
    <process expanded="true" height="431" width="413">
      <operator activated="true" class="web:get_webpage" compatibility="5.0.4" expanded="true" height="60" name="Get Page" width="90" x="55" y="46">
        <parameter key="url" value="http://seekingalpha.com/news/market_currents?source=refreshed"/>
        <list key="query_parameters"/>
      </operator>
      <operator activated="true" class="text:tokenize" compatibility="5.0.7" expanded="true" height="60" name="Tokenize" width="90" x="202" y="41"/>
      <operator activated="true" class="text:extract_length" compatibility="5.0.7" expanded="true" height="60" name="Extract Length" width="90" x="112" y="165"/>
      <operator activated="true" class="text:extract_token_number" compatibility="5.0.7" expanded="true" height="60" name="Extract Token Number" width="90" x="246" y="165"/>
      <connect from_op="Get Page" from_port="output" to_op="Tokenize" to_port="document"/>
      <connect from_op="Tokenize" from_port="document" to_op="Extract Length" to_port="document"/>
      <connect from_op="Extract Length" from_port="document" to_op="Extract Token Number" to_port="document"/>
      <connect from_op="Extract Token Number" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

    User: "el_chief"
    Check out my blog this week. I've got five videos all about text mining, and they should answer your question.
    User: "ronmac"
    OP
    Thanks, looking forward to it.
    User: "colo"
    Hi Ron,

    It seems you haven't created a word vector yet. The "Process Documents" operator does this for you. If you only need the term occurrences, it's a very simple extension of your example code:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
        <process expanded="true" height="431" width="681">
          <operator activated="true" class="web:get_webpage" compatibility="5.0.4" expanded="true" height="60" name="Get Page" width="90" x="45" y="30">
            <parameter key="url" value="http://seekingalpha.com/news/market_currents?source=refreshed"/>
            <list key="query_parameters"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="5.0.6" expanded="true" height="94" name="Process Documents" width="90" x="313" y="30">
            <parameter key="vector_creation" value="Term Occurrences"/>
            <process expanded="true" height="607" width="786">
              <operator activated="true" class="text:tokenize" compatibility="5.0.6" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
              <operator activated="true" class="text:extract_length" compatibility="5.0.6" expanded="true" height="60" name="Extract Length" width="90" x="246" y="30"/>
              <operator activated="true" class="text:extract_token_number" compatibility="5.0.6" expanded="true" height="60" name="Extract Token Number" width="90" x="380" y="30"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Extract Length" to_port="document"/>
              <connect from_op="Extract Length" from_port="document" to_op="Extract Token Number" to_port="document"/>
              <connect from_op="Extract Token Number" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Get Page" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    Matthias
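For the original goal (a list of each word with its count), Process Documents also exposes a "word list" output port, and the Text Processing extension of this era offers a "WordList to Data" operator that turns that word list into an example set you can view as a table. The sketch below is Matthias's process with that operator wired to a second result port. The operator class `text:wordlist_to_data`, its port names ("word list" in, "example set" out), and the compatibility version are assumptions recalled from the 5.x extension, not confirmed in this thread, so check them against the operator as it appears in your own RapidMiner installation before relying on this.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
    <process expanded="true" height="431" width="681">
      <operator activated="true" class="web:get_webpage" compatibility="5.0.4" expanded="true" height="60" name="Get Page" width="90" x="45" y="30">
        <parameter key="url" value="http://seekingalpha.com/news/market_currents?source=refreshed"/>
        <list key="query_parameters"/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="5.0.6" expanded="true" height="94" name="Process Documents" width="90" x="313" y="30">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <process expanded="true" height="607" width="786">
          <operator activated="true" class="text:tokenize" compatibility="5.0.6" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:extract_length" compatibility="5.0.6" expanded="true" height="60" name="Extract Length" width="90" x="246" y="30"/>
          <operator activated="true" class="text:extract_token_number" compatibility="5.0.6" expanded="true" height="60" name="Extract Token Number" width="90" x="380" y="30"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Extract Length" to_port="document"/>
          <connect from_op="Extract Length" from_port="document" to_op="Extract Token Number" to_port="document"/>
          <connect from_op="Extract Token Number" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <!-- Assumed operator: converts the word list from Process Documents into a table of words and counts -->
      <operator activated="true" class="text:wordlist_to_data" compatibility="5.0.6" expanded="true" height="76" name="WordList to Data" width="90" x="447" y="120"/>
      <connect from_op="Get Page" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <!-- result 1: the term-occurrence word vector, as in the original reply -->
      <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
      <!-- result 2: the word list, routed through WordList to Data -->
      <connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
      <connect from_op="WordList to Data" from_port="example set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
```

If the class name matches your version, running the process should give two result tabs: the word vector from before and a table with one row per word and its occurrence counts. If RapidMiner reports an unknown operator on import, drag "WordList to Data" in from the Operators panel and connect it to the "word list" port by hand instead.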
    User: "ronmac"
    OP
    Thanks. The example was very helpful.