keyword-based text mining

seba77 New Altair Community Member
edited November 5 in Community Q&A

Hello there,

I have a list of 50 keywords and want to analyze how frequently each of them occurs in my dataset.
The general text mining process is not the problem, but I want to restrict the analysis to these 50 keywords.
How can I do this?

 

Thank you very much in advance!

 

Best Answer

  • Telcontar120
    Telcontar120 New Altair Community Member
    Answer ✓

    You just create a wordlist with those 50 words and then apply that specific wordlist (using the wordlist input port) for any subsequent document you are going to process.
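
To make the idea concrete outside RapidMiner, here is a minimal Python sketch of counting only the words on a fixed keyword list. It is an illustration, not the operator's implementation: the function name `count_keywords`, the sample keywords and the sample sentence are all made up, and it assumes single-word keywords, lowercased text and letters-only tokens.

```python
import re
from collections import Counter

def count_keywords(text, keywords):
    """Count how often each keyword occurs in text; all other words are ignored."""
    wanted = {k.lower() for k in keywords}
    # Assumption: tokens are runs of ASCII letters, compared case-insensitively.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in wanted)
    # Report every keyword, including those that never occur.
    return {k: counts[k.lower()] for k in keywords}

# Hypothetical keyword list and text, standing in for the 50 keywords and the dataset.
keywords = ["apples", "oranges", "bananas", "cherries"]
text = "Apples are sweeter than oranges, but apples bruise easily."
result = count_keywords(text, keywords)
print(result)
```

Keywords that do not appear still get a count of 0, which keeps the output the same shape for every document. Multi-word keywords would need extra handling (for example n-grams), which this sketch does not cover.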

     

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member

    Hi @seba77,

     

    If I understand correctly, you can use the Create ExampleSet operator to enter your list of 50 keywords, and then the Process Documents from Data and Process Documents operators to filter all other words out of your document.

    Here is an example process that you can adapt to your keywords and document:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.7.000" expanded="true" height="68" name="Create Exampleset" width="90" x="45" y="34">
    <parameter key="Input Csv" value="att1&#10;apples&#10;oranges&#10;bananas"/>
    <parameter key="Parse all as Nominal" value="true"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="179" y="34">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="447" y="34"/>
    <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
    <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="447" y="187">
    <parameter key="text" value="apples are sweeter than oranges but bananas are the sweetest of them all"/>
    </operator>
    <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="581" y="85">
    <process expanded="true">
    <connect from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Create Exampleset" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents" to_port="word list"/>
    <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
    <connect from_op="Process Documents" from_port="example set" to_port="result 2"/>
    <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
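
For readers who want to check the logic without opening RapidMiner, the data flow of the process above (first build a word list from the keyword table, then apply it to the documents) can be mimicked roughly in Python. This is only a sketch under assumptions: the function names are invented, the second document is made up, the text is lowercased, and the cells hold raw occurrence counts, which may differ from the weighting the operators are configured to produce.

```python
import re

def tokenize(text):
    # Assumption: split on non-letters and lowercase everything.
    return re.findall(r"[a-z]+", text.lower())

def build_wordlist(keyword_rows):
    """Stage 1: turn the keyword table into a sorted list of distinct words."""
    return sorted({tok for row in keyword_rows for tok in tokenize(row)})

def apply_wordlist(documents, wordlist):
    """Stage 2: one row per document, one column per word in the word list."""
    index = {word: i for i, word in enumerate(wordlist)}
    matrix = []
    for doc in documents:
        row = [0] * len(wordlist)
        for tok in tokenize(doc):
            if tok in index:  # words outside the word list are dropped
                row[index[tok]] += 1
        matrix.append(row)
    return matrix

# Keywords and first document taken from the example process; the second
# document is hypothetical and only shows a keyword occurring twice.
keyword_rows = ["apples", "oranges", "bananas"]
documents = [
    "apples are sweeter than oranges but bananas are the sweetest of them all",
    "oranges and more oranges",
]
wordlist = build_wordlist(keyword_rows)
matrix = apply_wordlist(documents, wordlist)
print(wordlist)
for row in matrix:
    print(row)
```

Each row has exactly one column per keyword, so all documents share the same columns regardless of which other words they contain.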

    Does this example process meet your needs?

     

    Regards,

     

    Lionel
