How do I implement Python code in a text mining process?

ksnugroho New Altair Community Member
edited November 5 in Community Q&A
Hello, 

Answers

  • sgenzer
    Altair Employee
    Hi @ksnugroho - you can use the Execute Python operator (in the Python Scripting extension) anywhere in your process.

    Scott
  • kayman
    kayman New Altair Community Member
    Some background on using the Python operator:

    - You can use it as a standalone 'script container' anywhere in a process, so it does not need any input or output data at all.
    - If you do pass data in or out, remember that by default the operator treats it as a pandas DataFrame. Anything you connect to the input ports arrives in your script as a DataFrame, and if you manipulate data in other functions or load external data, you just need to return it from the rm_main function as a DataFrame again.
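To tie this back to text mining, here is a minimal sketch of a script that takes one input and returns it again as a DataFrame. The column name `text` and the helper `tokenize` are my own assumptions for illustration; replace them with whatever your data actually contains. The tokenizer is deliberately crude (lowercase, split on anything that is not a letter a-z), so treat it as a starting point rather than a finished text mining step.

```python
import re

import pandas as pd


def tokenize(text):
    # Hypothetical helper: lowercase, then split on runs of non-letters
    return [t for t in re.split(r"[^a-z]+", str(text).lower()) if t]


def rm_main(data):
    # 'data' arrives as a pandas DataFrame; 'text' is an assumed column name
    result = data.copy()
    result["tokens"] = result["text"].apply(lambda s: " ".join(tokenize(s)))
    result["token_count"] = result["text"].apply(lambda s: len(tokenize(s)))
    # Return a DataFrame so the operator can pass it on to its output port
    return result


if __name__ == "__main__":
    # Quick check outside the operator, using made-up sample rows
    sample = pd.DataFrame({"text": ["Text mining in Python!", "Hello, world"]})
    print(rm_main(sample))
```

Inside the operator you only need the imports and the functions; the `__main__` part is just there so you can try the script on its own first.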

    Below is a simple example that uses two inputs and xlsxwriter: the Python script generates a multi-tab Excel file, writing each input to its own tab, and that's it.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="python_scripting:execute_python" compatibility="8.2.000" expanded="true" height="124" name="Execute Python (2)" width="90" x="246" y="85">
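            <parameter key="script" value="import pandas as pd&#10;import xlsxwriter&#10;&#10;def rm_main(data1, data2):&#10;&#10;    # Write each input DataFrame to its own sheet of one workbook.&#10;    # The with-block saves and closes the file on exit&#10;    # (writer.save() no longer exists in recent pandas versions).&#10;    with pd.ExcelWriter('my_file.xlsx', engine='xlsxwriter') as writer:&#10;        data1.to_excel(writer, sheet_name='Page 1')&#10;        data2.to_excel(writer, sheet_name='Page 2')&#10;&#10;    # Nothing is returned, so the output ports stay empty&#10;    return"/>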
          </operator>
          <connect from_port="input 1" to_op="Execute Python (2)" to_port="input 1"/>
          <connect from_port="input 2" to_op="Execute Python (2)" to_port="input 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="source_input 3" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>