"Python's subprocess.run() not working inside RapidMiner"

User: "lplenka"
New Altair Community Member
Updated by Jocelyn

Hello friends, I am having trouble with Python's subprocess.run() inside the Execute Python operator. I am using Xpdf Reader's pdftotext to extract text from a PDF file. The subprocess seems to fail when I run the process, because I always get a blank text file.

 

System details:

Windows 10

RapidMiner Studio 8.0

Python 3.6

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="68" name="Execute Python" width="90" x="380" y="187">
<parameter key="script" value="import pandas&#10;import sys&#10;import subprocess&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#10; def pdf_text(source, output, timeout=None):&#10; &#10; if sys.platform == &quot;win32&quot;:&#10; args = ['pdftotext', '-simple', source, output]&#10; elif sys.platform == &quot;linux&quot; or sys.platform == &quot;linux2&quot;:&#10; args = ['pdftotext', '-layout', source, output]&#10; &#10; with open(output,&quot;w+&quot;):&#10; process = subprocess.run(&#10; args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout, shell = True)&#10; &#10; &#10; &#10; &#10; input_file = &quot;D:/pdf-sample.pdf&quot;&#10; output_file = &quot;D:/ouput.txt&quot;&#10; pdf_text(input_file, output_file)&#10; &#10; return "/>
</operator>
</process>

I am unable to find any reason for the wrong output. Please help!

    User: "lplenka"
    New Altair Community Member
    OP
    Accepted Answer

    Hey @lionelderkrikor,

     

    Thanks for trying to help.

    Sorry, the previous XML file had some errors. This is the new XML file.

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="68" name="Execute Python" width="90" x="380" y="187">
    <parameter key="script" value="import pandas&#10;import sys&#10;import subprocess&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#10; def pdf_text(source, output, timeout=None):&#10; &#10; if sys.platform == &quot;win32&quot;:&#10; args = ['pdftotext', '-simple', source, output]&#10; elif sys.platform == &quot;linux&quot; or sys.platform == &quot;linux2&quot;:&#10; args = ['pdftotext', '-layout', source, output]&#10; &#10; with open(output,&quot;w+&quot;):&#10; process = subprocess.run(&#10; args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout, shell = True)&#10; &#10; &#10; &#10; &#10; input_file = &quot;D:/pdf-sample.pdf&quot;&#10; output_file = &quot;D:/ouput.txt&quot;&#10; pdf_text(input_file, output_file)&#10; &#10; return "/>
    </operator>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    </process>
    </operator>
    </process>

    Well, yes, the Python script works fine when I run it in a notebook or call it from cmd.

    I am not passing any arguments to rm_main() because this script doesn't need any, and I want the text to be extracted to "output.txt" on my D: drive. So it doesn't return anything either.

     

     

    Note:

    Surprisingly, I am now getting the extracted text in the "output.txt" text file. I don't know why I was not getting output last night. Did the restart do the trick? Please cross-check on your system. Thank you :)
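
For anyone who hits the same blank file: the script above captures stdout and stderr but never looks at them or at the return code, and `open(output, "w+")` creates an empty file even when pdftotext itself fails. So a silent failure (for example, pdftotext not being on the PATH that RapidMiner's Python process sees, which is only a guess here, not a confirmed cause) would look exactly like a blank output file. The following is a minimal sketch of a variant that surfaces such failures instead of hiding them. The helper names `build_pdftotext_args` and `run_checked` are made up for illustration, and falling back to `-layout` on platforms other than Windows is an assumption. Note also that the script writes to `D:/ouput.txt`, not `output.txt`.

```python
import shutil
import subprocess
import sys


def build_pdftotext_args(source, output, platform=sys.platform):
    """Build the pdftotext command line (hypothetical helper)."""
    # Flags copied from the original script: -simple on Windows, -layout on Linux.
    # Assumption: any other platform also uses -layout.
    layout_flag = "-simple" if platform == "win32" else "-layout"
    return ["pdftotext", layout_flag, source, output]


def run_checked(args, timeout=None):
    """Run a command without a shell and raise if it is missing or fails."""
    # Resolve the executable first so a missing tool gives a clear error.
    executable = shutil.which(args[0])
    if executable is None:
        raise FileNotFoundError(f"{args[0]!r} was not found on PATH")

    # No shell=True and no open() on the output file: pdftotext creates it itself.
    process = subprocess.run(
        [executable] + list(args[1:]),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if process.returncode != 0:
        message = process.stderr.decode(errors="replace")
        raise RuntimeError(
            f"{args[0]} exited with code {process.returncode}: {message}"
        )
    return process


def rm_main():
    # Paths copied from the original post (file name spelled as in the script).
    input_file = "D:/pdf-sample.pdf"
    output_file = "D:/ouput.txt"
    run_checked(build_pdftotext_args(input_file, output_file))
```

With this version, a missing executable or a non-zero exit code raises an exception, which should show up in the RapidMiner log rather than leaving an empty file behind.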