
"Python's subprocess.run() not working inside RapidMiner"

User: "lplenka"
New Altair Community Member
Updated by Jocelyn

Hello friends, I am having trouble with Python's subprocess.run() inside the Execute Python operator. I am using Xpdf's pdftotext to extract text from a PDF file. The subprocess call seems to fail when I run the process: I always end up with a blank text file.
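In the script below, stdout and stderr are captured with subprocess.PIPE but never read, so any error message pdftotext prints is discarded and the failure is silent. A minimal sketch of a hypothetical helper, `run_checked` (not part of the original script), that surfaces the exit code and stderr instead. It sticks to `stdout=PIPE, stderr=PIPE` because `capture_output` only exists from Python 3.7, and this setup uses Python 3.6.

```python
import subprocess
import sys


def run_checked(args, timeout=None):
    """Hypothetical helper: run a command (no shell) and raise if it fails.

    stdout/stderr are captured with PIPE, as in the original script, but
    stderr is put into the error message instead of being thrown away.
    """
    result = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout
    )
    if result.returncode != 0:
        # errors="replace" keeps an unusual console encoding from hiding the message.
        message = result.stderr.decode(errors="replace").strip()
        raise RuntimeError(
            "{} exited with code {}: {}".format(args[0], result.returncode, message)
        )
    return result.stdout.decode(errors="replace")


if __name__ == "__main__":
    # Demo with the running interpreter rather than pdftotext.
    print(run_checked([sys.executable, "-c", "print('hello')"]))
```

Calling it as `run_checked(['pdftotext', '-simple', source, output])` would turn a silent failure into an exception that includes whatever pdftotext wrote to stderr.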

 

System details:

Windows 10

RapidMiner Studio 8.0

Python 3.6

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="68" name="Execute Python" width="90" x="380" y="187">
<parameter key="script" value="import pandas&#10;import sys&#10;import subprocess&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#10;    def pdf_text(source, output, timeout=None):&#10;&#10;        if sys.platform == &quot;win32&quot;:&#10;            args = ['pdftotext', '-simple', source, output]&#10;        elif sys.platform == &quot;linux&quot; or sys.platform == &quot;linux2&quot;:&#10;            args = ['pdftotext', '-layout', source, output]&#10;&#10;        with open(output,&quot;w+&quot;):&#10;            process = subprocess.run(&#10;                args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout, shell = True)&#10;&#10;    input_file = &quot;D:/pdf-sample.pdf&quot;&#10;    output_file = &quot;D:/ouput.txt&quot;&#10;    pdf_text(input_file, output_file)&#10;&#10;    return "/>
</operator>
</process>
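Two details of the script above can produce exactly this symptom. First, `with open(output, "w+")` creates (or truncates) the output file before pdftotext runs, so a blank file is what you see whenever pdftotext fails for any reason. Second, per the Python docs, on Linux a list passed together with `shell=True` runs only its first element as the command (the remaining items become arguments to the shell itself), so pdftotext would start without its input and output paths; on Windows the list is joined into a cmd.exe command line, which is why it happens to work there. A sketch of the same function under those assumptions: no shell, no pre-opened file, the executable resolved explicitly with `shutil.which`, and `check=True` so a non-zero exit raises. The `-simple`/`-layout` flags are copied unchanged from the original; `pdftotext_args` and the `exe_name` parameter are hypothetical names added for illustration.

```python
import os
import shutil
import subprocess
import sys


def pdftotext_args(exe, source, output, platform_name=sys.platform):
    """Build the same command lines the original script used, per platform."""
    if platform_name == "win32":
        return [exe, "-simple", source, output]
    if platform_name.startswith("linux"):  # covers "linux" and "linux2"
        return [exe, "-layout", source, output]
    # The original left args undefined here; fail with a clear message instead.
    raise OSError("unsupported platform: " + platform_name)


def pdf_text(source, output, timeout=None, exe_name="pdftotext"):
    # Resolve the executable explicitly: the PATH seen by the Python process
    # RapidMiner starts may differ from the one in cmd or a notebook.
    exe = shutil.which(exe_name)
    if exe is None:
        raise FileNotFoundError(exe_name + " not found on PATH")

    # No shell=True and no open() beforehand: pdftotext creates the file itself.
    subprocess.run(
        pdftotext_args(exe, source, output),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,  # raises CalledProcessError (with .stderr) on a non-zero exit
    )

    # A zero exit code with an empty file is still worth flagging.
    if os.path.getsize(output) == 0:
        raise RuntimeError("pdftotext produced an empty file: " + output)
```

Inside rm_main() it would be called the same way as before, `pdf_text(input_file, output_file)`; if pdftotext fails, the raised `CalledProcessError` carries the tool's output in its `stderr` attribute.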

I can't find any reason for the blank output. Please help!

    User: "lplenka"
    New Altair Community Member
    OP
    Accepted Answer

    Hey @lionelderkrikor,

     

    Thanks for trying to help.

    Sorry, the previous XML file had an error. Here is the corrected XML file.

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="68" name="Execute Python" width="90" x="380" y="187">
    <parameter key="script" value="import pandas&#10;import sys&#10;import subprocess&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#10;    def pdf_text(source, output, timeout=None):&#10;&#10;        if sys.platform == &quot;win32&quot;:&#10;            args = ['pdftotext', '-simple', source, output]&#10;        elif sys.platform == &quot;linux&quot; or sys.platform == &quot;linux2&quot;:&#10;            args = ['pdftotext', '-layout', source, output]&#10;&#10;        with open(output,&quot;w+&quot;):&#10;            process = subprocess.run(&#10;                args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout, shell = True)&#10;&#10;    input_file = &quot;D:/pdf-sample.pdf&quot;&#10;    output_file = &quot;D:/ouput.txt&quot;&#10;    pdf_text(input_file, output_file)&#10;&#10;    return "/>
    </operator>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    </process>
    </operator>
    </process>

    Yes, the Python script works fine when I run it in a notebook or call it from cmd.
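When a script works from a notebook or cmd but not from the Execute Python operator, the two runs may be using a different interpreter, working directory, or PATH. A minimal sketch of a hypothetical `describe_environment` helper that collects those details so the two environments can be compared, for example by calling it in both places and printing the result or writing it to a file:

```python
import os
import shutil
import sys


def describe_environment(tool="pdftotext"):
    """Hypothetical helper: details that commonly differ between a notebook,
    a cmd window and a Python process started by another application."""
    return {
        "python": sys.executable,          # which interpreter is running
        "version": sys.version.split()[0], # e.g. 3.6.x
        "cwd": os.getcwd(),                # relative paths resolve against this
        "tool": shutil.which(tool),        # None means the tool is not on PATH
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

If `tool` comes back as `None` only inside RapidMiner, pdftotext is not on the PATH that process sees.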

    rm_main() takes no arguments because the script doesn't need any input; I just want the text extracted to "output.txt" on my D: drive. For the same reason, there is no return value either.

     

     

    Note:

    Surprisingly, I am now getting the extracted text in "output.txt". I don't know why I was getting no output last night. Did restarting do the trick? Please cross-check on your system. Thank you :)