Hello friends, I am having trouble with Python's subprocess.run() inside the Execute Python operator. I am using the pdftotext tool from Xpdf to extract text from a PDF file, but the subprocess call seems to fail: I always end up with a blank text file.
System details:
Windows 10
RapidMiner Studio 8.0
Python 3.6
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="68" name="Execute Python" width="90" x="380" y="187">
<parameter key="script" value="import pandas&#10;import sys&#10;import subprocess&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;    def pdf_text(source, output, timeout=None):&#10;        if sys.platform == &quot;win32&quot;:&#10;            args = ['pdftotext', '-simple', source, output]&#10;        elif sys.platform == &quot;linux&quot; or sys.platform == &quot;linux2&quot;:&#10;            args = ['pdftotext', '-layout', source, output]&#10;        with open(output,&quot;w+&quot;):&#10;            process = subprocess.run(&#10;                args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout, shell = True)&#10;&#10;    input_file = &quot;D:/pdf-sample.pdf&quot;&#10;    output_file = &quot;D:/ouput.txt&quot;&#10;    pdf_text(input_file, output_file)&#10;    return"/>
</operator>
</process>
I can't figure out why the output file is always empty. Please help!
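For anyone debugging the same thing, here is a minimal sketch of a more defensive pdf_text that makes the failure visible instead of leaving an empty file behind. It assumes pdftotext is Xpdf's command-line tool and keeps the script's -simple / -layout flags. The `exe` parameter and the error messages are hypothetical additions for illustration. Two details of the original script could explain the blank file. First, `open(output, "w+")` creates (or truncates) the output file itself, so an empty file appears even if pdftotext never runs. Second, with `shell=True` and `stderr=subprocess.PIPE`, a "command not found" or pdftotext error only ends up in `process.stderr`, which is never read. The sketch drops both and checks the return code.

```python
import shutil
import subprocess
import sys


def pdf_text(source, output, timeout=None, exe="pdftotext"):
    """Convert `source` (a PDF) to plain text in `output` with pdftotext.

    Raises instead of failing silently, so the cause shows up in the
    Execute Python log. `exe` is a hypothetical extra parameter for
    passing the full path to pdftotext when it is not on the PATH.
    """
    # Resolve the executable first: with shell=True a missing program
    # only produces a message on stderr, which the original never read.
    exe_path = shutil.which(exe)
    if exe_path is None:
        raise FileNotFoundError(
            f"{exe!r} not found on PATH; pass its full path as exe=")

    # Same flags as the original script; any non-Windows platform gets
    # -layout, so `args` is always defined.
    layout_flag = "-simple" if sys.platform == "win32" else "-layout"
    args = [exe_path, layout_flag, source, output]

    # No shell=True and no open(output, "w+"): pdftotext creates the
    # output file itself, and an empty file is no longer pre-created.
    result = subprocess.run(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, timeout=timeout)
    if result.returncode != 0:
        message = result.stderr.decode(errors="replace").strip()
        raise RuntimeError(
            f"pdftotext exited with code {result.returncode}: {message}")
    return result
```

Inside rm_main it would be called as `pdf_text("D:/pdf-sample.pdf", "D:/ouput.txt")`. If the Python process started by RapidMiner does not see pdftotext on its PATH, the FileNotFoundError says so. In that case, passing the full path to the executable as `exe` is one way around it.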