I've encountered a very strange and very annoying problem when trying to run some Python packages. All of them work on my local desktop, or when running the server process in local mode. But whenever I run the same process entirely on the server (Ubuntu 16.04), it fails with 'the script can not be parsed'.
On a Windows server setup they work fine, so my first guess was security settings. But when I ran the same process on another Ubuntu test server, where I granted full permissions on everything, it still failed, so I can probably rule that out.
Some packages work fine on the server. Basically any standard Python command works, but it seems that as soon as an internet connection is required, the script fails. I have two totally different scripts giving the same problem: one that I use to call the Microsoft translation APIs and another that I use to validate a language. As mentioned, they work fine in the desktop version, under Windows server, and on the Linux servers when run outside of RapidMiner. So I'm really stuck, and this is a key part of our planned process.
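Since the same scripts run from the command line on the same Linux machine, one thing that could help narrow this down is comparing the Python environment RapidMiner Server actually launches with the one in a shell. Below is a minimal diagnostic sketch, not a fix. The helper name `environment_report` and the choice of checks are my own assumptions; only `rm_main` is the entry point the Execute Python operator expects. Each import is attempted inside the function, so a failing package shows up as a row in the result instead of aborting the whole script.

```python
import importlib
import os
import sys


def environment_report(modules=("bs4", "langid")):
    """Collect (check, value) pairs describing the running interpreter.

    Hypothetical helper for illustration: it records which interpreter is
    running, where it runs, and whether each listed module can be imported.
    """
    rows = [
        ("python_executable", sys.executable),
        ("python_version", sys.version.split()[0]),
        ("working_directory", os.getcwd()),
        ("HOME", os.environ.get("HOME", "<unset>")),
        ("PATH", os.environ.get("PATH", "<unset>")),
    ]
    for name in modules:
        try:
            module = importlib.import_module(name)
            detail = getattr(module, "__file__", None) or "<no file>"
        except Exception as exc:
            # Catch broadly on purpose: an import can fail for reasons other
            # than a missing package (permissions, missing data files, ...).
            detail = "import failed: %r" % (exc,)
        rows.append(("import " + name, detail))
    return rows


def rm_main(data=None):
    # pandas is imported here so the report helper itself needs only the
    # standard library; the operator expects a DataFrame as the result.
    import pandas as pd

    return pd.DataFrame(environment_report(), columns=["check", "value"])
```

Running this once in local mode and once on the server, then comparing the two result tables, should show whether the server uses a different interpreter, user home, or search path than the command line does.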
I've added a simplified workflow with one sentence. The first part uses a Beautiful Soup Python script, which works fine. The second part uses langid.py to get the language. This fails, but only when executed on the server (Ubuntu).
I would strongly appreciate it if someone could take a look at this, as it is extremely important for us. We are going to make a big investment in RM, and translation to enable text mining is a huge part of the process flow. It all worked fine on a smaller Windows test server, but the final production server will be Linux.
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.5.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="179" y="187">
<list key="attribute_values">
<parameter key="data" value="&quot;Dit is een zin in het Nederlands&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="7.5.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="187"/>
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="simple py" width="90" x="447" y="187">
<parameter key="script" value="import pandas as pd&#10;from bs4 import BeautifulSoup&#10;&#10;def rm_main(data):&#10;&#9;langs=[]&#10;&#9;for index,row in data.iterrows():&#10;&#9;&#9;# we select the first interaction field to be translated, and strip eventual tags&#10;&#9;&#9;s=BeautifulSoup(row[&quot;data&quot;],&quot;lxml&quot;).get_text(&quot; \[-\] &quot;)&#10;&#9;&#9;langs.append(s)&#10;&#9;# and finally we add all the new data to the dataframe&#10;&#9;data['data']=langs&#10;&#9;return data&#10;"/>
<description align="center" color="transparent" colored="false" width="126">This works so python is installed correctly on server</description>
</operator>
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="get language" width="90" x="581" y="187">
<parameter key="script" value="import pandas as pd&#10;import langid&#10;&#10;def rm_main(data):&#10;&#9;langs=[]&#10;&#9;for index,row in data.iterrows():&#10;&#9;&#9;# we select the first interaction field to be translated, and strip eventual tags&#10;&#9;&#9;s=row[&quot;data&quot;]&#10;&#9;&#9;try:&#10;&#9;&#9;&#9;rl = langid.classify(s)[0]&#10;&#9;&#9;except:&#10;&#9;&#9;&#9;pass&#10;&#9;&#9;&#9;rl = &quot;undefined&quot;&#10;&#9;&#9;langs.append(rl)&#10;&#9;# and finally we add all the new data to the dataframe&#10;&#9;data['lang']=langs&#10;&#9;return data&#10;"/>
<description align="center" color="transparent" colored="false" width="126">This one fails. Using the same script in other programs, or from cmd line works fine, so the package is installed correctly. Also works fine on local machine</description>
</operator>
<operator activated="true" class="store" compatibility="7.5.001" expanded="true" height="68" name="Store" width="90" x="715" y="187">
<parameter key="repository_entry" value="result"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="simple py" to_port="input 1"/>
<connect from_op="simple py" from_port="output 1" to_op="get language" to_port="input 1"/>
<connect from_op="get language" from_port="output 1" to_op="Store" to_port="input"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>