Hi
I have a process running on an AI Hub where I have the operator 'Get Pages' (ext. Web Mining) embedded.
When I run the process in RM Studio everything is fine.
When I run the process on AI Hub but started it from the RM Studio (
'Run Process on AI Hub'), everything is fine.
But when I kick off the web-service I created, the operator 'Get-Pages' seems to make trouble. Other web-services are running. And when I disable 'Get Pages' the web-service is running as well. So I strongly believe it has something to do with how the process runs on AI Hub.
This is the error message which I get on running the web-service:
de.rapidanalytics.ejb.service.ServiceDataSourceException
Error executing process /home/bot/test_pages for service test_pages:
com.rapidminer.operator.web.io.MultiThreadedCookieManager cannot be cast to
com.rapidminer.operator.web.io.MultiThreadedCookieManager<br>
The funny thing is that I found out is that if I run the process out of the repository on AI Hub, it runs successfully. But if I test the web-service, it does not work.
This is the process I used for testing. When I disable the operator 'Get Pages' everything works fine.
<?xml version="1.0" encoding="UTF-8"?><process version="9.10.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.10.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.10.001" expanded="true" height="68" name="Retrieve step_3_urls_after_python_short" width="90" x="112" y="136">
<parameter key="repository_entry" value="/home/user/some_table_with_urls"/>
</operator>
<operator activated="true" class="web:retrieve_webpages" compatibility="9.7.000" expanded="true" height="68" name="Get Pages" width="90" x="447" y="136">
<parameter key="link_attribute" value="links"/>
<parameter key="random_user_agent" value="true"/>
<parameter key="connection_timeout" value="10000"/>
<parameter key="read_timeout" value="10000"/>
<parameter key="follow_redirects" value="true"/>
<parameter key="accept_cookies" value="original server"/>
<parameter key="cookie_scope" value="global"/>
<parameter key="request_method" value="GET"/>
<parameter key="delay" value="none"/>
<parameter key="delay_amount" value="1000"/>
<parameter key="min_delay_amount" value="0"/>
<parameter key="max_delay_amount" value="500"/>
</operator>
<connect from_op="Retrieve step_3_urls_after_python_short" from_port="output" to_op="Get Pages" to_port="Example Set"/>
<connect from_op="Get Pages" from_port="Example Set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I don't know how to proceed.
Thanks for all the help!
Best
Mathis