Operator 'Get Pages' not running on AI Hub
methusi
New Altair Community Member
Hi
I have a process running on an AI Hub where I have the operator 'Get Pages' (ext. Web Mining) embedded.
When I run the process in RM Studio everything is fine.
When I run the process on AI Hub but start it from RM Studio ('Run Process on AI Hub'), everything is fine as well.
But when I kick off the web service I created, the operator 'Get Pages' causes trouble. Other web services run fine, and when I disable 'Get Pages' this web service runs too. So I strongly believe it has something to do with how the process runs on AI Hub.
This is the error message which I get on running the web-service:
de.rapidanalytics.ejb.service.ServiceDataSourceException Error executing process /home/bot/test_pages for service test_pages: com.rapidminer.operator.web.io.MultiThreadedCookieManager cannot be cast to com.rapidminer.operator.web.io.MultiThreadedCookieManager
The funny thing is that if I run the process from the repository on AI Hub, it runs successfully. But if I test the web service, it does not work.
This is the process I used for testing. When I disable the operator 'Get Pages' everything works fine.
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.10.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.10.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.10.001" expanded="true" height="68" name="Retrieve step_3_urls_after_python_short" width="90" x="112" y="136">
        <parameter key="repository_entry" value="/home/user/some_table_with_urls"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="9.7.000" expanded="true" height="68" name="Get Pages" width="90" x="447" y="136">
        <parameter key="link_attribute" value="links"/>
        <parameter key="random_user_agent" value="true"/>
        <parameter key="connection_timeout" value="10000"/>
        <parameter key="read_timeout" value="10000"/>
        <parameter key="follow_redirects" value="true"/>
        <parameter key="accept_cookies" value="original server"/>
        <parameter key="cookie_scope" value="global"/>
        <parameter key="request_method" value="GET"/>
        <parameter key="delay" value="none"/>
        <parameter key="delay_amount" value="1000"/>
        <parameter key="min_delay_amount" value="0"/>
        <parameter key="max_delay_amount" value="500"/>
      </operator>
      <connect from_op="Retrieve step_3_urls_after_python_short" from_port="output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I don't know how to proceed.
Thanks for all the help!
Best
Mathis
Best Answer
-
For the ones wondering - I could fix my problem by taking another route. Instead of calling a web service I schedule the process with the schedule API:
POST to server/executions/schedule with the corresponding headers and body
In the body, I do not set an execution time and I set force=true, which starts the execution immediately.
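For anyone who wants to script this, here is a rough sketch of what such a request could look like in Python. Only the path `executions/schedule`, the POST method and `force=true` come from the description above. The base URL, the bearer-token header and the `job`/`location` field are assumptions made for illustration, so check the API documentation of your AI Hub version for the real body schema. The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_schedule_request(base_url, token, process_location):
    """Build (but do not send) a POST request to the schedule endpoint."""
    # Only "force" is taken from the post above. The "job"/"location"
    # structure is a hypothetical placeholder for the real body schema.
    body = {
        "force": True,  # no execution time is set, so the run starts at once
        "job": {"location": process_location},  # assumed field names
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/executions/schedule",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen(...)` once the body matches what your server expects.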
Answers
I suspect the issue might be that you have the extension that contains "Get Pages" installed on the AI Hub JobAgent, but not on the Server itself.
If I recall the architecture diagram correctly, when you schedule a job or run it on the Server from Studio, it will execute on a JobAgent. However, if it is run as a web service, it doesn't run on a JobAgent but on the Server itself.
Check [docker volumes path]/prod_rm-server-home-vol/_data/resources/extensions and see if you can spot it in there. You can compare it against [docker volumes path]/prod_rm-server-ja-extensions and see if they match.
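If you would rather not compare the two folders by eye, the check can be scripted. This is a minimal sketch, assuming both directories are readable from the machine where you run it and that the extensions sit as plain files directly inside them. The function name and the result keys are made up for illustration.

```python
from pathlib import Path


def compare_extension_dirs(server_dir, job_agent_dir):
    """Return the file names that exist in only one of the two folders."""
    # Collect plain file names only; subdirectories are ignored.
    server = {p.name for p in Path(server_dir).iterdir() if p.is_file()}
    agent = {p.name for p in Path(job_agent_dir).iterdir() if p.is_file()}
    return {
        "only_on_server": sorted(server - agent),
        "only_on_job_agent": sorted(agent - server),
    }
```

Call it with the two paths mentioned above, with your own docker volumes path filled in. An extension file listed under `only_on_job_agent` would fit the suspicion that it is missing on the Server.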