Operator 'Get Pages' not running on AI Hub

methusi
methusi New Altair Community Member
edited November 2024 in Community Q&A
Hi

I have a process running on an AI Hub where I have the operator 'Get Pages' (ext. Web Mining) embedded.
When I run the process in RM Studio everything is fine.
When I run the process on AI Hub but started it from the RM Studio ('Run Process on AI Hub'), everything is fine.

But when I kick off the web-service I created, the operator 'Get-Pages' seems to make trouble. Other web-services are running. And when I disable 'Get Pages' the web-service is running as well. So I strongly believe it has something to do with how the process runs on AI Hub.

This is the error message which I get on running the web-service:
de.rapidanalytics.ejb.service.ServiceDataSourceException 
Error executing process /home/bot/test_pages for service test_pages: 
com.rapidminer.operator.web.io.MultiThreadedCookieManager cannot be cast to 
com.rapidminer.operator.web.io.MultiThreadedCookieManager<br>

The funny thing is that I found out is that if I run the process out of the repository on AI Hub, it runs successfully. But if I test the web-service, it does not work.

This is the process I used for testing. When I disable the operator 'Get Pages' everything works fine.
<?xml version="1.0" encoding="UTF-8"?><process version="9.10.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.10.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.10.001" expanded="true" height="68" name="Retrieve step_3_urls_after_python_short" width="90" x="112" y="136">
        <parameter key="repository_entry" value="/home/user/some_table_with_urls"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="9.7.000" expanded="true" height="68" name="Get Pages" width="90" x="447" y="136">
        <parameter key="link_attribute" value="links"/>
        <parameter key="random_user_agent" value="true"/>
        <parameter key="connection_timeout" value="10000"/>
        <parameter key="read_timeout" value="10000"/>
        <parameter key="follow_redirects" value="true"/>
        <parameter key="accept_cookies" value="original server"/>
        <parameter key="cookie_scope" value="global"/>
        <parameter key="request_method" value="GET"/>
        <parameter key="delay" value="none"/>
        <parameter key="delay_amount" value="1000"/>
        <parameter key="min_delay_amount" value="0"/>
        <parameter key="max_delay_amount" value="500"/>
      </operator>
      <connect from_op="Retrieve step_3_urls_after_python_short" from_port="output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>



I don't know how to proceed.

Thanks for all the help!

Best
Mathis

Best Answer

  • methusi
    methusi New Altair Community Member
    Answer ✓
    For the ones wondering - I could fix my problem by taking another route. Instead of calling a web service I schedule the process with the schedule API:
    POST to server/executions/schedule with the corresponding headers and body

    In the body, I do not set an execution time and force=true - this immediately starts the execution.

Answers

  • methusi
    methusi New Altair Community Member
    Answer ✓
    For the ones wondering - I could fix my problem by taking another route. Instead of calling a web service I schedule the process with the schedule API:
    POST to server/executions/schedule with the corresponding headers and body

    In the body, I do not set an execution time and force=true - this immediately starts the execution.
  • JEdward
    JEdward New Altair Community Member
    I suspect the issue might be that you have the extension that contains "Get Pages" installed on the AI-Hub JobAgent, but not on the Server itself. 

    If I recall the architecture diagram correctly, when you schedule a job or run it on the Server from Studio then it will execute on a JobAgent. 
    However, if it is run as a webservice then it doesn't run on a JobAgent, but on the Server itself. 

    Check

    [docker volumes path]/prod_rm-server-home-vol/_data/resources/extensions and see if you can spot it in there.  You can compare it against

    [docker volumes path]/prod_rm-server-ja-extensions and see if they match. 


Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.