Processes not running on RM Server - stuck at

Drhelp
Drhelp New Altair Community Member
edited November 5 in Community Q&A
Processes are not starting on Rapidminer Server - not moving from a 'pending' state. Below are the first (most recent) few lines from the server log. I don't know what essentially is the problem though it seems to be time-related. How do I resolve this?  

00:07:28,583 WARN  [org.jboss.as.ejb3] (EJB default - 25) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,923 INFO  [stdout] (taskScheduler-1) [2m2019-11-25 00:07:28.869[0;39m [32m INFO[0;39m [35m9761[0;39m [2m---[0;39m [2m[taskScheduler-1][0;39m [36mc.r.e.jobservice.scheduled.AgentTasks   [0;39m [2m:[0;39m Flag Job Agent 'agent-bundled' (5fad92a9-1069-4070-b7f7-23708ce457d4) as 'TIMED_OUT' because it has not responded since '2019-11-24 23:35:21.71'
00:07:28,584 WARN  [org.jboss.as.ejb3] (EJB default - 3) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 28) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 22) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 15) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:29,169 WARN  [org.jboss.as.ejb3] (EJB default - 25) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:29 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 6) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 27) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 7) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 33) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:29,204 WARN  [org.jboss.as.ejb3] (EJB default - 6) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:29 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 43) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 48) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 9) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 38) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
00:07:28,585 WARN  [org.jboss.as.ejb3] (EJB default - 1) JBAS014143: A previous execution of timer [rapidminer-server-ear.rapidminer-server-ejb.PassiveTriggerExecutorEJBImpl df6da5cc-247a-465d-9508-63c838ad80a6] is still in progress, skipping this overlapping scheduled execution at: Mon Nov 25 00:07:28 GMT 2019 as timer state is IN_TIMEOUT
Tagged:

Best Answers

  • mmichel
    mmichel New Altair Community Member
    edited November 2019 Answer ✓
    The JA is still using the old Java path:
    Started process /bin/sh -c /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java

    Are you using the bundled job agent? If so make sure that you have removed the # in the standalone.conf file before setting the JAVA_HOME variable:

    #JAVA_HOME=""

    should be changed to (please adapt the path accordingly)

    JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_XXX.jdk/Contents/Home"


    If you are starting the JA via command line you can also set the JAVA_HOME before the start command:

    JAVA_HOME="yourPathHere" ./bin/rapidminer-jobagent

Answers

  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member
    Hi @Drhelp

    What's the Server version here? Also, could you give more details on how many job-agents you have configured on the server, and which of those is running into the pending state?
  • Drhelp
    Drhelp New Altair Community Member
    Hi Pavithra_Rao.

    Apologies I didn't include that info to start with. I'm using RM Server 9.5.1 and currently 1 job agent (see attached pix). Thank you.
  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member
    Hi @Drhelp ,

    Thanks for the additional details here. Could you please share the server.log file as well.
    Is this Server installed on Windows or Linux machines?

    Also, I would recommend raising a support ticket at support.rapidminer.com (if you have access to it) to get quicker response and resolution.

    Cheers,
    Pavithra
  • Drhelp
    Drhelp New Altair Community Member
    Hi Pavithra_Rao. Thanks for your response. The server.log file is attached. I don't think I have access to raising support tickets.
  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member
    Hi @Drhelp

    Looking at server.log, it doesn't indicate much about Job-agent TIME_OUT status.

    Could you also share the agent.log file as well?

    It would be located in <rapidminer-server-home>\job-agent-home\log\agent.log

    After sending the agent.log would recommend a stop and start the RapidMiner Server (if that's possible)

    I guess you have installed the Server on windows, Windows Service could be used to easily re-start the server.

    Cheers,
    Pavithra
  • Drhelp
    Drhelp New Altair Community Member
    Hi Pavithra_Rao. Attached is a copy of the agent.log file (in .txt format).

    I will stop and start the RapidMiner Server as advised. I installed it on mac, not windows. Thank you.
  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member
    Thank you, please keep us posted on how the re-start goes.
  • Drhelp
    Drhelp New Altair Community Member
    Still not working. Any ideas from the shared log files?
  • mmichel
    mmichel New Altair Community Member
    edited November 2019
    Hi Drhelp,
    the agent.log is indicating that the container timed out during start-up, because it did not reach the healthy state within 3 min:
    2019-11-25 18:39:43.115 ERROR 17226 — [taskScheduler-1] c.r.e.j.service.JobContainerService : Job container '1' startup timed out after 180000 ms.
    There could be plenty of reasons why this is happening:
    • Performance of the machine is not sufficient
    • Machine is busy doing other stuff in parallel
    • Extensions are slowing down the start - you may want to remove the ones you don't need
    In general, the container.log file should be checked here. It contains the startup procedure and might give some additional insights why the startup is not succeeding. If the timeout cannot be avoided, you have also the possibility to modify the timeout value and increase it maybe to 5 min via the agent.properties:
    jobagent.container.startupTimeout = 300000
    Hope this helps
    Marcel
  • Drhelp
    Drhelp New Altair Community Member
    Hi Marcel. Thank you for your response.

    Unfortunately, I can't find any container.log file. Where is it meant to be? I've searched everywhere on my mac and there's nothing like that. In addition, there is no "jobagent.container.startupTimeout = 300000" line in agent.properties.

    RM Server process still pending and not starting. I have removed most of the extensions from Studio, I don't get how the performance of my machine could be insufficient (using a MacBook Pro with 16GB RAM) or the bit about the machine doing other stuff in parallel (is that not one of purposes of the RM Server? But stopping other stuff and restarting hasn't helped). 

    I'd be grateful for further help and advice. Thank you.
  • mmichel
    mmichel New Altair Community Member
    edited November 2019
    Hi Drhelp,
    Unfortunately, I can't find any container.log file. Where is it meant to be? 
    the container.log file should be placed in the same directory as the agent.log file. So, $ja-home/logs/container-x.log. I think in your case the JC process never came up and as consequence, no log files have been created. This is also indicated by:
    2019-11-25 00:52:23.356  WARN 9760 --- [taskScheduler-1] c.r.e.j.service.JobContainerService      : Job container '1' cannot be killed, because the associated PID is undefined
    The startupTimeout property is not listed within the existing configuration template, but you can still add it to the agent.properties.
    I have removed most of the extensions from Studio, I don't get how the performance of my machine could be insufficient (using a MacBook Pro with 16GB RAM)
    It's not about the Studio extensions, it's about the extensions of Server or the JobAgent. But I guess in your case it seems to be a configuration issue. If you are running the the Job Agent on a 16GB machine you should also limit the memory amount accordingly. Currently it's configured with 65536MB, which is obviously larger than your capacity. So, please update the memory limit to something valid (agent.properties), for example:
    jobagent.container.memoryLimit = 4096
    Hope this helps,
    Marcel
  • Drhelp
    Drhelp New Altair Community Member
    Thanks, Marcel. I've changed the configuration of jobagent.container.memoryLimit to 4096MB as suggested, restarted RM Server and tried running another process with no joy. Is there anything else that could be the problem?
  • Drhelp
    Drhelp New Altair Community Member
    Hi Marcel. Thanks for all your help and efforts. I made the changes to JAVA_HOME as you suggested using JDK8 but it is still not working. Perhaps I should uninstall and start all over again? Any other possible cause of the problem from your technical point of view would be very helpful. Many thanks.   
  • mmichel
    mmichel New Altair Community Member
    Could you upload your agent.log file again?
  • Drhelp
    Drhelp New Altair Community Member
    Please see the agent.log file attached. Thank you.
  • mmichel
    mmichel New Altair Community Member
    edited November 2019 Answer ✓
    The JA is still using the old Java path:
    Started process /bin/sh -c /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java

    Are you using the bundled job agent? If so make sure that you have removed the # in the standalone.conf file before setting the JAVA_HOME variable:

    #JAVA_HOME=""

    should be changed to (please adapt the path accordingly)

    JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_XXX.jdk/Contents/Home"


    If you are starting the JA via command line you can also set the JAVA_HOME before the start command:

    JAVA_HOME="yourPathHere" ./bin/rapidminer-jobagent

  • Drhelp
    Drhelp New Altair Community Member
    It works now Marcel! Thank you very much indeed. You deserve a bonus!  ;)