High Availability Deployment - ActiveMQ

Nikouy
Nikouy New Altair Community Member
edited November 5 in Community Q&A
Hello,
I set up a cluster of RapidMiner Server instances for high availability and configured an Elastic File System (EFS), where persistent-home is shared across all instances with read-write access from each. All the servers are behind a Kubernetes service, which is connected to a load balancer. This setup works perfectly with only one server instance.
However, when spinning up multiple RM Server instances, they are not able to boot fully, and I get 502 errors when trying to access the server service through the load balancer. In server.log, I see the following:
INFO [stdout] (ServerService Thread Pool -- 56) 2020-04-04 08:42:19.453  INFO 200 --- [read Pool -- 56] o.a.activemq.store.SharedFileLocker      : Database /rapidminer-home/data/broker/activemq/localhost/KahaDB/lock is locked by another server. This broker is now in slave mode waiting a lock to be acquired
So I proceeded to set up an AmazonMQ broker (ActiveMQ PaaS), which I am able to connect to via telnet from RM Server (testing with only one instance for now). I modified "/persistent-rapidminer-home/configuration/execution.properties" with my ActiveMQ endpoint, but RM Server is unable to connect (the embeddedBroker is disabled). I tried both the AMQP and OpenWire protocols.
2020-04-05 15:08:12,566 WARN  [org.apache.activemq.transport.failover.FailoverTransport] (ActiveMQ Transport: tcp://b-zzzzzzz-1615-4d66-zzzzz-12be334477a4-1.mq.eu-west-1.amazonaws.com/192.168.86.70:5671@50426) Transport (tcp://b-zzzzzz-1615-4d66-zzzzzz-12be334477a4-1.mq.eu-west-1.amazonaws.com:5671) failed , attempting to automatically reconnect: java.io.EOFException
Now, the questions:
1) Is the first log related to ActiveMQ conflicting when there are multiple servers running together? Would the solution be setting up an external broker that interconnects all servers and job agents?
2) I tried both, AMQP and OpenWire protocols. But as you can see, RM Server is unable to connect, although I can telnet into the endpoint. Any hints?
3) Is there anything else I need to change in RapidMiner Server?
Thanks in advance,
Nicolas





Answers

  • aschaferdiek
    aschaferdiek New Altair Community Member
    Hi Nicolas! You had the right idea here. I'll try to explain in more detail:
    1. Yes, this log line is related to the internal ActiveMQ broker which is spawned for each Server instance. All of those embedded brokers try to use the same data location of your shared EFS. Externalizing the ActiveMQ broker is a requirement for high availability mode. Please see the linked documentation below on how to set up the externalized broker and how to disable the embedded one.
    2. My guess is that the default ActiveMQ port is not exposed in your Docker setup. For HA, though, you should set up an externalized broker as described in the documentation.
    3. Please see required steps in the HA documentation here: https://docs.rapidminer.com/latest/server/high-availability/overview.html
    Please note that some features are disabled in HA mode.

    Best
    Alex
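
To illustrate point 1: the log line says the second broker found the KahaDB lock file already held and went into a waiting (slave) state. Here is a minimal sketch of that behaviour, assuming a Unix-like system. It uses Python's `fcntl.flock` as a stand-in for the broker's own locker (not its actual implementation), and the temporary directory and the `try_acquire` helper are hypothetical.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path):
    """Try to take an exclusive, non-blocking lock; return the open file or None."""
    handle = open(lock_path, "a+")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else already holds the lock.
        handle.close()
        return None
    return handle


# Hypothetical stand-in for the shared KahaDB directory on the EFS mount.
shared_dir = tempfile.mkdtemp()
lock_path = os.path.join(shared_dir, "lock")

first_broker = try_acquire(lock_path)   # first instance gets the lock
second_broker = try_acquire(lock_path)  # second instance is refused

print("first holds lock:", first_broker is not None)
print("second holds lock:", second_broker is not None)

first_broker.close()                    # first instance shuts down, lock is released
third_broker = try_acquire(lock_path)   # a waiting instance can now take over
print("after release:", third_broker is not None)
third_broker.close()
```

With every embedded broker pointed at the same shared directory, only one of them can ever hold the lock, which is consistent with the other instances never finishing their boot.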

  • Nikouy
    Nikouy New Altair Community Member
    edited April 2020
    Thanks Alex. I wonder if there is any way to pass the following properties as environment variables from the YAML file upon deployment, or whether I need to modify/add them manually. If that is not possible, I'd appreciate it if you could share the Docker Compose file so I can create a custom image for RapidMiner Server.
    jobservice.queue.activemq.embeddedBroker.enabled = false
    rapidminer.server.isClustered = false
    org.quartz.jobStore.isClustered = true
    org.quartz.jobStore.clusterCheckinInterval = 10000
    jobservice.queue.activemq.uri= my broker
    What exactly does "rapidminer.server.isClustered" do?
    The port is actually exposed, so I am not sure why RMS is unable to connect. Do you know if I need to create a specific queue in ActiveMQ first, or will it work out of the box as long as I give it the right endpoint and credentials? Which protocol do I need to use: AMQP, OpenWire, or something else? Also, do I need to specify the ActiveMQ name in RapidMiner?

    Thanks,
    Nicolas
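
On the "port is exposed" point: a successful telnet only shows that the TCP port is open. It does not show whether the broker expects a TLS handshake or which wire protocol it speaks on that port. Below is a minimal sketch of a check that separates the two cases, assuming the endpoint may be TLS-only. Port 5671 in the log above is the IANA-registered port for AMQP over TLS, while the transport in the log is plain `tcp://`. The `probe` function is a hypothetical helper, not part of any RapidMiner or ActiveMQ tooling.

```python
import socket
import ssl


def probe(host, port, timeout=5.0):
    """Return (tcp_ok, tls_ok) for host:port.

    tcp_ok: a plain TCP connection could be opened (what telnet shows).
    tls_ok: a TLS handshake with certificate verification also succeeded.
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return (False, False)
    try:
        context = ssl.create_default_context()
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.version()  # handshake already completed at this point
        return (True, True)
    except OSError:
        # ssl.SSLError is a subclass of OSError, so handshake failures land here.
        raw.close()
        return (True, False)
```

If `probe(host, port)` returns `(True, True)` for the broker endpoint, a client that connects without TLS is likely to see the connection dropped, which would be one possible explanation for the `EOFException`.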




  • aschaferdiek
    aschaferdiek New Altair Community Member
    edited April 2020
    In general, those properties are read from their respective configuration files, and personally I would recommend modifying the related files on the EFS.
    The isClustered property is a feature switch that disables certain features which do not work in HA mode. Please refer to the documentation for the list of features that are unsupported because they are not suited for HA.
    Our broker is configured to use TCP with failover. You should use the same tested protocol. The URI should follow the failover:(tcp://host:port) scheme. How to adapt the configuration of the Server and the Job Agents (each one needs this new broker configuration) is described in the installation guide (https://docs.rapidminer.com/latest/server/high-availability/installation.html).
    When properly configured, all queues are automatically created for you.

    Best,
    Alex
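
The failover:(tcp://host:port) scheme mentioned above can be assembled mechanically. Here is a minimal sketch that builds such a URI and renders the two broker-related property lines quoted earlier in this thread. The host name is hypothetical, 61616 is ActiveMQ's default OpenWire port, and the helper names `failover_uri` and `broker_properties` are made up for illustration.

```python
def failover_uri(endpoints, scheme="tcp"):
    """Build failover:(scheme://host:port,...) from (host, port) pairs."""
    if not endpoints:
        raise ValueError("at least one broker endpoint is required")
    inner = ",".join(f"{scheme}://{host}:{port}" for host, port in endpoints)
    return f"failover:({inner})"


def broker_properties(uri):
    """Render the broker-related property lines for an external broker."""
    return "\n".join([
        "jobservice.queue.activemq.embeddedBroker.enabled = false",
        f"jobservice.queue.activemq.uri = {uri}",
    ])


# Hypothetical host name; replace with the real broker endpoint.
uri = failover_uri([("my-broker.example.internal", 61616)])
print(broker_properties(uri))
```

The same URI has to go into the Server configuration and into each Job Agent's configuration, so generating it in one place helps avoid typos between the copies.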

  • Nikouy
    Nikouy New Altair Community Member
    edited April 2020
    I was able to get the broker running using SSL and the OpenWire protocol. All my job agents and servers are able to connect.
    I have added the parameters below to execution.properties, the server is reachable on the port I am exposing, and when I hit the public URL it takes me to /faces/login.xhtml. All good so far. However, after submitting the login credentials, it bounces me back to the same page. After the 3rd or 4th attempt it lets me log in and takes me to the main page, but after clicking anywhere it bounces me back to the login page again. Any idea what it could be?
    org.quartz.jobStore.isClustered = true
    org.quartz.jobStore.clusterCheckinInterval = 10000
    rapidminer.server.isClustered = true
    Thanks,
    Nicolas


    Edit:
    Sorted. I had to re-engineer my load balancer and ingress controller, as they were causing issues with sticky sessions.
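
The login bouncing described above is what one would expect if each instance keeps its own login sessions and the load balancer spreads requests across instances. Below is a minimal simulation, assuming per-instance in-memory sessions (an assumption, not something confirmed for RapidMiner Server). The instance names and all helpers are hypothetical.

```python
import itertools
import zlib


class Instance:
    """One server instance with its own in-memory session store (assumed)."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def login(self, session_id):
        self.sessions.add(session_id)

    def is_logged_in(self, session_id):
        return session_id in self.sessions


def round_robin(instances):
    """Route each request to the next instance, ignoring the session."""
    cycle = itertools.cycle(instances)
    return lambda session_id: next(cycle)


def sticky(instances):
    """Route by a stable hash, so one session always reaches the same instance."""
    return lambda session_id: instances[zlib.crc32(session_id.encode()) % len(instances)]


def simulate(route, session_id="abc123", clicks=4):
    """Log in once, then report whether each later request is still authenticated."""
    route(session_id).login(session_id)
    return [route(session_id).is_logged_in(session_id) for _ in range(clicks)]


print("round robin:", simulate(round_robin([Instance("rm-0"), Instance("rm-1")])))
print("sticky:     ", simulate(sticky([Instance("rm-0"), Instance("rm-1")])))
```

With round-robin routing, every other request lands on an instance that has never seen the session, so the user is sent back to the login page. With sticky routing, all requests for a session stay on one instance.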
  • aschaferdiek
    aschaferdiek New Altair Community Member
    Nice that it worked out. :smile: