RabbitMQ has been around for a long time and is reportedly used in more than 40% of applications worldwide for message queuing. It implements the Advanced Message Queuing Protocol (AMQP) and, being built on the Erlang programming language, handles clustering and failover smoothly. RabbitMQ queues messages sent between applications. For example, one part of an application can publish messages to RabbitMQ, and workers belonging to another application can consume those messages as and when required and get the task done. You can read more about RabbitMQ here.
This post covers setting up a RabbitMQ cluster and configuring failover. Clustering is built into RabbitMQ; for failover we will use AWS Route 53 and AWS CloudWatch. I am assuming you already have a RabbitMQ server set up on an AWS EC2 instance or any other platform. We will create a new RabbitMQ node and then join the two into a cluster. At Haptik, we believe in making our architectural components more reliable, which is why we decided to implement this. Note that this is not load balancing.
- A RabbitMQ server that is up and running. You can follow this if you don’t already have one:
- A new node/server, which could be an AWS EC2 instance, a DigitalOcean droplet or a server on any other platform.
- Follow the steps in the above link to install RabbitMQ server on the new node as well. Remember that the RabbitMQ versions should be the same on both servers, and the RabbitMQ Management plugin should be enabled for the UI. It can be enabled by following the steps here. You can check that it is running on port 15672. You can also bring up the new server from an AMI of the master server.
Steps To Follow To Set Up The Cluster
Steps on the RabbitMQ server 1:
- SSH into your RabbitMQ server 1. Let’s call it rabbitmqnode1.
- Map the node hostnames by adding the following entries to /etc/hosts:
127.0.0.1 rabbitmqnode1
"Public/Private IP of your new RabbitMQ server" rabbitmqnode2
- View the contents of the Erlang cookie file, .erlang.cookie, which can be found in /var/lib/rabbitmq/.
It contains a single string of characters.
Copy the text and keep it handy.
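The server 1 steps above can be sketched as a couple of small shell helpers. This is only an illustration: the function names `add_host_entry` and `show_cookie` are made up for this post, the IP address in the example is a placeholder, and the cookie path assumes a default Linux package install, where the file is /var/lib/rabbitmq/.erlang.cookie.

```shell
# Hypothetical helpers for the server 1 steps; names are illustrative only.

# Append "IP NAME" to a hosts file unless NAME is already mapped in it.
add_host_entry() {
  hosts_file="$1"; ip="$2"; name="$3"
  # Skip comment lines, then look for NAME in any hostname column.
  if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  fi
}

# Print the Erlang cookie so it can be copied to the other node.
# Uses the default Linux package path unless another path is passed in.
show_cookie() {
  cat "${1:-/var/lib/rabbitmq/.erlang.cookie}"
}

# Example (run as root on server 1; 10.0.0.12 is a placeholder IP):
#   add_host_entry /etc/hosts 127.0.0.1 rabbitmqnode1
#   add_host_entry /etc/hosts 10.0.0.12 rabbitmqnode2
#   show_cookie
```

Checking for an existing mapping first keeps the hosts file free of duplicate lines if the steps are run more than once.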
Steps on the RabbitMQ server 2 (New Server):
- Stop RabbitMQ server using the command:
service rabbitmq-server stop
- Now replace the Erlang cookie on this server with the one we copied from server 1. The cookie must be identical on every node for them to join the same cluster.
- Now, run the following commands one by one:
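service rabbitmq-server start
rabbitmq-plugins enable rabbitmq_management
sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster rabbit@rabbitmqnode1
sudo rabbitmqctl start_app
To join the node as a RAM node instead, pass the --ram flag to join_cluster:
sudo rabbitmqctl join_cluster --ram rabbit@rabbitmqnode1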
- Run rabbitmqctl cluster_status to see the cluster status. You should see all the nodes in the cluster and their mode, whether RAM or DISK.
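The server 2 steps above can be sketched in shell as follows. Treat it as a rough outline rather than the exact procedure used here: `install_cookie` and `join_node` are hypothetical helper names, the cookie path assumes a default Linux package install, and the `stop_app`/`start_app` calls around `join_cluster` are included because RabbitMQ expects the application on the joining node to be stopped while it joins.

```shell
# Hypothetical helpers for the server 2 steps; names are illustrative only.

# Write the cookie copied from server 1, readable by its owner only.
install_cookie() {
  cookie="$1"
  cookie_file="${2:-/var/lib/rabbitmq/.erlang.cookie}"
  rm -f "$cookie_file"
  # No trailing newline: the file should hold exactly the cookie string.
  ( umask 077 && printf '%s' "$cookie" > "$cookie_file" )
  chmod 400 "$cookie_file"
}

# Join this node to the cluster that the given node belongs to.
# Pass --ram as the second argument to join as a RAM node.
join_node() {
  target="$1"; mode="${2:-}"
  rabbitmqctl stop_app || return 1
  if [ -n "$mode" ]; then
    rabbitmqctl join_cluster "$mode" "$target" || return 1
  else
    rabbitmqctl join_cluster "$target" || return 1
  fi
  rabbitmqctl start_app || return 1
  rabbitmqctl cluster_status
}

# Example (run as root on server 2, with RabbitMQ stopped for the cookie step):
#   install_cookie "PASTE_COOKIE_FROM_SERVER_1"
#   chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
#   service rabbitmq-server start
#   join_node rabbit@rabbitmqnode1
```

The cookie file is written with owner-only permissions because RabbitMQ expects it to be private to the user that runs the server.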
You’re done. The management console of every node will now show the same graphs, messages and queues in the browser. This is an active-active cluster that we have set up. Below is an example of how the nodes would look (I have added a third one as well, which is in RAM mode):
Steps To Set Up Failover
For failover, I am using a simple AWS Route 53 health check, which makes a DNS switch to one of the secondary servers if the primary server fails to respond. Once the above steps are performed, you can write a simple script to push the RabbitMQ master’s status to AWS CloudWatch. There are many ways to do so; I am using a simple telnet:
# Push 1 to CloudWatch if the AMQP port answers, else 0. INST_ID must already hold this instance's ID.
rm -f /usr/local/nagios/etc/objects/scripts/myoutputfile
telnet rabbit.xxxxxxxstagingtestwebsite.com 5672 | tee -a /usr/local/nagios/etc/objects/scripts/myoutputfile
if grep -q "Connected" /usr/local/nagios/etc/objects/scripts/myoutputfile; then
count=1
else
count=0
fi
/usr/local/bin/aws cloudwatch put-metric-data --metric-name "RabbitMQ staging UP/DOWN" --unit Count --value "$count" --dimensions InstanceId="$INST_ID" --namespace Rabbitmq
So if the connection succeeds, I push a count of 1 to CloudWatch; otherwise I push 0, which means the host is down.
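As a variation on the script above, the up/down decision can be separated from the probe itself so that telnet can be swapped for another tool. The sketch below is only an illustration: `port_status` is a hypothetical helper, and the usage comment assumes `nc` (netcat) is installed and that `INST_ID` already holds the instance ID.

```shell
# Hypothetical helper: run any probe command and print 1 if it succeeds, 0 if not.
port_status() {
  if "$@" >/dev/null 2>&1; then
    echo 1
  else
    echo 0
  fi
}

# Example (assumes nc is available; -z only checks the port, -w 5 gives up after 5 seconds):
#   count=$(port_status nc -z -w 5 rabbit.xxxxxxxstagingtestwebsite.com 5672)
#   /usr/local/bin/aws cloudwatch put-metric-data --metric-name "RabbitMQ staging UP/DOWN" \
#     --unit Count --value "$count" --dimensions InstanceId="$INST_ID" --namespace Rabbitmq
```

A probe with a timeout keeps the check from hanging when the host does not answer at all.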
Next, create an AWS CloudWatch alarm that goes into the ALARM state when the count is 0. We will use the same alarm in the AWS Route 53 health check.
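One way to create such an alarm from the command line is sketched below. The helper name, the alarm name `rabbitmq-staging-down`, the 60-second period and the choice to treat missing data as a failure are assumptions for illustration; the metric name, namespace and dimension match the script above.

```shell
# Hypothetical helper: create an alarm that fires when the pushed count drops to 0.
create_down_alarm() {
  inst_id="$1"
  aws cloudwatch put-metric-alarm \
    --alarm-name "rabbitmq-staging-down" \
    --namespace Rabbitmq \
    --metric-name "RabbitMQ staging UP/DOWN" \
    --dimensions "Name=InstanceId,Value=$inst_id" \
    --statistic Minimum \
    --period 60 \
    --evaluation-periods 1 \
    --comparison-operator LessThanThreshold \
    --threshold 1 \
    --treat-missing-data breaching
}

# Example (needs AWS credentials with CloudWatch permissions):
#   create_down_alarm "$INST_ID"
```

Treating missing data as breaching means the alarm also fires if the script itself stops reporting, which is usually what you want for an up/down signal.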
Then go to AWS Route 53 and set up a health check from the menu on the left-hand side:
This opens a page where you need to enter some details. I will keep it brief here. I have chosen to monitor the state of a CloudWatch alarm, because the above script records the status of my RabbitMQ server there:
If I chose an endpoint instead, it would have to be publicly accessible, which is not true in my case: my RabbitMQ server is not publicly accessible. Choosing the CloudWatch alarm option gives me the options below:
Once the health check is configured, create a DNS failover in AWS Route 53 based on the health check we just made.
(Assuming you know how to do that; otherwise you can follow the steps here: Route 53 DNS Failover)
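For readers who prefer the command line to the console, the health check and failover records can be sketched roughly as follows. Everything here is illustrative: the helper names, the alarm name, the region, the IP address and the `ZONE_ID` and `HC_ID` variables are placeholders, and the JSON only covers the fields this setup needs.

```shell
# Hypothetical helpers that print the JSON the Route 53 CLI calls expect.

# Print a health check configuration that follows a CloudWatch alarm.
# $1 = alarm name, $2 = region the alarm lives in
health_check_config() {
  printf '{"Type":"CLOUDWATCH_METRIC","AlarmIdentifier":{"Region":"%s","Name":"%s"},"InsufficientDataHealthStatus":"Unhealthy"}\n' "$2" "$1"
}

# Print a change batch that upserts one failover A record.
# $1 = record name, $2 = IP, $3 = PRIMARY or SECONDARY, $4 = health check ID (optional)
failover_change_batch() {
  name="$1"; ip="$2"; role="$3"; hc_id="${4:-}"
  hc_field=""
  if [ -n "$hc_id" ]; then
    hc_field=",\"HealthCheckId\":\"$hc_id\""
  fi
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","SetIdentifier":"%s","Failover":"%s","TTL":60,"ResourceRecords":[{"Value":"%s"}]%s}}]}\n' \
    "$name" "$role" "$role" "$ip" "$hc_field"
}

# Example (placeholders throughout; needs AWS credentials with Route 53 permissions):
#   aws route53 create-health-check --caller-reference "rabbitmq-$(date +%s)" \
#     --health-check-config "$(health_check_config rabbitmq-staging-down us-east-1)"
#   aws route53 change-resource-record-sets --hosted-zone-id "$ZONE_ID" \
#     --change-batch "$(failover_change_batch rabbit.xxxxxxxstagingtestwebsite.com 10.0.0.11 PRIMARY "$HC_ID")"
```

The primary record carries the health check ID, so Route 53 starts answering with the secondary record once the alarm marks the primary as unhealthy.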
Hope this helps you set up a working RabbitMQ cluster. I will soon be coming up with more exciting blogs. Until then, keep following the Haptik Tech Blog.