Master background tasks with Celery, RabbitMQ, and Supervisor

Robert Taylor October 31, 2015

Preface

In a previous post, I explained how to configure your Django project to run on Apache within a virtual Python environment. This time, I will talk about extending your project by integrating Celery, RabbitMQ, and Supervisor.

Note that although my own use case involved Django, the process I outline in this article applies equally well to just about any Python project.

Also bear in mind that, as of this writing, my operating system is Ubuntu 14.04. If you're running a different flavor of Linux or a different Ubuntu version, the commands and file paths might need tweaking.

Celery

Celery is awesome! It is a Python package that makes handling asynchronous tasks a walk in the park. Whether you want to run a large-scale distributed computation or set up background jobs for your Django app, Celery is the way to go. It doesn't take long to set up (see the Celery documentation), and once it's running it reliably handles just about any workload you throw at it. I can't say enough about how cool it is!

Anyway. I will assume that you’ve already installed Celery alongside your Django project within some sort of virtual Python environment. See my previous post for more information about Django and virtualenv and see Celery’s Getting Started guide for information about using Celery with or without Django.

There's not much more to say about it, other than that you will probably find it useful to have a dedicated celery user on your Linux box. You can then grant that user access to just the directories it needs (such as a directory for PID files). This is convenient and, from a security point of view, better than running Celery as root. Do something like the following:

sudo addgroup celery
sudo adduser --system --no-create-home --shell /bin/sh --ingroup celery celery
sudo mkdir /var/run/celery
sudo chown celery:celery /var/run/celery
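If you want to confirm that the account and group came out the way the Supervisor configuration later expects (user=celery), a small standard-library check can help. This is a sketch of my own: the function name account_in_group is hypothetical, and it only runs on Unix-like systems where the pwd and grp modules exist.

```python
import grp
import pwd


def account_in_group(user: str, group: str) -> bool:
    """Return True if `user` exists and belongs to `group` (primary or supplementary)."""
    try:
        pw = pwd.getpwnam(user)
        gr = grp.getgrnam(group)
    except KeyError:
        # Either the user or the group does not exist.
        return False
    # adduser --ingroup sets the user's *primary* group, which shows up as the
    # user's GID; gr_mem only lists supplementary members, so check both.
    return pw.pw_gid == gr.gr_gid or user in gr.gr_mem
```

After running the commands above, account_in_group("celery", "celery") should return True.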

RabbitMQ

Celery requires a broker to handle message passing. There are many to choose from, and each has its benefits and drawbacks. I use RabbitMQ because it's easy to set up and very well supported.

Very important: make sure that your RabbitMQ server is not severely outdated! In particular, if you’re running Ubuntu, compare the version installed by apt-get with the latest version available. If the discrepancy is a big one, consider updating. I’ve experienced severe bugs with the lagging versions distributed from Ubuntu’s repositories.

Once you've got RabbitMQ installed, create a celery user and a virtual host (vhost) for your project. Pick a stronger password than the one shown here for anything beyond local testing:

sudo rabbitmqctl add_user celery celery
sudo rabbitmqctl add_vhost my_project_vhost
sudo rabbitmqctl set_permissions -p my_project_vhost celery ".*" ".*" ".*"

Essentially, this configures the “chatroom” where you can leave messages for Celery workers to pick up and execute. In your Django settings file, your broker URL would then look something like:

BROKER_URL = 'amqp://celery:celery@localhost:5672/my_project_vhost'
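In that URL, celery:celery is the RabbitMQ user name and password created above, 5672 is RabbitMQ's default AMQP port, and the path after the port is the vhost. The sketch below takes the URL apart and puts it back together with the standard library, to make that structure explicit. The helper names build_broker_url and describe_broker_url are hypothetical.

One assumption to flag: the vhost is percent-encoded as an ordinary URL path segment. That is why RabbitMQ's default vhost "/" shows up as %2F.

```python
from urllib.parse import quote, unquote, urlsplit


def build_broker_url(user, password, host="localhost", port=5672, vhost="/"):
    """Assemble an AMQP broker URL whose path component is the vhost."""
    # safe="" makes quote() encode "/" too, so the default vhost "/" becomes "%2F".
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )


def describe_broker_url(url):
    """Split a broker URL back into its parts (credentials and vhost unquoted)."""
    parts = urlsplit(url)
    return {
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,
        # Drop the leading "/" that separates the port from the vhost.
        "vhost": unquote(parts.path[1:]),
    }
```

For example, build_broker_url("celery", "celery", vhost="my_project_vhost") produces exactly the BROKER_URL shown above.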

Your new Celery/RabbitMQ setup is easy to test with Django. Simply navigate to your Django project directory, activate your virtual environment, and run:

celery -A myapp worker -l info

This command starts a Celery worker in the foreground that runs any tasks defined in your Django project; stop it with Ctrl+C when you're done. Replace myapp with the name of your main app, the one that defines your Celery application.

Supervisor

Ultimately, we want Celery to run in the background. The easiest way I know of to do this is Supervisor, a process control daemon that starts long-running programs such as Celery workers, keeps an eye on them, and restarts them if they die.

First, install Supervisor. As of this writing, Supervisor must be installed under Python 2; it does not work with Python 3 (Python 3 support arrived later, in Supervisor 4.0). But fear not: your Celery subprocesses can run under whichever Python version you like!

Once installed, manually create a system directory structure for Supervisor:

sudo mkdir /etc/supervisord /etc/supervisord/conf.d /var/log/supervisord

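Then, create the main configuration file. The best way to do this is to use Supervisor's echo_supervisord_conf script to auto-generate the configuration, which you can then edit manually.

A plain "sudo echo_supervisord_conf > file" would fail with a permission error. The redirection is performed by your own unprivileged shell, not by sudo, so pipe the output through sudo tee instead:

echo_supervisord_conf | sudo tee /etc/supervisord/supervisord.conf > /dev/null
sudo chmod 644 /etc/supervisord/supervisord.conf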

The auto-generated configuration is quite a large file with many commented lines. When all of the comments are taken out, it should look something like this (edited):

[unix_http_server]
file=/tmp/supervisor.sock
[supervisord]
logfile=/var/log/supervisord/main.log
childlogdir=/var/log/supervisord
logfile_maxbytes=50MB
logfile_backups=10
loglevel=info
pidfile=/tmp/supervisord.pid
nodaemon=false
minfds=1024
minprocs=200
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
[include]
files = /etc/supervisord/conf.d/*.conf

The important things to note are:

- the log paths point to the newly created /var/log/supervisord/ directory;
- secondary configuration files are included from the /etc/supervisord/conf.d/ directory.
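If you'd like to double-check those two settings from Python, the configuration is an INI-style file that the standard library's configparser can read. This is only an approximation: Supervisor uses its own parser. The helper names are hypothetical, and the sketch assumes absolute include patterns like the one above.

```python
import configparser
import glob


def load_supervisord_conf(path):
    """Parse a supervisord.conf-style INI file with the standard library."""
    # interpolation=None leaves "%(...)s" expansions alone (Supervisor handles
    # those itself); inline_comment_prefixes drops trailing " ; comment" text.
    parser = configparser.ConfigParser(
        interpolation=None, inline_comment_prefixes=(";",)
    )
    if not parser.read(path):
        raise FileNotFoundError(path)
    return parser


def log_settings(parser):
    """Return the main log file and the directory for child-process logs."""
    section = parser["supervisord"]
    return section.get("logfile"), section.get("childlogdir")


def included_files(parser):
    """Expand the space-separated glob patterns listed under [include]."""
    patterns = parser.get("include", "files", fallback="").split()
    return sorted(path for pattern in patterns for path in glob.glob(pattern))
```

Calling included_files(load_supervisord_conf("/etc/supervisord/supervisord.conf")) should list every *.conf file Supervisor will pick up from /etc/supervisord/conf.d/.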

Now, let's configure our Celery/Django project. Create a new configuration file, /etc/supervisord/conf.d/myproject.conf, which will be included automatically every time Supervisor starts. It tells Supervisor to create three subprocesses: two Celery workers and one Celery Beat scheduler (for periodic tasks).

I assume that:

- your project is called myproject;
- it is located at /home/myuser/myproject/;
- it uses a virtual environment at /home/myuser/myproject/.venv/;
- your main Django app is called myapp.

Copy and edit the following configuration:

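; Celery Beat sends periodic tasks to the broker; run only one instance of it
[program:myprojectbeat]
command=/home/myuser/myproject/.venv/bin/celery -A myapp beat -l info -s /tmp/myprojectbeat-schedule --pidfile=/tmp/myprojectbeat.pid
directory=/home/myuser/myproject/
autostart=true
autorestart=true
user=celery
; the process must stay up this many seconds to count as successfully started
startsecs=10
; on stop, wait up to 300 seconds after SIGTERM before sending SIGKILL
stopwaitsecs=300

; two worker processes, named worker0 and worker1 (numbering starts at 0)
[program:myproject]
numprocs=2
process_name=worker%(process_num)s
command=/home/myuser/myproject/.venv/bin/celery -A myapp worker -l info -n worker%(process_num)s.%%h --pidfile=/tmp/myproject-worker%(process_num)s.pid
directory=/home/myuser/myproject/
autostart=true
autorestart=true
user=celery
startsecs=10
stopwaitsecs=300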

And ensure it has the proper permissions with:

sudo chmod 644 /etc/supervisord/conf.d/*
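A detail of the worker program that is easy to gloss over is how the names get expanded. Supervisor fills in %(process_num)s using Python %-formatting, so %% comes out as a single literal %. That leaves %h for Celery, which replaces it with the machine's hostname. The sketch below imitates only the Supervisor step. The expand helper is hypothetical; the templates are copied from the configuration above.

```python
# Templates copied from the [program:myproject] section above.
PROCESS_NAME = "worker%(process_num)s"
NODE_NAME = "worker%(process_num)s.%%h"
PIDFILE = "/tmp/myproject-worker%(process_num)s.pid"


def expand(template, process_num, program_name="myproject"):
    """Approximate Supervisor's expansion of one template for one process.

    Supervisor applies Python %-formatting with a dictionary of values such as
    process_num and program_name, so "%%" becomes a literal "%".
    """
    return template % {"process_num": process_num, "program_name": program_name}


# numprocs=2; process numbers start at 0 unless numprocs_start says otherwise.
for num in range(2):
    print(expand(PROCESS_NAME, num), expand(NODE_NAME, num), expand(PIDFILE, num))
```

This prints worker0 worker0.%h /tmp/myproject-worker0.pid, followed by the same line for worker1. Each worker therefore gets its own node name and PID file, which is what lets two workers run side by side on one machine.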

Supervisor should now be ready to run! Test the new configuration by starting supervisord manually and then shutting it down again, which also stops its subprocesses:

sudo supervisord -c /etc/supervisord/supervisord.conf
sudo supervisorctl -c /etc/supervisord/supervisord.conf shutdown

Once supervisord starts successfully, you can manage its subprocesses (check their status, start, stop, or restart them) via supervisorctl:

sudo supervisorctl -c /etc/supervisord/supervisord.conf
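supervisorctl is a client for supervisord's XML-RPC interface, which is enabled by the [rpcinterface:supervisor] section and reached here through the Unix socket /tmp/supervisor.sock. If you'd rather check on the Celery processes from Python (say, in a monitoring script), here is a minimal standard-library sketch. It assumes supervisord is running with the configuration above. The class and function names are my own; supervisor.getAllProcessInfo is a method of Supervisor's XML-RPC API.

```python
import http.client
import socket
import xmlrpc.client


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """An HTTP connection that uses a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


class UnixSocketTransport(xmlrpc.client.Transport):
    """XML-RPC transport that sends every request to one Unix socket."""

    def __init__(self, socket_path):
        super().__init__()
        self.socket_path = socket_path

    def make_connection(self, host):
        # The host from the URL is ignored; the socket path decides where we connect.
        return UnixSocketHTTPConnection(self.socket_path)


def supervisor_proxy(socket_path="/tmp/supervisor.sock"):
    # "localhost" is a placeholder host; with no path in the URL,
    # xmlrpc.client posts requests to /RPC2.
    return xmlrpc.client.ServerProxy(
        "http://localhost", transport=UnixSocketTransport(socket_path)
    )


def process_states(proxy):
    """Map "group:name" to a state name such as RUNNING or FATAL."""
    return {
        "%s:%s" % (info["group"], info["name"]): info["statename"]
        for info in proxy.supervisor.getAllProcessInfo()
    }
```

With the configuration above, process_states(supervisor_proxy()) should return entries keyed like myproject:worker0 and myproject:worker1, each mapped to a state name such as RUNNING. Reading the socket may require root, just like the sudo supervisorctl command.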

Ubuntu init.d Script

We want supervisord to start automatically at system boot, so it will be necessary to integrate it with Ubuntu as a system service. Start by taking a look at the init script templates developed as part of the Supervisor project. Copy the one named ubuntu and save it to a new system file called /etc/init.d/supervisord.

The catch is that the copied file is just a template, so you'll need to modify it to match your specific setup. Open it up and review the code. Most of the things that need changing are at the top. For my particular setup, here is how I decided to modify the top portion of the file:

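. /lib/lsb/init-functions
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# typically where a system-wide pip install puts supervisord and supervisorctl
DAEMON=/usr/local/bin/supervisord
SUPERVISORCTL=/usr/local/bin/supervisorctl
# the main configuration file created earlier
CONF_FILE=/etc/supervisord/supervisord.conf
NAME=supervisord
DESC=supervisor
# exit (with status 0) if DAEMON is not an executable file
test -x $DAEMON || exit 0
LOGDIR=/var/log/supervisord
# must match pidfile= in supervisord.conf
PIDFILE=/tmp/supervisord.pid
# seconds to wait for the daemon to die
DODTIME=5
if [ -f /etc/default/supervisor ] ; then
. /etc/default/supervisor
fi
DAEMON_OPTS="-c $CONF_FILE $DAEMON_OPTS"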

It is just an example intended to give you a leg up. Notice how I’ve added a new setting called CONF_FILE pointing directly to the main configuration file I want to use.
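The line test -x $DAEMON || exit 0 means that if DAEMON points to the wrong place, the init script exits with status 0 and does nothing, so a path mistake fails silently. A quick check of the paths before installing the service can save some head-scratching. The Python sketch below is my own, and its function names are hypothetical. It only understands simple NAME=value lines without quotes or $-expansion, which covers the settings above except DAEMON_OPTS.

```python
import os
import re

# Matches simple shell assignments such as DAEMON=/usr/local/bin/supervisord.
ASSIGNMENT = re.compile(r"^([A-Z_]+)=(\S*)$")


def read_init_settings(script_text):
    """Collect simple NAME=value lines, skipping quoted or $-expanded values."""
    settings = {}
    for line in script_text.splitlines():
        match = ASSIGNMENT.match(line.strip())
        if match and not any(ch in match.group(2) for ch in "$\"'"):
            settings[match.group(1)] = match.group(2)
    return settings


def check_init_settings(settings):
    """Return human-readable problems with the paths the init script relies on."""
    problems = []
    for key in ("DAEMON", "SUPERVISORCTL"):
        path = settings.get(key)
        # Mirrors the script's own "test -x" guard, plus a regular-file check.
        if not path or not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append("%s is not an executable file: %r" % (key, path))
    conf = settings.get("CONF_FILE")
    if not conf or not os.path.isfile(conf):
        problems.append("CONF_FILE is not a file: %r" % (conf,))
    return problems
```

Reading /etc/init.d/supervisord into a string and passing it through read_init_settings and then check_init_settings should give an empty list when everything is in place.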

Once the init script reflects your system, install it as an Ubuntu service and check that it shows up in the list of services:

sudo chmod 755 /etc/init.d/supervisord
sudo update-rc.d supervisord defaults
sudo service --status-all

For reference, system services can be removed with:

sudo update-rc.d -f supervisord remove

That’s all. Cheers!



HUMATIC LABS LLC

All rights reserved