In a previous post, I explained how to configure your Django project to run on Apache within a virtual Python environment. This time, I will talk about extending your project by integrating Celery, RabbitMQ, and Supervisor.
Note that although my particular use case involved Django, the process that I outline in this article applies equally well to just about any Python project.
Also bear in mind that, as of this writing, my operating system is Ubuntu 14.04. If you’re running a different flavor of Linux or your Ubuntu version is different, the commands and file paths might need tweaking.
Celery is awesome! It is a Python package that makes handling asynchronous tasks a walk in the park. Whether you are interested in running a large-scale distributed computation or you want to set up background jobs for your Django app, Celery is the way to go. It doesn’t take long to set up (see their docs) and, once running, it can reliably handle just about any complex problem you throw at it. I can’t say enough about how cool it is!
Anyway. I will assume that you’ve already installed Celery alongside your Django project within some sort of virtual Python environment. See my previous post for more information about Django and virtualenv and see Celery’s Getting Started guide for information about using Celery with or without Django.
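For reference, the Django integration boils down to a small application module. Here is a minimal sketch in the style of Celery’s 3.x-era Django guide; the myapp package name and the settings module path are placeholders that you will need to adapt to your own layout:

# myapp/celery.py -- minimal Celery/Django integration sketch (Celery 3.x style).
# "myapp" and the settings module path are placeholders for your own project.
from __future__ import absolute_import

import os

from celery import Celery

# Point Celery at the Django settings module before importing it.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')

from django.conf import settings

app = Celery('myapp')

# Pull Celery options (BROKER_URL, etc.) out of the Django settings.
app.config_from_object('django.conf:settings')

# Find tasks.py modules in every installed Django app.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)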
One more tip: you will probably find it useful to have a dedicated celery user on your Linux box. You can then grant that user access to only the directories it needs, which is convenient and, from a security point of view, better. Do something like the following:
sudo addgroup celery
sudo adduser --system --no-create-home --shell /bin/sh --ingroup celery celery
sudo mkdir /var/run/celery
sudo chown celery:celery /var/run/celery
Celery requires a broker to handle message-passing. There are many to choose from and each has its benefits and drawbacks. I use RabbitMQ because it’s easy to setup and it is very well supported.
Very important: make sure that your RabbitMQ server is not severely outdated! In particular, if you’re running Ubuntu, compare the version installed by apt-get (apt-cache policy rabbitmq-server shows it) with the latest version available upstream. If the discrepancy is a big one, consider updating. I’ve experienced severe bugs with the lagging versions distributed from Ubuntu’s repositories.
Once you’ve got RabbitMQ installed, configure a celery user and a virtual server for your project:
sudo rabbitmqctl add_user celery celery
sudo rabbitmqctl add_vhost my_project_vhost
sudo rabbitmqctl set_permissions -p my_project_vhost celery ".*" ".*" ".*"
Essentially, this configures the “chatroom” where you can leave messages for Celery workers to pick up and execute. In your Django settings file, your broker URL would then look something like:
BROKER_URL = 'amqp://celery:celery@localhost:5672/my_project_vhost'
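For the workers to have something to pick up, you define tasks in your apps. As a quick, hypothetical example (the add function below is mine, not something your project will already contain):

# myapp/tasks.py -- a hypothetical example task. Any function decorated
# with @shared_task is sent through the broker and executed by a worker.
from __future__ import absolute_import

from celery import shared_task

@shared_task
def add(x, y):
    # Runs inside a Celery worker process, not in the web process.
    return x + y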
Your new Celery/RabbitMQ setup can easily be tested with Django. Simply navigate to your Django project directory, activate your virtual environment, and run:
celery -A myapp worker -l info
This command starts a temporary Celery worker to run any tasks defined in your Django app (replace myapp with the name of your main app).
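With that worker running, you can verify that jobs actually flow through RabbitMQ. For example, from a second terminal (virtual environment activated), open a Django shell with python manage.py shell and queue the hypothetical add task sketched earlier:

# In the Django shell, with the worker from above still running.
from myapp.tasks import add

result = add.delay(2, 3)  # returns an AsyncResult immediately; the work
                          # happens on the worker, which logs it at -l info
print(result.id)          # task UUID, handy for matching up worker log lines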
Ultimately, we want Celery to run in the background. By far the easiest method for this is to use Supervisor, a daemon program that manages an entire pool of workers and recurring processes.
First, install Supervisor (it lives on PyPI, so pip install supervisor works). It is important to note that Supervisor must be installed under Python 2; it does not work with Python 3. But fear not: your Celery subprocesses can run in whichever Python version you like!
Once installed, manually create a system directory structure for Supervisor:
sudo mkdir /etc/supervisord /etc/supervisord/conf.d /var/log/supervisord
Then, create the main configuration file. The best way to do this is to use Supervisor’s echo_supervisord_conf script to auto-generate the configuration, which you can then edit manually:
sudo sh -c 'echo_supervisord_conf > /etc/supervisord/supervisord.conf'
sudo chmod 644 /etc/supervisord/supervisord.conf
The auto-generated configuration is quite a large file with many commented lines. When all of the comments are taken out, it should look something like this (edited):
[unix_http_server]
file=/tmp/supervisor.sock

[supervisord]
logfile=/var/log/supervisord/main.log
childlogdir=/var/log/supervisord
logfile_maxbytes=50MB
logfile_backups=10
loglevel=info
pidfile=/tmp/supervisord.pid
nodaemon=false
minfds=1024
minprocs=200

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket

[include]
files = /etc/supervisord/conf.d/*.conf
The important things to note are that the log paths point to the newly created /var/log/supervisord/ directory and that secondary configuration files are included from the /etc/supervisord/conf.d/ directory.
Now, let’s configure our Celery/Django project. Create a new configuration file, /etc/supervisord/conf.d/myproject.conf, which will be automatically included every time Supervisor starts. It will cause Supervisor to create three subprocesses: two Celery workers and a Celery Beat scheduler (for periodic tasks). I assume that your project is called myproject, that it is located at /home/myuser/myproject/, that it incorporates a virtual environment at /home/myuser/myproject/.venv/, and that your main Django app is called myapp.
Copy and edit the following configuration:
[program:myprojectbeat]
command=/home/myuser/myproject/.venv/bin/celery -A myapp beat -l info -s /tmp/myprojectbeat-schedule --pidfile=/tmp/myprojectbeat.pid
directory=/home/myuser/myproject/
autostart=true
autorestart=true
user=celery
startsecs=10
stopwaitsecs=300

[program:myproject]
numprocs=2
process_name=worker%(process_num)s
command=/home/myuser/myproject/.venv/bin/celery -A myapp worker -l info -n worker%(process_num)s.%%h --pidfile=/tmp/myproject-worker%(process_num)s.pid
directory=/home/myuser/myproject/
autostart=true
autorestart=true
user=celery
startsecs=10
stopwaitsecs=300
And ensure it has the proper permissions with:
sudo chmod 644 /etc/supervisord/conf.d/*
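The myprojectbeat program is only useful if you actually schedule periodic tasks. For illustration, here is a hypothetical Celery 3.x-style schedule you could add to your Django settings; it reuses the example add task from earlier:

# In your Django settings -- a hypothetical periodic task for Celery Beat
# (Celery 3.x setting names). Runs myapp.tasks.add every morning at 07:30.
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'add-every-morning': {
        'task': 'myapp.tasks.add',
        'schedule': crontab(hour=7, minute=30),
        'args': (2, 3),
    },
}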
Supervisor should now be ready to run! Test the new configuration by trying to start and stop supervisord manually:
sudo supervisord -c /etc/supervisord/supervisord.conf
sudo unlink /tmp/supervisor.sock
Assuming that supervisord starts successfully, you can manage its subprocesses via supervisorctl. At the interactive prompt, try commands like status, restart myproject:*, or shutdown (which stops supervisord itself):
sudo supervisorctl -c /etc/supervisord/supervisord.conf
We want supervisord to start automatically at system boot, so it will be necessary to integrate it with Ubuntu as a system service. Start by taking a look at the init script templates developed as part of the Supervisor project. Copy the one named ubuntu and save it to a new system file called /etc/init.d/supervisord.
The catch is that the copied file is just a template; you’ll need to modify it to match your specific setup. Open it up and review the code. Most of the important things needing modification are at the top. For my particular setup, here is how I modified the top portion of the file:
. /lib/lsb/init-functions

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/local/bin/supervisord
SUPERVISORCTL=/usr/local/bin/supervisorctl
CONF_FILE=/etc/supervisord/supervisord.conf
NAME=supervisord
DESC=supervisor

test -x $DAEMON || exit 0

LOGDIR=/var/log/supervisor
PIDFILE=/tmp/supervisord.pid
DODTIME=5

if [ -f /etc/default/supervisor ] ; then
    . /etc/default/supervisor
fi

DAEMON_OPTS="-c $CONF_FILE $DAEMON_OPTS"
This is just an example intended to give you a leg up. Notice how I’ve added a new variable called CONF_FILE pointing directly to the main configuration file I want to use.
Once the init script reflects your system, you should install it as an Ubuntu service:
sudo chmod 755 /etc/init.d/supervisord
sudo update-rc.d supervisord defaults
sudo service --status-all
For reference, system services can be removed with:
sudo update-rc.d -f supervisord remove
That’s all. Cheers!