Zero Downtime Docker Deployment with Tutum

In this post I will detail how to achieve (almost) zero downtime deployment of Docker containers with the Tutum Docker hosting service. Tutum greatly simplifies deploying with Docker, and it has dedicated support for zero downtime deployment: the Tutum team maintains a version of HAProxy that can switch seamlessly between Docker containers in tandem with Tutum’s API. One caveat is that, at the time of writing, HAProxy’s switching of containers involves slight downtime, typically a few seconds. I have been assured, though, that this will be improved before Tutum goes into General Availability.

The officially sanctioned approach to zero downtime deployment on top of Tutum is so-called Blue-Green Deployment, as detailed in this blog post at Tutum. The concept underpinning blue-green deployment is simple: you have a “blue” and a “green” version of your service, both behind a reverse proxy that exposes exactly one of them at a time. When upgrading to a new version of your service, you deploy it to the container that is currently hidden, and make the reverse proxy switch to it. The new version then becomes publicly visible and the old version hidden.
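To make the switching idea concrete, here is a minimal toy model in Python. It is only a sketch of the concept, not how Tutum or HAProxy implement it; the class name BlueGreenRouter and the version strings are made up for illustration.

```python
class BlueGreenRouter:
    """Toy model of a reverse proxy that exposes exactly one of two backends."""

    def __init__(self, active='blue'):
        # Version currently deployed to each backend (None = nothing deployed)
        self.versions = {'blue': None, 'green': None}
        self.active = active

    @property
    def inactive(self):
        """Name of the backend that is currently hidden."""
        return 'green' if self.active == 'blue' else 'blue'

    def deploy(self, version):
        """Deploy to the hidden backend, then make it the exposed one."""
        target = self.inactive
        self.versions[target] = version
        self.active = target
        return target

    def serve(self):
        """Return the version that is publicly visible right now."""
        return self.versions[self.active]


router = BlueGreenRouter(active='blue')
router.versions['blue'] = 'v1'   # hypothetical version already in production
router.deploy('v2')              # lands on green, then green is exposed
print(router.active, router.serve())
```

Note that the old version stays deployed on the now-hidden backend until the next deployment overwrites it.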

Stack Definition

So how does one do this in practice? You define a Tutum stack with two app services, e.g. blue and green, and an HAProxy service that sits in front of them. The app services should have different deployment tags, so that they deploy to different nodes. The HAProxy service needs to be linked to one of the two app services, namely the one that should initially be exposed. Later on, that link will be updated dynamically.

This is my current stack definition, wherein HAProxy is configured to enable HTTPS and redirect HTTP requests to HTTPS:

muzhack-green:
  image: quay.io/aknuds1/muzhack:omniscient
  tags:
    - muzhack-green
  environment:
    - FORCE_SSL=yes
  restart: always
  deployment_strategy: high_availability
muzhack-blue:
  image: quay.io/aknuds1/muzhack:omniscient
  tags:
    - muzhack-blue
  environment:
    - FORCE_SSL=yes
  restart: always
  deployment_strategy: high_availability
lb:
  image: tutum/haproxy
  tags:
    - muzhack-lb
  environment:
    - EXTRA_BIND_SETTINGS=redirect scheme https code 301 if !{ ssl_fc }
    - DEFAULT_SSL_CERT
    - HEALTH_CHECK=check inter 10s fall 1 rise 2
    - MODE=http
    - OPTION=redispatch, httplog, dontlognull, forwardfor
    - TCP_PORTS=80,443
    - TIMEOUT=connect 10s, client 1020s, server 1020s
    - VIRTUAL_HOST=https://*
  restart: always
  ports:
    - "443:443"
    - "80:80"
  links:
    - muzhack-green
  roles:
    - global

I’m not 100% certain yet about the HAProxy settings, which I’ve based on others’ advice. They appear to work, however. The DEFAULT_SSL_CERT environment variable should contain the contents of an SSL .pem file.

Making the Switch

Implementing the deployment process itself caused me some bother initially, since it involves, for example, knowing which service is currently active (blue or green). Thankfully, I found I was able to script everything through Tutum’s comprehensive API. In the end I wrote a Python script to deploy a new version to the currently inactive service and make HAProxy switch to it:

#!/usr/bin/env python3
import argparse
import subprocess
import json
import sys


parser = argparse.ArgumentParser()
args = parser.parse_args()


def _info(msg):
    sys.stdout.write('* {}\n'.format(msg))
    sys.stdout.flush()


def _run_tutum(args):
    try:
        # Discard the CLI's output; an unread PIPE could block the child process
        subprocess.check_call(['tutum'] + args, stdout=subprocess.DEVNULL)
    except subprocess.CalledProcessError as err:
        sys.stderr.write('{}\n'.format(err))
        sys.exit(1)


_info('Determining current production details...')
output = subprocess.check_output(
    ['tutum', 'service', 'inspect', 'lb.muzhack-staging']).decode('utf-8')
data = json.loads(output)
linked_service = data['linked_to_service'][0]['name']
_info('Currently linked service is \'{}\''.format(linked_service))

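if linked_service == 'muzhack-green':
    link_to = 'muzhack-blue'
elif linked_service == 'muzhack-blue':
    link_to = 'muzhack-green'
else:
    # Fail explicitly; an assert would be skipped when running with python -O
    sys.stderr.write('Unexpected linked service \'{}\'\n'.format(linked_service))
    sys.exit(1)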

_info('Redeploying service \'{}\'...'.format(link_to))
_run_tutum(['service', 'redeploy', '--sync', link_to,])

_info('Linking to service \'{}\'...'.format(link_to))
_run_tutum(['service', 'set', '--link-service', '{0}:{0}'.format(link_to),
    '--sync', 'lb.muzhack-staging',])
_info('Successfully switched production service to {}'.format(link_to))

This script does the following:

  1. Find out which service is currently active (green or blue)
  2. Redeploy the inactive service so that it updates to the newest version
  3. Link the HAProxy service to the inactive service, thus making it the active one
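
The first step can also be isolated as a pure function, which makes it possible to test the blue/green selection without calling the tutum CLI. The following is a minimal sketch. The function name pick_inactive_service and the sample JSON are hypothetical; the only assumption about the inspect output is the 'linked_to_service' list with a 'name' field, as used by the script above.

```python
import json

# The two app services from the stack definition
SERVICES = ('muzhack-blue', 'muzhack-green')


def pick_inactive_service(inspect_output):
    """Return the app service that the load balancer is NOT linked to.

    inspect_output is the JSON text printed by 'tutum service inspect'
    for the load balancer service.
    """
    data = json.loads(inspect_output)
    links = data.get('linked_to_service') or []
    if not links:
        raise ValueError('Load balancer is not linked to any service')
    active = links[0]['name']
    if active not in SERVICES:
        raise ValueError('Unexpected linked service {!r}'.format(active))
    # Exactly two services, so the inactive one is simply the other one
    return SERVICES[1] if active == SERVICES[0] else SERVICES[0]


# Hand-written sample containing only the field the function reads
sample = json.dumps({'linked_to_service': [{'name': 'muzhack-green'}]})
print(pick_inactive_service(sample))
```

Raising an exception on unexpected input, rather than guessing, keeps the script from redeploying the service that is currently live.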
