Follow Conventions to Build Infrastructure

Hamza Sheikh

2016-08-30 05:00

Any sufficiently large software system depends on a lot of third-party artifacts, from operating systems to libraries to pre-built packages and everything in between. For example, deploying a single web server running a Django application requires a developer to pull in a Linux OS, Python, Django, Gunicorn, nginx, and more. Whether this complete package is deployed on a VM under your control or your customers', you want to follow conventions throughout the entire lifecycle of the application.

Let's start with the OS and build on that. Say CentOS becomes the OS of choice. You choose its minimal version for development, testing, and production deployment. Some packages needed are not provided in the base repository (repo) and you turn to additional repos like EPEL and IUS.

How do you make sure you can reliably stand up a development environment? A popular choice these days is to use Vagrant. I've been using it for a while as well and like it. It has its own conventions to follow, though. You can build a CentOS box yourself but community convention is to pull in official boxes if you don't need to heavily customize the minimal install. You certainly don't want to spend your time building Vagrant boxes when the community has already done the hard work for you.

Next you may want to use Docker to easily package and deploy your artifacts. Here again you should follow the convention of a single process per container. Treating a container as a mini Virtual Machine (VM) goes against this widely accepted convention. You'll find far better help through blogs, forums, etc. if you stick to conventions here.

You now have a good way to stand up a development environment for yourself and others. You used what the Vagrant community built and you followed Docker conventions to make your team's and your life easier.

The next step is packaging your application regularly so it can be deployed to other members of your team, to a continuous integration (CI) environment, and maybe to a continuous deployment (CD) environment. How do you package your application? Once again you follow conventions. In our example it's a Django application, so you do what the wider community has accepted as its convention. Thus your package easily fits into the ecosystem you have chosen for your development process: Vagrant, Docker, CI, CD.

Your CI process deploys a Vagrant box, packages your application as a Docker image, deploys it as a container, runs tests on it, and then deploys to a CD environment. When you follow conventions in each step of this process, you'll likely encounter fewer bugs, and other members of the team will find it easier to understand and contribute.

I have recently asked myself a few questions about when to deviate from conventions, and by how much. Let's take nginx as an example. Should I build a custom rpm package for my uses or use what is provided in EPEL? To reduce my burden of maintaining a package's lifecycle I use the one provided by EPEL. But what if it gets upgraded in EPEL? Do I have enough control to manage when packages get upgraded in my environment? I mitigate that risk by mirroring EPEL within my own network and selectively releasing packages to it. This is a widely used convention but it introduces overhead. My team takes on more duties because we want to control our risk.
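The "selectively releasing" step can be sketched in a few lines of Python. This is only an illustration of the idea, not a real mirroring tool: the `Package` class, the `select_for_release` function, and the package versions are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str


def select_for_release(upstream, approved):
    """Return only the upstream packages whose exact version is approved."""
    return [p for p in upstream if approved.get(p.name) == p.version]


# Hypothetical snapshot of what the upstream mirror currently holds.
upstream = [
    Package("nginx", "1.10.1"),
    Package("nginx", "1.10.2"),
    Package("htop", "2.0.2"),
]

# Versions the team has tested and is ready to release internally (assumed).
approved = {"nginx": "1.10.1"}

released = select_for_release(upstream, approved)
print([f"{p.name}-{p.version}" for p in released])
```

The newer nginx build stays in the mirror but never reaches the internal repo until someone adds it to the approved list, which is exactly the extra duty the team takes on.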

The same thing happens with Docker images. I should add a private repo for them, from which I pull my images. I release newer upstream images to this repo when I'm ready for them. This is also a well-adopted convention.

The problem comes when I have to customize the nginx package for deployment. Enter configuration management. I use something, say SaltStack, to configure nginx in development, testing, and production. I take the upstream package, install it in a Docker image, and use Salt to customize the container when it's deployed. Thus the configuration "truth" stays in one place: Salt.
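To make the "one place for the truth" idea concrete, here is a rough sketch in plain Python. It is not how Salt works internally (Salt would normally use its own states, pillar data, and Jinja templates); the template text, the `ENVIRONMENTS` values, and the `render_nginx_conf` helper are assumptions made for illustration.

```python
from string import Template

# Hypothetical nginx server block; only the $placeholders vary per environment.
NGINX_TEMPLATE = Template(
    "server {\n"
    "    listen $port;\n"
    "    server_name $server_name;\n"
    "    location / {\n"
    "        proxy_pass http://$upstream;\n"
    "    }\n"
    "}\n"
)

# The single source of configuration truth (values are illustrative).
ENVIRONMENTS = {
    "development": {"port": 8080, "server_name": "localhost", "upstream": "127.0.0.1:8000"},
    "testing": {"port": 8080, "server_name": "test.example.com", "upstream": "127.0.0.1:8000"},
    "production": {"port": 80, "server_name": "example.com", "upstream": "127.0.0.1:8000"},
}


def render_nginx_conf(env):
    """Render the nginx config for one environment from the shared template."""
    return NGINX_TEMPLATE.substitute(ENVIRONMENTS[env])


print(render_nginx_conf("production"))
```

The package and the Docker image stay identical everywhere; only the rendered file differs, and the values behind it live in one data structure.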

Do not add customized configuration to a cloned nginx rpm package. Do not create dozens of Docker images with customized configuration in each. Use a configuration management tool to do the thing it's made to do. This is the conventional use of modern tools. Of course, if you use your config management tool to create config packages that would be great as well.

I have witnessed Dockerfiles that create VM-like images with multiple processes and all configuration done right there. This complicates life when deployment does not match assumptions made at image build time. Keeping things simple and following conventions reduces your chances of making such mistakes.

In your CI system as you package your application, think of the various artifacts that are useful when deploying it. Start with a base VM. Since we're running CentOS our artifact must be rpm. How difficult is it to also create a deb package so it can install on Ubuntu? In the grand scheme of things: not that difficult. Do not create a Docker image artifact in the same build. Instead, kick off a secondary build that creates two Docker images: one using a base CentOS image that installs your application from an rpm package and the other using an Ubuntu image that installs your application from a deb package.
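The two-stage build described above might be planned roughly like this. It is a sketch only: the function names, the simplified package file names, and the choice of `centos:7` and `ubuntu:16.04` as base images are assumptions, and a real CI system would express this in its own job configuration.

```python
# Which base image installs which package format (illustrative choices).
BASE_IMAGES = {"rpm": "centos:7", "deb": "ubuntu:16.04"}


def primary_build(app, version):
    """First build: produce OS packages only, no Docker images."""
    return [f"{app}-{version}.{fmt}" for fmt in ("rpm", "deb")]


def secondary_build(packages):
    """Second build: one Docker image per package, on the matching base image."""
    plan = []
    for pkg in packages:
        fmt = pkg.rsplit(".", 1)[1]  # file extension decides the base image
        plan.append({"base_image": BASE_IMAGES[fmt], "installs": pkg})
    return plan


packages = primary_build("myapp", "1.0.0")  # "myapp" is a hypothetical name
for step in secondary_build(packages):
    print(step)
```

Keeping the image build downstream of the package build means the images can only ever contain what the packages install.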

Next deploy all these artifacts to appropriate environments, configure them with Salt, and run tests.

List of "Do Nots":

Do not install and run multiple services in the same Docker container.
Do not replicate the job of packages in a Dockerfile, for example by creating users, copying files, or setting permissions.
Do not create SysV init service files.
Do not add configuration steps in a Dockerfile.
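
Some of these "Do Nots" can be caught with a crude check in CI. The sketch below is a heuristic only, assuming a handful of regular expressions I picked for illustration; it will miss plenty of cases and is no substitute for review. The rule list, the `lint_dockerfile` function, and the sample Dockerfile are all hypothetical.

```python
import re

# Heuristic patterns chosen for illustration; not an exhaustive rule set.
RULES = [
    (re.compile(r"^\s*RUN\s+.*\b(useradd|adduser|chown|chmod)\b"),
     "users and permissions belong in the rpm or deb package"),
    (re.compile(r"^\s*RUN\s+.*\bsed\s+-i\b"),
     "configuration belongs in the configuration management tool"),
    (re.compile(r"\bsupervisord\b|/etc/init\.d/"),
     "possible multi-service container or SysV init usage"),
]


def lint_dockerfile(text):
    """Return (line number, message) pairs for lines matching any rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in RULES:
            if pattern.search(line):
                findings.append((lineno, message))
    return findings


SAMPLE = "\n".join([
    "FROM centos:7",
    "RUN yum install -y nginx",
    "RUN useradd app && chown -R app /srv/app",
    "RUN sed -i s/80/8080/ /etc/nginx/nginx.conf",
    'CMD ["supervisord", "-n"]',
])

for lineno, message in lint_dockerfile(SAMPLE):
    print(f"line {lineno}: {message}")
```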

List of "Dos":

Install and run one service per Docker container.
Create users, copy files to the right locations, set file permissions, etc. within the rpm or deb package.
Create systemd unit files.
Be ready to deploy your application on a single server (bare metal or VM), multiple VMs, or multiple Docker containers.
Configure a Docker image during deployment using a configuration management system. Or install config packages (rpm, deb, etc.).

List of "You May":

You may take an upstream vanilla source rpm or deb package and break it up into multiple packages: application source, config files, systemd unit files. Then create customized packages according to the needs of each deployment. This is helpful in industries where each install is manual, can't be touched for months on end, and has no Internet connectivity. Each deployment thus shares the same application and unit file packages but has its own config file package.
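
As a small illustration of that split, the package set for each deployment could be computed like this. The package names, the version, and the site names are invented for the example.

```python
VERSION = "1.4.0"  # hypothetical release

# Packages every deployment shares: application source and systemd unit files.
SHARED = [f"myapp-{VERSION}", f"myapp-units-{VERSION}"]


def packages_for(site):
    """Shared packages plus the one config package specific to this site."""
    return SHARED + [f"myapp-config-{site}-{VERSION}"]


for site in ("plant-a", "plant-b"):
    print(site, packages_for(site))
```

Only the last package in each list differs between sites, so a fix to the application or its unit files is built once and shipped everywhere.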