The Problem

You want to use Ansible’s docker_container module to run a container, but you also want to act on its output without specifying a logging driver or writing to a temp file.

The Solution

Do this:

- name: Run a Docker container
  docker_container:
    name: hello-docker   # docker_container requires a container name; this one is arbitrary
    image: alpine
    entrypoint: sh
    command: -c 'echo "Hello, from Docker!"'
    detach: false
  register: container_output

- name: Get its output
  debug:
    msg: "Docker said: {{ container_output.ansible_facts.docker_container.Output }}"

WTF did you just do?

The key takeaways:

  • docker_container runs its containers in detached mode by default. We turned this off by specifying detach: false, which makes the task block until the container exits.

  • The container’s metadata (i.e. the stuff you’d get from docker inspect) is exposed as Ansible facts, which we captured by registering the task. container_output.ansible_facts.docker_container.Output is the fact that contains our stdout.
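If you want to branch on that captured output in a later task, a minimal sketch might look like the following (the task name and the `when:` condition are illustrative, not from the original playbook):

```yaml
# Hypothetical follow-up task that acts on the container's stdout.
- name: Celebrate a successful greeting
  debug:
    msg: "The container greeted us!"
  when: "'Hello' in container_output.ansible_facts.docker_container.Output"
```

The same registered variable works with failed_when or assert if you'd rather fail the play on unexpected output.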



A Few Gotchas About Going Multi-Cloud with AWS, Microsoft Azure and HashiCorp tools.

One of the more interesting types of work we do at Contino is helping our clients make sense of the differences between AWS and Microsoft Azure. While the HashiCorp toolchain (Packer, Terraform, Vault, Vagrant, Consul and Nomad) has made provisioning infrastructure a breeze compared to writing hundreds of lines of Python, it almost makes achieving a multi-cloud infrastructure deployment seem too easy.

This post will outline some of the differences I’ve observed in using these tools against both cloud platforms. Since I used the word “multi-cloud” in my first paragraph, I’ll also close with some general things to consider before embarking on a multi-cloud journey.

Azure and ARM Are Inseparable

One of the core features that make Terraform and Packer tick is providers and builders, respectively. These allow third parties to write their own “glue” code that tells Terraform how to create VMs or Packer how to create machine images. This way, Terraform and Packer simply become “thin clients” for your desired platform. HashiCorp’s recent move of splitting provider code out of the Terraform binary in version 0.10 emphasizes this.

Consequently, when you create VMs with Terraform or machine images with Packer on AWS, you’re really asking the AWS Golang SDK to do those things. This is mostly the case with Azure as well, with one big exception: the Azure Resource Manager, or ARM.

ARM is more-or-less like AWS CloudFormation. You create a JSON template of the resources that you’d like to deploy into a single resource group along with the relationships that should exist between those resources and submit that into ARM as a deployment. It’s pretty nifty stuff.
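To make that concrete, here’s a hedged sketch of what a minimal ARM template might look like. The storage account resource, its apiVersion and its name are illustrative assumptions for this post (and a real storage account name would need to be globally unique):

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2016-01-01",
      "name": "examplestorageacct42",
      "location": "eastus",
      "sku": { "name": "Standard_LRS" },
      "kind": "Storage",
      "properties": {}
    }
  ]
}
```

You would then submit this template to ARM as a deployment against a resource group, and ARM figures out ordering and dependencies for you.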

However, instead of Terraform or Packer using the Azure Go SDK directly to create these resources, they both rely on ARM, through the Azure Go SDK, to do that job for them. I’m guessing that HashiCorp chose to do it this way to avoid rework (i.e. “why create a resource object in our provider or builder when ARM already does most of that work?”). While this doesn’t have too many implications in how you actually use these tools against Azure, there are some notable differences in what happens at runtime.

Azure Deployments Are Slower

My experience has shown me that the Azure ARM Terraform provider and Packer builder take slightly more time to “get going” than their AWS counterparts do, especially when using Standard_A-class VMs. This can make testing code changes quite tedious.

Consider the template below. This uses a t2.micro instance to provision a Red Hat image with no customizations.

{
  "description": "Basic RHEL image.",
  "variables": {
    "access_key": null,
    "secret_key": null
  },
  "builders": [
    {
      "type": "amazon-ebs",
      "access_key": "{{ user `access_key` }}",
      "secret_key": "{{ user `secret_key` }}",
      "region": "us-east-1",
      "instance_type": "t2.micro",
      "source_ami": "ami-c998b6b2",
      "ami_name": "test_ami",
      "ssh_username": "ec2-user",
      "vpc_id": "vpc-8a2dbbf2",
      "subnet_id": "subnet-306b673c"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "#This is required to allow us to use `sudo` from our Packer provisioner.",
        "#This is enabled by default on all RHEL images for \"security.\"",
        "sudo sed -i.bak -e '/Defaults.*requiretty/s/^/#/' /etc/sudoers"
      ]
    },
    {
      "type": "shell",
      "inline": ["echo Hey there"]
    }
  ]
}

Assuming a fast internet connection (I did this test with a ~6 Mbit connection), it doesn’t take too much time for Packer to generate an AMI for us.

$ time packer build -var 'access_key=REDACTED' -var 'secret_key=REDACTED' aws.json
==> amazon-ebs: Creating temporary security group for this instance: packer_5a136414-1ba5-7c7d-890c-697a8563d4be
==> amazon-ebs: Authorizing access to port 22 in the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
amazon-ebs: Adding tag: "Name": "Packer Builder"
amazon-ebs: Hey there
==> amazon-ebs: Stopping the source instance...
amazon-ebs: Stopping instance, attempt 1
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating the AMI: test_ami
amazon-ebs: AMI: ami-20ff765a
Build 'amazon-ebs' finished.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
us-east-1: ami-20ff765a

real 1m50.900s
user 0m0.020s
sys 0m0.008s

Let’s repeat this exercise with Azure. Here’s that template again, but Azure-ified:

{
  "description": "Basic RHEL image.",
  "variables": {
    "client_id": null,
    "client_secret": null,
    "subscription_id": null,
    "azure_location": null,
    "azure_resource_group_name": null
  },
  "builders": [
    {
      "type": "azure-arm",
      "communicator": "ssh",
      "ssh_pty": true,
      "managed_image_name": "rhel-{{ user `base_rhel_version` }}-rabbitmq-x86_64",
      "managed_image_resource_group_name": "{{ user `azure_resource_group_name` }}",
      "os_type": "Linux",
      "vm_size": "Standard_B1",
      "client_id": "{{ user `client_id` }}",
      "client_secret": "{{ user `client_secret` }}",
      "subscription_id": "{{ user `subscription_id` }}",
      "location": "{{ user `azure_location` }}",
      "image_publisher": "RedHat",
      "image_offer": "RHEL",
      "image_sku": "7.3",
      "image_version": "latest"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "#This is required to allow us to use `sudo` from our Packer provisioner.",
        "#This is enabled by default on all RHEL images for \"security.\"",
        "sudo sed -i.bak -e '/Defaults.*requiretty/s/^/#/' /etc/sudoers"
      ]
    },
    {
      "type": "shell",
      "inline": ["echo Hey there"]
    }
  ]
}

And here’s us running this Packer build. I decided to use a Basic_A0 instance size, as that was the closest thing to a t2.micro that Azure had available for my subscription. (The Standard_B series is what I originally intended to use since, like the t2 line, those are burstable.)

Notice that it takes almost TEN times as long with the same Linux distribution and similar instance sizes!

$ time packer build -var 'client_id=REDACTED' -var 'client_secret=REDACTED' -var 'subscription_id=REDACTED' -var 'tenant_id=REDACTED' -var 'resource_group=REDACTED' -var 'location=East US' azure.json
azure-arm output will be in this color.

==> azure-arm: Running builder ...
azure-arm: Creating Azure Resource Manager (ARM) client ...
==> azure-arm: Creating resource group ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> Location : 'East US'
azure-arm: Hey there
==> azure-arm: Querying the machine's properties ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> ComputeName : 'pkrvms6sj74tdvk'
==> azure-arm: -> Managed OS Disk : '/subscriptions/8bbbc92b-6d16-4eb2-8f95-7a0769748c8d/resourceGroups/packer-Resource-Group-s6sj74tdvk/providers/Microsoft.Compute/disks/osdisk'
==> azure-arm: Powering off machine ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> ComputeName : 'pkrvms6sj74tdvk'
==> azure-arm: Capturing image ...
==> azure-arm: -> Compute ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> Compute Name : 'pkrvms6sj74tdvk'
==> azure-arm: -> Compute Location : 'East US'
==> azure-arm: -> Image ResourceGroupName : 'REDACTED'
==> azure-arm: -> Image Name : 'IMAGE_NAME'
==> azure-arm: -> Image Location : 'eastus'
==> azure-arm: Deleting resource group ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: Deleting the temporary OS disk ...
==> azure-arm: -> OS Disk : skipping, managed disk was used...
Build 'azure-arm' finished.

==> Builds finished. The artifacts of successful builds are:
--> azure-arm: Azure.ResourceManagement.VMImage:

ManagedImageResourceGroupName: REDACTED
ManagedImageName: IMAGE_NAME
ManagedImageLocation: eastus

real 10m27.036s
user 0m0.056s
sys 0m0.020s

The worst part about this is that it takes this long even when it fails!

Notice the “Deleting resource group…” line I highlighted. You’ll likely spend a lot of time looking at that line. For some reason, cleanup after an ARM deployment can take a while. I’m guessing that this is due to three things:

  1. Azure creating intermediate resources, such as virtual networks (VNets), subnets and compute, all of which can take time,
  2. ARM waiting for downstream SDKs to finish deleting resources and/or any associated metadata, and
  3. Packer issuing asynchronous operations to the Azure ARM service, which requires polling the operationResult endpoint every so often to see how things played out.

Pro-Tip: Use the az Python CLI before running things!

As recovering from Packer failures can be quite time-consuming, you might want to consider leveraging the Azure command-line client to verify that the inputs to your Packer templates are correct before running a build. Here’s a quick example: if you want to confirm that the service principal’s client_id and client_secret are correct, you might add something like this to your pipeline:

#!/usr/bin/env bash

if ! az login --service-principal -u "$client_id" -p "$client_secret" --tenant "$tenant_id"
then
  echo "ERROR: Invalid credentials." >&2
  exit 1
fi

This will save you at least three minutes during execution…and spare you from something else that’s a little more frustrating.
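In the same spirit, here’s a hedged sketch of another pre-flight check you could add: verifying that the target resource group actually exists before kicking off a long build. (The function name is made up for this example; it assumes az is installed and that you’ve already logged in.)

```shell
#!/usr/bin/env bash

# Hypothetical helper: fail fast if the resource group that Packer
# will deploy into doesn't exist. `az group exists` prints "true" or "false".
check_resource_group() {
  if [ "$(az group exists --name "$1")" != "true" ]; then
    echo "ERROR: resource group '$1' not found." >&2
    return 1
  fi
}
```

Calling something like check_resource_group "$resource_group" right before packer build turns a ten-minute failure into a one-second one.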

The AWS provider and builder are more actively consumed

Both the AWS and Azure Terraform providers and Packer builders are mostly maintained internally by HashiCorp. However, what you’ll find out after using the Azure ARM provider for a short while is that its usage within the community pales in comparison.

I ran into an issue with the azure-arm builder whereby it failed to find a resource group that I had created for an image I was trying to build. Locating that resource group with az group list and the same client_id and secret worked fine, and I was able to find the resource group in the console. As well, I had given the service principal “Owner” permission, so there were no access limitations preventing it from finding this resource group.

After spending some time digging into the builder source code and firing up Charles Web Proxy, it turned out that my error had nothing to do with resource groups: the credentials I was passing into Packer from my Makefile were simply incorrect.

What was more frustrating is that I couldn’t find anything on the web about this problem. One would think that someone else using this builder would have discovered it before I did, especially given that the builder has been available for at least six months as of this writing.

It also seems that there are, by far, more internal commits and contributors to the Amazon builders than to those for Azure, which seem to be largely maintained by Microsoft folks. Despite this disparity, the Azure contributors are quite fast and very responsive (or at least they were to me!).

Getting Started Is Slightly More Involved on Azure

In the early days of cloud computing, Amazon’s EC2 service focused entirely on VMs. Their MVP at the time was simple: we’ll make creating, maintaining and destroying VMs fast, easy and painless. Aside from subnets and some routing details, much of the networking overhead was abstracted away. Most of the self-service offerings that Amazon currently has weren’t around yet. Deploying an app onto AWS still required knowing how to set up EC2 instances and deploy onto them, which allowed companies like DigitalOcean and Heroku to rise to prominence. Over time, this premise seems to have held up, as most of AWS’s other offerings heavily revolve around EC2 in various forms.

Microsoft took the opposite direction with Azure. Azure’s mission statement was to deploy apps onto the cloud as quickly as possible without users having to worry about the details. This is still largely the case, especially if one is deploying an application from Visual Studio. Infrastructure-as-a-Service was more-or-less an afterthought, which led to some industry confusion over where Azure “fit” in the cloud computing spectrum. Consequently, while Microsoft added and expanded their infrastructure offerings over time, the abstractions that were long taken for granted in AWS haven’t been “ported over” as quickly.

This is most evident when one is just getting started with AWS and the HashiCorp suite for the first time versus starting up on Azure. These are the steps that one needs to take in order to get a working Packer image into AWS:

  1. Sign up for AWS.
  2. Log into AWS.
  3. Go to IAM and create a new user.
  4. Download the access and secret keys that Amazon gives you.
  5. Assign that user Admin privileges over all AWS services.
  6. Download the AWS CLI (or install Docker and use the anigeo/awscli image)
  7. Configure your client: aws configure
  8. Create a VPC: aws ec2 create-vpc --cidr-block
  9. Create an Internet Gateway: aws ec2 create-internet-gateway
  10. Attach the gateway to your VPC so that your machines can Internet: aws ec2 attach-internet-gateway --internet-gateway-id $id_from_step_9 --vpc-id $vpc_id_from_step_8
  11. Create a subnet: aws ec2 create-subnet --vpc-id $vpc_id_from_step_8 --cidr-block
  12. Update that subnet so that it can issue publicly accessible IP addresses to VMs created within it: aws ec2 modify-subnet-attribute --subnet-id $subnet_id_from_step_11 --map-public-ip-on-launch
  13. Download Packer (or use the hashicorp/packer Docker image)
  14. Create a Packer template for Amazon EBS.
  15. Deploy! packer build -var 'access_key=$access_key' -var 'secret_key=$secret_key' your_template.json
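As a rough sketch, steps 8 through 12 can be rolled into a single script. (The provision_network function name and the CIDR blocks are illustrative, not prescribed anywhere above; it assumes the AWS CLI is already configured.)

```shell
#!/usr/bin/env bash

# Hypothetical consolidation of the VPC, gateway and subnet steps.
provision_network() {
  local vpc_id igw_id subnet_id
  # Step 8: create a VPC
  vpc_id="$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
    --query 'Vpc.VpcId' --output text)"
  # Step 9: create an Internet gateway
  igw_id="$(aws ec2 create-internet-gateway \
    --query 'InternetGateway.InternetGatewayId' --output text)"
  # Step 10: attach the gateway so machines in the VPC can reach the Internet
  aws ec2 attach-internet-gateway --internet-gateway-id "$igw_id" --vpc-id "$vpc_id"
  # Step 11: create a subnet
  subnet_id="$(aws ec2 create-subnet --vpc-id "$vpc_id" --cidr-block 10.0.0.0/24 \
    --query 'Subnet.SubnetId' --output text)"
  # Step 12: let the subnet hand out public IPs at launch
  aws ec2 modify-subnet-attribute --subnet-id "$subnet_id" --map-public-ip-on-launch
  echo "$subnet_id"
}
```

Feed the printed subnet ID into your Packer template's subnet_id and you're ready for step 15.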

If you want to understand why an AWS VPC requires an internet gateway or how IAM works, finding whitepapers on these topics is a fairly straightforward Google search.

Getting started on Azure, on the other hand, is slightly more laborious as documented here. Finding in-depth answers about Azure primitives has also been slightly more difficult, in my experience. Most of what’s available are Microsoft Docs entries about how to do certain things and non-technical whitepapers. Finding a Developer Guide like those available in AWS was difficult.

In Conclusion

Using multiple cloud providers is a smart way of leveraging different pricing schemes between providers. It is also an interesting way of adding more DR capability than a single cloud provider can offer alone (which is kind of a farce, as AWS spans dozens of datacenters across the world, many of which are in the US, though region-wide outages have happened before, albeit rarely).

HashiCorp tools like Terraform and Packer make managing this sort of infrastructure much easier to do. However, both providers aren’t created equal, and the AWS support that exists is, at this time of writing, significantly more extensive. While this certainly doesn’t make using Azure with Terraform or Packer impossible, you might find yourself doing more homework than initially expected!

About Me


I’m a Technical Principal for Contino. We specialize in helping large and heavily-regulated enterprises make cloud adoption and DevOps culture a reality. I’m passionate about bringing DevOps to the enterprise. I’m also passionate about bikes, brews and travel!

Provisioning VMware Workstation Machines from Artifactory with Vagrant

I wrote a small Vagrantfile and helper library for provisioning VMware VMs from boxes hosted on Artifactory. I put this together with the intent of helping us easily provision our Rancher/Cattle/Docker-based platform wholesale on our machines to test changes before pushing them up.

Here it is:

Tests are to be added soon! I’m thinking Cucumber integration tests with unit tests on the helper methods and Vagrantfile correctness.

I also tried to emphasize small, isolated and easily readable methods with short call chains and zero side effects.

The pipeline would look roughly like this:

  • Clone repo containing our Terraform configurations, cookbooks and this Vagrantfile
  • Make changes
  • Do unit tests (syntax, linting, coverage, etc)
  • Integrate by spinning up a mock Rancher/Cattle/whatever environment with Vagrant
  • Run integration tests (do LBs work, are services reachable, etc)
  • Vagrant destroy for teardown
  • Terraform apply to push changes to production

We haven’t gotten this far yet, but this Vagrantfile is a good starting point.

Getting Into DevOps.

I’ve observed a sharp uptick in developers and systems administrators interested in “getting into DevOps” within the last year or so. This pattern makes sense, too: in an age where a single developer can spin up a globally-distributed infrastructure for an application with a few dollars and a few API calls, the gap between development and systems administration is smaller than ever. While I’ve seen plenty of blog posts and articles about cool DevOps tools and thoughts to think about, I’ve seen far less content offering pointers and suggestions for people looking to get into this line of work.

My goal with this article is to, hopefully, draw what that path looks like. My thoughts are based upon several interviews, chats, late-night discussions and random conversations, likely over beer and delicious food. I’m also interested in hearing feedback from those that have made the jump; if you have, please email me. I’d love to hear your thoughts and stories.

Olde World IT

Understanding history is key to understanding the future, and DevOps is no exception. To understand the pervasiveness and popularity of the DevOps movement, it’s helpful to understand what IT was like in the late ’90s and most of the ’00s. This was my experience.

I started my career as a Windows systems administrator in a large multi-national financial services firm in late 2006. In those days, adding new compute involved calling Dell (or, in our case, CDW) and placing a multi-hundred-thousand-dollar order of servers, networking equipment, cables and software, all destined for your on- and off-site datacenters. While VMware was still convincing companies that using virtual machines was, indeed, a cost-effective way of hosting their “performance-sensitive” applications, many companies, including mine, pledged allegiance to running applications on their physical hardware. Our Technology department had an entire group dedicated to Datacenter Engineering and Operations, and their job was to negotiate our leasing rates down to some slightly-less-absurd monthly rate, ensure that our systems were being cooled properly (an exponentially difficult problem if you have enough equipment) and, if you were lucky/wealthy enough, ensure that your off-shored datacenter crew knew enough about all of your server models to not accidentally pull the wrong thing during after-hours trading.

Amazon Web Services and Rackspace were slowly beginning to pick up steam, but were far from critical mass.

In those days, we also had teams dedicated to ensuring that the operating systems and software running on top of that hardware worked when they were supposed to. The engineers were responsible for designing reliable architectures for patching, monitoring and alerting on these systems, as well as defining what the “gold image” looked like. Most of this work was done through manual experimentation, and the extent of most testing was writing a runbook describing what you did and confirming that the system behaved as expected after following said runbook. This was important in a large organization like ours, since most of the level 1 and 2 support was offshore, and the extent of their training ended with those runbooks.

(This is the world that your author lived in for the first three years of his career. My dream back then was to be the one who made the gold standard!)

Software releases were another beast altogether. Admittedly, I didn’t gain a lot of experience working on this side of the fence. However, from stories that I’ve gathered (and recent experience), much of the daily grind for software development during this time went something like this:

  • Developers wrote code as specified by the technical and functional requirements laid out by business analysts from meetings they weren’t invited to,

  • Optionally, developers wrote unit tests for their code to ensure that it didn’t do anything obviously crazy, like try to divide by zero without throwing an exception.

  • When done, developers would mark their code as “Ready for QA”. A QA would pick up the code and run it in their own environment, which might or might not be like production or, even, the environment the developer used to test their own code against.

  • Failures would get sent back to the developers within “a few days or weeks” depending on other business activities and where priorities fell.

While sysadmins and developers didn’t see eye to eye often, the one thing they shared a common hatred for was “change management.” This was a composition of highly-regulated and, in the case of my employer at the time, highly necessary rules and procedures governing when and how technical changes happened in a company. Most companies followed the Information Technology Infrastructure Library, or ITIL, process, which, in a nutshell, asked a lot of questions about why, when, where and how things happened, along with a process for establishing an audit trail of the decisions that led up to those answers.

As you could probably gather from my short snippet of history above, many, many things within IT were done manually. This led to a lot of mistakes. Lots of mistakes led to lots of lost revenue. Change management’s job was to minimize that lost revenue, and this usually came in the form of releases only being done every two weeks and changes to servers, regardless of their impact or size, being queued up to be done sometime between Friday, 4pm and Monday, 5:59am.

(Ironically, this batching of work led to even more mistakes, usually more serious ones.)

DevOps Isn’t A Tiger Team

You might be thinking “What is Carlos going on about, and when is he going to talk about Ansible playbooks?” I love Ansible tons, but hang on; this is important.

Have you ever been assigned to a project and had to interact with the “DevOps” team? Did you have to rely on a “Configuration Management” or “CI/CD” team to ensure that your pipeline was set up properly? Were you ever beholden to attending meetings about your release weeks after the work was marked “Code Complete”?

If so, then you’re re-living history. All of that comes from all of the above.

Silos form out of an instinctual draw toward working with people like ourselves.[1] Naturally, it’s no surprise that this human trait also manifests in the workplace. I even saw this play out at a 250-person startup I worked at prior to joining ThoughtWorks. When I started, developers all worked in common pods and worked heavily with each other. As the codebase grew in complexity, developers who worked on common features naturally aligned with each other to tackle the complexity within their own feature. Soon afterwards, feature teams were officially formed.

Sysadmins and developers at many of the companies I worked at not only formed natural silos like this, but also fiercely competed with and talked over each other. Developers were mad at sysadmins when their environments were broken. Developers were mad at sysadmins when their environments were too locked down. Sysadmins were mad that developers were breaking their environments in arbitrary ways all of the time. Sysadmins were mad at developers for asking for way more computing power than they needed.

Neither side understood each other, and worse yet, neither side wanted to.

Most developers were uninterested in the basics of operating systems, kernels or, in some cases, computer hardware. As well, most sysadmins, even Linux sysadmins, wouldn’t touch coding with a ten-foot pole. They tried a bit of C in college, hated it and never wanted to touch an IDE again. Consequently, developers threw their environment problems over the wall to sysadmins, sysadmins prioritized them alongside the hundreds of other things thrown over the wall to them, and everyone busy-waited angrily while hating each other.

The purpose of DevOps was to put an end to this.

DevOps isn’t a team. CI/CD isn’t a group in JIRA. DevOps is a way of thinking. According to the movement, in an ideal world, developers, sysadmins and business stakeholders would work as one team, and while they might not know everything about each other’s worlds, they all know enough to understand each other and their backlogs, and they can, for the most part, speak the same language.

This is the basis behind having all infrastructure and business logic be in code and subject to the same deployment pipelines as the software that sits on top of it. Everybody is winning because everyone understands each other. This is also the basis behind the rise of other tools like chatbots and easily-accessible monitoring and graphing.

Adam Jacob said it best: “DevOps is the word we use to describe the operational side of the transition to enterprises being software-led.”


A common question I get asked is “What do I need to know to get into DevOps?” The answer, like that of most open-ended questions, is “it depends.”

At the moment, the “DevOps engineer” varies from company to company. Smaller companies that have plenty of software developers but fewer folks that understand infrastructure will likely look for people with more experience administrating systems. Other, usually larger and/or older companies, that have a solid sysadmin organization will likely optimize for something closer to a Google SRE, i.e. “a software engineer to design an operations function.”[2] This isn’t written in stone, however, as, like any technology job, the decision largely depends on the hiring manager sponsoring it.

That said, we typically look for engineers who are interested in learning more about:

  • How to administer and architect secure and scalable cloud platforms (usually on AWS, but Azure, Google Cloud Platform and PaaS providers like DigitalOcean and Heroku are popular too),

  • How to build and optimize deployment pipelines and deployment strategies on popular CI/CD tools like Jenkins, GoCD and cloud-based ones like Travis CI or CircleCI,

  • How to monitor, log and alert on changes in your system with time-series tools like Kibana or Grafana and logging tools like Splunk, Loggly or Logstash, and

  • How to maintain infrastructure as code with configuration management tools like Chef, Puppet or Ansible, as well as deploy said infrastructure with tools like Terraform or CloudFormation.

Containers are becoming increasingly popular as well. Despite the beef against the status quo surrounding Docker at scale [3], containers are quickly becoming a great way of achieving extremely high density of services and applications running on fewer systems while increasing their reliability. (Orchestration tools like Kubernetes or Mesos can spin up new containers in seconds if the host they’re being served by fails.) Given this, having knowledge of Docker or rkt (no, it is not short for “Rocket”) and an orchestration platform like those aforementioned will go a long way.
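To illustrate that orchestration point, here’s a hedged sketch of a Kubernetes Deployment. The names and image are placeholders, and the manifest is just one way to express this; the point is that you declare a desired replica count and the cluster keeps it true, rescheduling containers if a host dies:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-service          # hypothetical service name
spec:
  replicas: 3                 # the scheduler replaces these if a node fails
  selector:
    matchLabels:
      app: demo-service
  template:
    metadata:
      labels:
        app: demo-service
    spec:
      containers:
        - name: web
          image: nginx:alpine # placeholder image
```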

If you’re a systems administrator that’s looking to make this change, you will also need to know how to write code. Python and Ruby are popular languages used for this purpose, as they are portable (they can be used on any operating system), fast and easy to read and learn. They also form the underpinnings of the industry’s most popular configuration management tools (Python for Ansible, Ruby for Chef and Puppet) and cloud API clients (Python and Ruby are commonly used for AWS, Azure and GCP clients).

If you’re a developer that’s looking to make this change, I highly recommend learning more about UNIX, Windows and networking fundamentals. Even though the cloud abstracts away many of the complications of administrating a system, debugging slow application performance is aided greatly by knowing how these things work. I’ve included a few books on this topic in the next section.

If this sounds overwhelming, you aren’t alone. Fortunately, there are plenty of small projects to dip your feet into. One such toy project is Gary Stafford’s Voter Service, a simple Java-based voting platform. [4] We ask our candidates to take the service from GitHub to production infrastructure through a pipeline. One can combine that with Rob Miles’ awesome DevOps Tutorial repository [5] to learn about ways of doing this.

Another great way of becoming familiar with these tools is taking popular services and setting up an infrastructure for them using nothing but AWS and configuration management. Set it up manually first to get a good idea of what to do, and then replicate what you just did using nothing but CloudFormation (or Terraform) and Ansible. Surprisingly, this is a large part of the work that we Infrastructure Devs do for our clients on a daily basis. Our clients find this work to be highly valuable!

Theory Books

So you’ve probably read The Phoenix Project by Gene Kim. It covers much of the history I explained earlier (with much more color) and describes a fictional company’s journey to becoming a lean organization running on Agile and DevOps. It’s a great book.

Here are some others that are worth a read:

  • Driving Technical Change by Terrance Ryan. Awesome little book on common personalities within most technology organizations and how to deal with them. This helped me out more than I expected.

  • PeopleWare by Tom DeMarco and Tim Lister. A classic on managing engineering organizations. A bit dated, but still relevant.

  • Time Management for System Administrators by Tom Limoncelli from Stack Overflow. While this is heavily geared towards sysadmins, it provides great insight into the life of a systems administrator at most large organizations. If you want to learn more about the war between sysadmins and developers, this book might explain more.

  • The Lean Startup by Eric Ries. Describes how Eric’s 3D avatar company, IMVU, discovered how to work lean, fail fast and find profit faster. Lean Enterprise by Jez Humble and friends is an adaptation of this book for the enterprise. Both are great reads and do a good job of explaining the business motivation behind DevOps.

  • Infrastructure As Code by our very own Kief Morris. Awesome primer on, well, infrastructure as code! Does a great job of describing why it’s essential for any business to adopt this for their infrastructure.

  • Site Reliability Engineering by Betsy Beyer, Chris Jones and others. A book explaining how Google does SRE, also known as “DevOps before DevOps was a thing.” Provides interesting opinions on how to handle uptime, latency and keeping engineers happy.

Technical Books

If you’re looking for books that’ll take you straight to code, you’ve come to the right section.

  • TCP/IP Illustrated by the late W. Richard Stevens. This is the classic (and, arguably, complete) tome on the fundamental networking protocols, with special emphasis on TCP/IP. If you’ve heard of Layers 1, 2, 3 and 4 and are interested in learning more, you’ll need this book.

  • UNIX and Linux System Administration Handbook by Ben Whaley and Evi Nemeth. A great primer into how Linux and UNIX work and how to navigate around them.

  • Learn Windows Powershell In A Month of Lunches by Don Jones and Jeffrey Hicks. If you’re doing anything automated with Windows, you will need to learn how to use Powershell. This is the book that will help you do that. Don Jones is a well-known MVP in this space.

  • Practically anything by James Turnbull. He puts out great technical primers on popular DevOps-related tools.

In Closing

From companies deploying everything to bare metal (there are plenty that still do, for good reasons) to trail blazers doing everything serverless, DevOps is likely here to stay for a while. The work is interesting, the results are impactful, and, most important, it helps bridge the gap between technology and business.

It’s a wonderful thing to see.

Good luck!





Some Terraform gotchas.

So you’ve got a bacon delivery service repository with Terraform configuration files at the ready, and it looks something like this:

$> tree

0 directories, 3 files

terraform is applying your configurations and saving them in tfstate like you’d expect. Awesome.

Eventually, your infrastructure scales just large enough to necessitate a directory structure. You want to express your Terraform configurations in a way that (a) makes it easy to see what’s in which environment, (b) makes it easy to modify those environments without affecting other environments and (c) prevents your HCL from becoming a total mess, not unlike what can happen with Puppet or Chef.

Fortunately, Terraform makes this pretty easy to do…but not without some gotchas.


One suggestion: Use modules!

Modules give you the ability to reuse Terraform resources throughout your codebase. This way, instead of having a bunch of aws_instances lying around in your main, you can neatly express them in ways that make more sense:

module "sandbox-web-servers" {
  source = "../modules/aws/sandbox"
  provider = ""
  environment = "sandbox"
  tier = "web"
  count = 10
}

When you do this, you need to populate Terraform’s module cache by running terraform get from the directory containing your configuration.


Gotcha #1: Self variable interpolation isn’t a thing yet.

If you noticed, the example above references “sandbox” quite a lot. This is because, unfortunately, Terraform modules (and resources, I believe) do not yet support self-referencing variables. What I mean is this:

module "sandbox-web-server" {
  environment = "sandbox"
  source = "../modules/${var.self.environment}"
}

Given that everything in Terraform is a directed graph, the complexity in doing this makes sense. How do you resolve a reference to a variable that hasn’t been defined yet?

This was tracked here, but it looks like a blue-sky feature right now.

Gotcha #2: Module source paths are relative to the module.

Let’s say you had a module definition that looked like this:

module "sandbox-web-servers" {
  source = "modules/aws/sandbox"
}

and a directory structure that looked like this:

$> tree
├── infrastructure
│   └── sandbox
│       └──
└── modules
    └── aws
        └── sandbox

5 directories, 2 files

Upon running terraform apply, you’d get an awesome error saying that modules/aws/sandbox couldn’t be located, even if you ran it at the root. You’d wonder why this is given that Terraform is supposed to reference everything from the location from which the application was executed.

It turns out that modules don’t work that way. When modules are loaded with terraform get, their dependencies are sourced from the location of the module. I haven’t looked too deeply into this, but this is likely due to the way in which Terraform populates its graphs.

To fix this, you’ll need to either (a) create symlinks in all of your modules pointing to your module source, or (b) fix your sources to use paths relative to the location of the module, like this:

module "sandbox-web-servers" {
  source = "../../modules/aws/sandbox"
}
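To make the resolution rule concrete, here’s a small Python sketch (not part of Terraform itself) of how a module’s source path resolves against the calling module’s directory rather than the directory you ran terraform from:

```python
import os

def resolve_source(calling_module_dir, source):
    # Terraform resolves a module's "source" against the directory of the
    # configuration that references it, not your current working directory.
    return os.path.normpath(os.path.join(calling_module_dir, source))

# From infrastructure/sandbox, the bare path lands on a directory that
# doesn't exist, while the ../../-style path lands on the real module:
print(resolve_source("infrastructure/sandbox", "modules/aws/sandbox"))
print(resolve_source("infrastructure/sandbox", "../../modules/aws/sandbox"))
```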

Gotcha #3: Providers must co-exist with your infrastructure!

This one took me a few hours to reason about. Let’s go back to the directory structure referenced above (which I’ve included again below for your convenience):

$> tree
├── infrastructure
│   └── sandbox
│       └──
└── modules
    └── aws
        └── sandbox

5 directories, 2 files

Since you deploy to multiple different sources (nit pick: Nearly every example I’ve seen on Terraform assumes you’re using AWS!), you want to create a providers folder to express this. Additionally, since your infrastructure might be defined differently by environment and you want the thing that’s actually calling terraform to assume as little about your infrastructure as possible, you want to break it down by environment. When I tried this, it looked like this:

├── infrastructure
│   └── sandbox
│       └──
├── modules
│   └── aws
│       └── sandbox
│           └──
└── providers
    ├── openstack
    ├── colos
    ├── gce
    └── aws
        ├── dev
        │   ├──
        │   └──
        ├── pre-prod
        │   ├──
        │   └──
        ├── prod
        │   ├──
        │   └──
        └── sandbox

14 directories, 10 files

You now want to reference this in your modules:

# infrastructure/sandbox/
module "sandbox-web-servers" {
  source = "../../modules/aws/sandbox"
  provider = "" # using a provider alias

and are in for a pleasant surprise when you discover that Terraform fails because it can’t locate the “” provider.

I initially assumed that when Terraform looked for the nearest provider, it would search the entire directory tree for a suitable one; in other words, it would follow a search path like this:

- ./infrastructure/sandbox
- ./infrastructure
- .
- ./modules
- ./modules/aws
- ./modules/aws/sandbox
- .
- ./providers
- ./providers/aws
- ./providers/aws/sandbox <-- here

But that’s not what happens. Instead, it looks for its providers in the same location as the module being referenced. This meant that I had to put my provider definition in the same place as my module definition.

I couldn’t even get away with putting it in the directory for its requisite environment above it (i.e. ./infrastructure/aws/sandbox) because Terraform doesn’t currently support object inheritance.

Instead of re-defining my providers in every directory, I created my provider definition in every infrastructure environment folder I had (which is just sandbox at the moment) and symlinked it into every folder underneath it. In other words:

carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ ln -s ../ infrastructure/sandbox/aws/^C
carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ ls -lart infrastructure/sandbox/aws/
total 0
-rw-rw-rw- 1 carlosonunez carlosonunez  0 Dec  6 23:52
drwxrwxrwx 2 carlosonunez carlosonunez  0 Dec  7 00:14 ..
drwxrwxrwx 2 carlosonunez carlosonunez  0 Dec  7 00:14 .
lrwxrwxrwx 1 carlosonunez carlosonunez 15 Dec  7 00:14 -> ../
carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ tree
├── infrastructure
│   └── sandbox
│       ├── aws
│       │   ├── -> ../
│       │   └──
│       └──
├── modules
│   └── aws
│       └── sandbox
│           └──
└── providers
    ├── aws
    ├── colos
    ├── gce
    └── openstack
        ├── dev
        │   ├──
        │   └──
        ├── pre-prod
        │   ├──
        │   └──
        ├── prod
        │   ├──
        │   └──
        └── sandbox

15 directories, 12 files

It’s not great, but it’s a lot better than re-defining my providers everywhere.
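If you want to reproduce this, here’s a minimal shell sketch of the symlink approach. The file name provider.tf and the provider body are illustrative, since the actual names were elided above:

```shell
# Create the environment folder and a single provider definition for it.
mkdir -p infrastructure/sandbox/aws
cat > infrastructure/sandbox/provider.tf <<'EOF'
provider "aws" {
  alias  = "sandbox"
  region = "us-east-1"
}
EOF

# Symlink the provider definition into the folder that references the
# module, instead of re-defining it there.
ln -s ../provider.tf infrastructure/sandbox/aws/provider.tf
ls -l infrastructure/sandbox/aws/
```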

Gotcha #4: Unset your provider env vars!

So the thing in Gotcha #3 never happened to you. It seemed to deploy just fine. That is, until you realized you were deploying to the production account instead of the dev one, which you were abruptly informed of by Finance when they wondered why you spun up $15,000 worth of compute. Oops.

This is because of a thoughtful-yet-conveniently-unfortunate side effect of providers whereby (a) most of them support using environment variables to define their behavior, and (b) Terraform has no way of turning this off (an issue I recently raised).

For now, unset the environment variables used by boto, the OpenStack CLI, gcloud or whatever provider tooling you might be using before running terraform commands. That, or run them in a clean shell using /bin/sh.
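For the AWS provider, for example, that cleanup might look like this (these are the standard AWS SDK variable names; adjust for your provider):

```shell
# Clear the credential variables the AWS provider would otherwise pick up.
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_PROFILE

# Or start from a clean environment entirely, keeping only what's needed:
# env -i PATH="$PATH" HOME="$HOME" terraform plan
```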

That’s it!

I’m really enjoying Terraform. I hope you are too! Do you have any other gotchas? Want to leave some feedback? Throw in a comment below!

About Me


I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Enable Linux on Windows the fast way.

Do you have a Windows machine running Windows 10 Anniversary Edition? Do you want to install Ubuntu on that machine so you can have a real Terminal and do real Linux things (Something something DOCKER DOCKER DOCKER something something)? Do you want to do this all through Powershell?

Say no more. I got you.

Start an elevated Powershell session. (Click on the Start button. Type “powershell” into the Search bar. Hit Ctrl+Shift+Enter. Click “Ok.”) Copy and paste this into it. Restart your machine. Enjoy Linux on Windows. What a time to be alive.

# Create AppModelUnlock if it doesn't exist, required for enabling Developer Mode
 $RegistryKeyPath = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\AppModelUnlock"
 if (-not(Test-Path -Path $RegistryKeyPath)) {
 New-Item -Path $RegistryKeyPath -ItemType Directory -Force
 }

# Add registry value to enable Developer Mode
 New-ItemProperty -Path $RegistryKeyPath -Name AllowDevelopmentWithoutDevLicense -PropertyType DWORD -Value 1

# Enable the Linux subsystem
 Get-WindowsOptionalFeature -Online | ?{$_.FeatureName -match "Linux"} | %{ Enable-WindowsOptionalFeature -Online -FeatureName $_.FeatureName}
 Restart-Computer -Force

# Install Ubuntu
 # Start an elevated Powershell session first
 lxrun /install /y
 lxrun /setdefaultuser <username that you want>

# Start it!
 bash

  • Install Chocolatey. It’s a package manager for Windows. It’s damn good. You can write your own packages too.
  • Install ConsoleZ: choco install consolez. It’s the best.
  • Install gvim: choco install gvim.
  • Install vcxsrv (the new xming, now with an even more abstract name!): choco install vcxsrv
  • Put Set-PSReadLineOption -EditMode Emacs into your profile: vim $PROFILE. Enjoy emacs keybindings for your Powershell session.
  • You can forward X11 applications to Windows! Prefix your application with DISPLAY=:0 after installing and starting vcxsrv. Speed is fine; it’s a lot faster than doing it over SSH (as expected, since Ubuntu is running under a Windows subsystem and these syscalls are abstracted by Windows syscalls).
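For example, inside your Ubuntu session (assuming vcxsrv is already running on display 0 on the Windows side):

```shell
# Point X11 clients at the vcxsrv server running on the Windows side.
export DISPLAY=:0
# gvim &   # now renders on the Windows desktop through vcxsrv
```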


Configuration management and provisioning are different.

Configuration management tools are used to repeatably and consistently enforce system and application uniformity across clusters of systems at scale. Many of these tools achieve this in three ways: an intuitive command-line interface, a lightweight and easily-readable domain-specific language and a comprehensive REST-based API that lowers the barrier to entry for integrations with other tools. While open-source configuration management tools such as Chef, Ansible, Puppet and Salt have been increasing in popularity over the years, there are also enterprise-grade and regulator-friendly offerings available from vendors such as Dell, Microsoft, HP and BMC.

Configuration management tools are great at keeping a running inventory of existing systems and applications up-to-date. These tools are so good at this, in fact, that many systems administrators and engineers grow tempted into using them to deploy swaths of new systems and configure them shortly thereafter.

I’ve seen this play out at many companies that I’ve worked at. This would usually manifest into an Ansible deployment playbook or a Chef cookbook that eventually became “the” cookbook. The result has always been the same, and if I had to sum this pattern up into a picture, it would look something like this:


Let me explain.

Complexity in simplicity.

One of the darling features of modern configuration management tools is their ability to express complex configuration states in an easily-readable way. This works well for creating an Ansible playbook to configure, say, an nginx instance, but begins to fall apart when trying to, say, provision the instances on which those nginx instances will be hosted and gets really ugly when you attempt to create relationships to deploy application servers with those web servers in trying to stage an environment.

Creating common templates for security groups, firewall rules, instances, storage and the like with configuration management tools outside of what they provide out of the box usually involves writing a lot of boilerplate beforehand. (For example, Ansible has a plugin for staging EBS volumes, but what if you want an EBS resource with specific defaults for certain web or application servers in certain regions? Prepare for a lot of if blocks.) Problems usually also crop up in passing metadata like instance IDs between resources. Because most of the actions done by configuration management tools are idempotent executions, the simple languages they use to describe configurations don’t natively support variables. Storing metadata and using loops are usually done by breaking out of the DSL and into its underlying language. This breaks readability and makes troubleshooting more complicated.

Provisioning is hard.

Provisioning infrastructure is usually more complicated than configuring the software underlying that infrastructure in two ways.

The first complication arises from the complicated relationships between pieces of infrastructure. Expressing specific environmental nuances of a postgres installation is usually done by way of Chef cookbook attributes and flags. An example of this can be found here. Expressing the three different regions that the databases backing your web app need to be deployed to in a particular environment and the operating system images that those instances need to have will likely require separate layers of attributes: one for image mappings, another for instance sizing and yet another for region mapping. Expressing this in cookbooks gets challenging.
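For contrast, provisioning tools express those mapping layers natively. A minimal Terraform sketch, with illustrative variable names and placeholder AMI IDs:

```hcl
variable "environment" {
  default = "sandbox"
}

variable "region" {
  default = "us-east-1"
}

# One layer of the mapping: region -> operating system image.
variable "ami_by_region" {
  default = {
    "us-east-1" = "ami-11111111"
    "us-west-2" = "ami-22222222"
  }
}

resource "aws_instance" "database" {
  ami           = "${lookup(var.ami_by_region, var.region)}"
  instance_type = "t2.micro"
  tags {
    Environment = "${var.environment}"
  }
}
```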

The second complication comes from gathering state. Unless your infrastructure is completely immutable (which is ideal), some state of your infrastructure in the status quo is required before deploying anything. Otherwise, you’ll be in for a surprise or two after you deploy the servers for that environment you thought didn’t exist. Tools like Terraform and AWS CloudFormation keep track of this state to prevent these situations from happening. Chef or Puppet, for example, do not. You can use built-in resources to capture this data and make decisions based on those results, but that puts you back into manipulating their DSLs to do things they weren’t intended to do.
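To see why recorded state matters, here’s a toy Python sketch (names are hypothetical) of the reconciliation a provisioning tool performs: compare what already exists against what’s desired, then act only on the difference:

```python
# What the platform reports as already deployed (discovered via API queries,
# or, in Terraform's case, read back from the state file):
existing = {"web-1", "web-2"}

# What your configuration declares should exist:
desired = {"web-1", "web-2", "web-3"}

to_create = desired - existing   # only the missing server gets created
to_destroy = existing - desired  # nothing stale to prune in this case

print(sorted(to_create), sorted(to_destroy))
```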

Rollbacks are harder.

chef-provisioning and Ansible provisioning plugins do not support rolling back changes if something fails. This is problematic for three reasons:

  1. Inconsistent environments lead to increased overhead and (usually manual) sysadmin toil. Toil leads to technical debt, and debt leads to slower releases and grumpy teams. Nobody wants to have a grumpy team.
  2. In the cloud, nearly everything costs money. Resources that you deployed during tests that weren’t destroyed afterwards can add up to hefty surprises at the end of your billing cycle.
  3. Cookbook recipes and playbooks will need to account for these stale resources when executing their actions. This can lead to a more complicated codebase and a debt to pay back later on.

Provisioning tools such as Terraform, CloudFormation and Vagrant support rollback out of the box.

Use the right tools.

If you’re staring at a behemoth of a playbook for provisioning your stack or looking to make the move away from chef-provisioning, take a look at XebiaLabs’ awesome list of tools that make provisioning less complicated. CloudFormation is awesome at provisioning AWS infrastructure (unless you dislike JSON, in which case it is far from it), Vagrant is great at doing the same for physical infrastructure and Packer does a great job of building images as code.

Good luck!


Making sense of this ChatOps thing

So I’m still not entirely sold on the urgency or importance of “chatops.”

I’m a huge fan of Google Assistant née Now. I wish that I could replace Siri with it daily. It can answer nearly any question you throw at it, and it is smart enough to do contextual things that resemble conversations. For fun, I just asked Siri to navigate me from Lewisville, TX to my favorite winery, Messina Hof, in Grapevine, TX. Here’s what it came back with:


Not very useful. What’s a Messina?

Google Assistant, on the other hand, knows what’s up…kind of:


It didn’t get me to the Grapevine location my fiancée and I always go to, but it (a) knew I was talking about Messina Hof, and (b) navigated me to their biggest vineyard in Bryan, TX (a.k.a Aggieland, opinions notwithstanding).

Here’s the thing, though: in almost every case, I will probably open Google Maps and search for the location there. I’m sure that, in the near future, Assistant will be knowledgeable enough to know the exact location I want and whether I should stop for gas and a coffee on the way there (Google’s awesome new phone will probably help accelerate that). In the present, however, it’s a lot faster to do all of that from the app.

Which kind of explains my issue with chatops.

What’s ChatOps?

PagerDuty (awesome on-call management app, highly recommend) explains that, holistically, chatops:

…is all about conversation-driven development. By bringing your tools into your conversations and using a chat bot modified to work with key plugins and scripts, teams can automate tasks and collaborate, working better, cheaper and faster.

Since this is DevOps and that definition wouldn’t be complete without referring to tooling of some sort, remember this?


Think that, but with your infrastructure, more Slack, more modern Web and fewer early 2000s nostalgia:


The overall goal of chatops is to use communication mediums that we take advantage of on a daily basis to manage workflows and infrastructure more seamlessly. (To me, email automation would not only squarely fit in with this design philosophy, but, as discussed later, would also probably be the most compatible and far-reaching solution for people.)

I’m not saying ChatOps isn’t awesome.

There are several frameworks out there that enable companies and teams to start playing around. Hubot, by GitHub, is the most well-known one. It works with just about every messaging platform out there, including Lync if you have an XMPP gateway set up. Slack integrations and webhooks are also very popular for companies using that product. When implemented correctly, chatops can be quite powerful.

Being able to say phrases like /deploybot deploy master of <project> to preprod or /beachbot create a sandbox environment for myawesometool from carlosnunez’s fork on Slack or Jabber and act on them would be incredibly neat, not to mention incredibly fast. This can be immensely valuable in several high-touch situations, such as troubleshooting unexpected issues with infrastructure or automating product releases from a common tool.
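Under the hood, these bots mostly boil down to pattern-matching messages and dispatching handlers. A toy Python sketch, with an entirely hypothetical command grammar:

```python
import re

# Matches a hypothetical "/deploybot deploy <branch> of <project> to <env>".
DEPLOY_COMMAND = re.compile(
    r"^/deploybot deploy (?P<branch>\S+) of (?P<project>\S+) to (?P<env>\S+)$"
)

def parse_deploy(message):
    """Return the deploy parameters, or None if the message isn't a command."""
    match = DEPLOY_COMMAND.match(message)
    return match.groupdict() if match else None

print(parse_deploy("/deploybot deploy master of bacon-api to preprod"))
```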

More mature implementations can go much, much deeper than that.


I listened to an extremely interesting episode of Planet Money recently that explained a curious period of growth for Subaru in the late 1990s to early 2000s. Subaru was struggling to compete with booming Japanese automakers at the time, which were producing cheaper cars faster and aggressively targeting the mid-market that Subaru classically did well in. Growth eventually went negative, and morale plummeted with it.

In the late 1990s, they made a discovery while trying to find a modicum of success with what they currently had. They discovered that out of their entire lineup of products, only one was selling consistently: the Impreza. They sought to find out why.

What they found was surprising. They saw that this car, and only this car, had a strong positive correlation with female buyers, specifically females that lived together. So they, with the help of Mulryan/Nash, their ad agency, tried something rash: they aimed to exclusively target homosexual couples in almost all of their ad campaigns.

Their sales soared. In fact, they were the only auto manufacturer to grow sales during the 2008 Global Financial Crisis.

(Check out the full story here if you’re interested in learning more!)

Wouldn’t it have been awesome if they had bots that scoured sales demographics data from their network of dealerships and turned the trends they identified into emails or chats that marketing or sales managers could parse and make these same decisions on? How much faster do you think they would have been able to identify this and act on it? How many other trends could they have uncovered and made potential sales on?

That’s what I think when I hear about ChatOps. But let’s get back to reality.

I’m saying that it’s just not that crucial.

There are a lot of things that have to be done “right” before chatops can work. Monitoring and alerting have to be on point, especially for implementing things like automated alert or alarm bots. Creating new development environments has to be automated, or at least follow a consistent process from which automation can occur. Configuration management has to exist and has to be consistent for deployment bots to work. The list goes on.

Herein lies the rub: for engineers, accomplishing these things from a command-line tool is just as simple, and developers and engineers tend to spend just as much time with their tools as their IM client. Furthermore, implementing new systems introduces complexity, so introducing chatops to an organization whose tooling needs improvement will usually lead to my Messina-that-isn’t-Messina-Hof situation from before, where the quality of both toolsets ultimately suffers. So if the goal of implementing chatops is to make engineering’s life easier (or to make it easier for non-technical people to gain more understandable views into their tech), there might be easier and more important wins to be had first.

It’s not the end-all-be-all…yet.

Financial companies, tech-friendly law firms and news organizations use chatops to help model the state of markets, find trends in big law to identify new opportunities and uncover breaking news to broadcast around the world. The intrinsic value of ChatOps is definitely apparent.

That said, the foundation of the house comes first. Infrastructure, process and culture have to be solid and at least somewhat automated before chatops can make sense.


Driving technical change isn’t always technical

Paperful office

Locked rooms full of potential secrets was nothing new for a multinational enterprise that a colleague of mine consulted for a few years ago. A new employee stumbling upon one of these rooms, however, was.

What that employee found in his accidental discovery was a bit unusual: a room full of boxes, all of which were full of neatly-filed printouts of what seemed like meeting minutes. Curious about his new find, he asked his coworkers if they knew anything about this room.

None did.

It took him weeks to find the one person that had a clue about this mysterious room. According to her, one team was asked to summarize their updates every week, and every week, someone printed them out, shipped them to the papers-to-the-metaphoric-ceiling room and categorized them.

Seems strange? This fresh employee thought so. He sought to find out why.

After a few weeks of semi-serious digging, he excavated the history behind this process. Many, many years ago (I’m talking about bring-your-family-into-security-at-the-airport days), an executive was on his way to a far-away meeting and remembered along the way that he forgot to bring a summary of updates for an important team that was to come up in discussion. Panicked, he asked his executive assistant to print it out and bring it to him post haste. She did.

To prevent this from happening again, she printed and filed this update out every week in the room that eventually became the paper jungle gym. She trained her replacement to do this, her replacement trained her replacement; I think you see where this is headed. The convenience eventually became a “rule,” and because we tend to be conformant in social situations, this rule was never contested.

None of those printed updates in that room were ever used.

This has nothing to do with DevOps.

Keep reading.

I’m not sure of what became of that rule (and neither does my colleague). There is one thing I’m sure of, though: tens of thousands of long-lived companies of all sizes have processes like these. Perhaps your company’s deployments to production depend on an approval from some business unit that’s no longer involved with the frontend. Perhaps your company requires a thorough and tedious approval process for new software regardless of its triviality or use. Perhaps your team’s laptops and workstations are locked down as much as a business analyst who only uses their computers for Excel, Word and PowerPoint. (It’s incredible what they can do. Excel itself is a damn operating system; it even includes its own memory manager.)

Some of the simplest technology changes you can make to help your company go faster to market don’t involve technology at all. If you notice a rule or process that doesn’t make sense, it might be worth your while to do your own digging and question it. More people might agree with you than you think.
