Some Terraform gotchas.

So you’ve got a bacon delivery service repository with Terraform configuration files at the ready, and it looks something like this:


$> tree
.
├── main.tf
├── providers.tf
└── variables.tf

0 directories, 3 files

terraform is applying your configurations and saving them in tfstate like you’d expect. Awesome.

Eventually, your infrastructure scales just large enough to necessitate a directory structure. You want to express your Terraform configurations in a way that (a) makes it easy to see what’s in which environment, (b) makes it easy to modify those environments without affecting other environments and (c) prevents your HCL from becoming a total mess not much unlike if you were to do it with Puppet or Chef.

Fortunately, Terraform makes this pretty easy to do…but not without some gotchas.

<

h2>One suggestion: Use modules!

Modules give you the ability to reuse Terraform resources throughout your codebase. This way, instead of having a bunch of aws_instances lying around in your main, you can neatly express them in ways that make more sense:


module "sandbox-web-servers" {
  source = "../modules/aws/sandbox"
  provider = "aws.us-west-1"
  environment = "sandbox"
  tier = "web"
  count = 10
}

When you do this, you need to populate Terraform’s module cache by using terraform get /path/to/module.

<

h2>Gotcha #1: Self variable interpolation isn’t a thing yet.

If you noticed, the example above references “sandbox” quite a lot. This is because, unfortunately, Terraform modules (and resources, I believe) do not yet support self-referencing variables. What I mean is this:


module "sandbox-web-server" {
  environment = "sandbox"
  source = "../modules/${var.self.environment}"
  ...
}

Given that everything in Terraform is a directed graph, the complexity in doing this makes sense. How do you resolve a reference to a variable that hasn’t been defined yet?

This was tracked here, but it looks like a blue-sky feature right now.

Gotcha #2: Module source paths are relative to the module.

Let’s say you had a module definition that looked like this:


module "sandbox-web-servers" {
  source = "modules/aws/sandbox"
}

and a directory structure that looked like this:


$> tree
.
├── infrastructure
│   └── sandbox
│       └── web_servers.tf
└── modules
    └── aws
        └── sandbox
            └── main.tf

5 directories, 2 files

Upon running terraform apply, you’d get an awesome error saying that modules/aws/sandbox couldn’t be located, even if you ran it at the root. You’d wonder why this is given that Terraform is supposed to reference everything from the location from which the application was executed.

It turns out that modules don’t work that way. When modules are loaded with terraform get, their dependencies are sourced from the location of the module. I haven’t looked too deeply into this, but this is likely due to the way in which Terraform populates its graphs.

To fix this, you’ll need to either (a) create symlinks in all of your modules pointing to your module source, or (b) fix your sources to use relative paths relative to the location of the module, like this:


module "sandbox-web-servers" {
  source "../../modules/aws/sandbox"
  ...
}

Gotcha #3: Providers must co-exist with your infrastructure!

This one took me a few hours to reason about. Let’s go back to the directory structure referenced above (which I’ve included again below for your convenience):


$> tree
.
├── infrastructure
│   └── sandbox
│       └── web_servers.tf
└── modules
    └── aws
        └── sandbox
            └── main.tf

5 directories, 2 files

Since you deploy to multiple different sources (nit pick: Nearly every example I’ve seen on Terraform assumes you’re using AWS!), you want to create a providers folder to express this. Additionally, since your infrastructure might be defined differently by environment and you want the thing that’s actually calling terraform to assume as little about your infrastructure as possible, you want to break it down by environment. When I tried this, it looked like this:


.
├── infrastructure
│   └── sandbox
│       └── web_servers.tf
├── modules
│   └── aws
│       └── sandbox
│           └── main.tf
└── providers
    ├── openstack
    ├── colos
    ├── gce
    └── aws
        ├── dev
        │   ├── main.tf
        │   └── variables.tf
        ├── pre-prod
        │   ├── main.tf
        │   └── variables.tf
        ├── prod
        │   ├── main.tf
        │   └── variables.tf
        └── sandbox
            ├── main.tf
            └── variables.tf

14 directories, 10 files

You now want to reference this in your modules:


# infrastructure/sandbox/aws_web_servers.tf
module "sandbox-web-servers" {
  source = "../../modules/aws/sandbox"
  provider = "aws.sandbox.us-west-1" # using a provider alias
  ...
}

and are in for a pleasant surprise when you discover that Terraform fails because it can’t locate the “aws.sandbox.us-west-1” provider.

I initially assumed that when Terraform looked for the nearest provider, it would search the entire directory for a suitable one, in other words, it would follow a search path like this:


- ./infrastructure/sandbox
- ./infrastructure
- .
- ./modules
- ./modules/aws
- ./modules/aws/sandbox
- .
- ./providers
- ./providers/aws
- ./providers/aws/sandbox <-- here

But that’s not what happens. Instead, it looks for its providers in the same location as the module being referenced. This meant that I had to put providers.tf in the same place as aws_web_servers.tf.

I couldn’t even get away with putting it in the directory for its requisite environment above it (i.e. ./infrastructure/aws/sandbox) because Terraform doesn’t currently support object inheritance.

Instead of re-defining my providers in every directory, I created my providers.tf in every infrastructure environment folder I had (which is just sandbox at the moment) and symlinked it in every folder underneath it. In other words:


carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ ln -s ../providers.tf infrastructure/sandbox/aws/providers.tf^C
carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ ls -lart infrastructure/sandbox/aws/
total 0
-rw-rw-rw- 1 carlosonunez carlosonunez  0 Dec  6 23:52 web_servers.tf
drwxrwxrwx 2 carlosonunez carlosonunez  0 Dec  7 00:14 ..
drwxrwxrwx 2 carlosonunez carlosonunez  0 Dec  7 00:14 .
lrwxrwxrwx 1 carlosonunez carlosonunez 15 Dec  7 00:14 providers.tf -> ../providers.tf
carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ tree
.
├── infrastructure
│   └── sandbox
│       ├── aws
│       │   ├── providers.tf -> ../providers.tf
│       │   └── web_servers.tf
│       └── providers.tf
├── modules
│   └── aws
│       └── sandbox
│           └── main.tf
└── providers
    ├── aws
    ├── colos
    ├── gce
    └── openstack
        ├── dev
        │   ├── main.tf
        │   └── variables.tf
        ├── pre-prod
        │   ├── main.tf
        │   └── variables.tf
        ├── prod
        │   ├── main.tf
        │   └── variables.tf
        └── sandbox
            ├── main.tf
            └── variables.tf

15 directories, 12 files

It’s not great, but it’s a lot better than re-defining my providers everywhere.

Gotcha #4: Unset your provider env vars!

So the thing in Gotcha #3 never happened to you. It seemed to deploy just fine. That is until you realized you were deploying to the production account instead of the dev, which you were abruptly informed of by Finance when they were wondering why you spun up $15,000 worth of compute. Oops.

This is because of a thoughtful-yet-conveniently-unfortunate side effect of providers whereby (a) most of them support using environment variables to define their behavior, and (b) Terraform has no way of turning this off (an issue I recently raised).

For now, unset boto, openstack, gcloud or whatever provider CLI tool you might be using before running terraform commands. That, or run them in a clean shell using /bin/sh

That’s it!

I’m really enjoying Terraform. I hope you are too! Do you have any other gotchas? Want to leave some feedback? Throw in a comment below!

About Me

20160408

I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Advertisements

Configuration management and provisioning are different.

Configuration management tools are used to repeatably and consistently system and application uniformity across clusters of systems at scale. Many of these tools achieve this in three ways: an intuitive command line interface, a lightweight and easily-readable domain-specific language and a comprehensive REST-based API to lower the barrier-to-entry for integrations with other tools. While open-source configuration management tools such as Chef, Ansible, Puppet and Salt have been increasing in popularity over the years, there are also enterprise-grade and regulator-friendly offerings available from vendors such as Dell, Microsoft, HP and BMC.

Configuration management tools are great at keeping a running inventory of existing systems and applications up-to-date. These tools are so good at this, in fact, that many systems administrators and engineers grow tempted into using them to deploy swaths of new systems and configure them shortly thereafter.

I’ve seen this play out at many companies that I’ve worked at. This would usually manifest into an Ansible deployment playbook or a Chef cookbook that eventually became “the” cookbook. The result has always been the same, and if I had to sum this pattern up into a picture, it would look something like this:

i-swear-this-worked-on-my-machine

Let me explain.

Complexity in simplicity.

One of the darling features of modern configuration management tools is its ability to express complex configuration states in an easily-readable way. This works well for creating an Ansible playbook to configure, say, an nginx instance, but begins to fall apart when trying to, say, provision the instances on which those nginx instances will be hosted and gets really ugly when you attempt to create relationships to deploy application servers with those web servers in trying to stage an environment.

Creating common templates for security groups or firewall templates, instances, storage and the like with configuration management tools outside of what they provide out of the box usually involves writing a lot of boilerplate beforehand. (For example, Ansible has a plugin for staging ebs volumes, but what if you want an EBS resource with specific defaults for certain web or application servers in certain regions? Prepare for a lot of if blocks.) Problems usually also crop up in passing metadata like instance IDs between resources. Because most of the actions done by configuration management tools are idempotent executions, the simple languages they use to to describe configurations don’t natively support variables. Storing metadata and using loops are usually done by breaking out of the DSL and into its underlying language. This breaks readability and makes troubleshooting more complicated.

Provisioning is hard.

Provisioning infrastructure is usually more complicated than configuring the software underlying that infrastructure in two ways.

The first complication arises from the complicated relationships between pieces of infrastructure. Expressing specific environmental nuances of a postgres installation is usually done by way of Chef cookbook attributes and flags. An example of this can be found here. Expressing the three different regions that the databases backing your web app need to be deployed to in a particular environment and the operating system images that those instances need to have will likely require separate layers of attributes: one for image mappings, another for instance sizing and yet another for region mapping. Expressing this in cookbooks gets challenging.

The second complication comes from gathering state. Unless your infrastructure is completely immutable (which is ideal), some state of your infrastructure in the status quo is required before deploying anything. Otherwise, you’ll be in for a surprise or two after you deploy the servers for that environment you thought didn’t exist. Tools like Terraform and AWS CloudFormation keep track of this state to prevent these situations from happening. Chef or Puppet, for example, do not. You can use built-in resources to capture this data and make decisions based on those results, but that puts you back into manipulating their DSLs to do things they weren’t intended to do.

Rollbacks are harder.

chef-provisioning and Ansible provisioning plugins do not support rolling back changes if something fails. This is problematic for three reasons:

  1. Inconsistent environments lead to increased overhead and (usually manual) sysadmin toil. Toil leads to technical debt, and debt leads to slower releases and grumpy teams. Nobody wants to have a grumpy team.
  2. In the cloud, nearly everything costs money. Resources that you deployed during tests that weren’t destroyed afterwards can add up to hefty surprises at the end of your billing cycle.
  3. Cookbook recipes and playbooks will need to account for these stale resources when executing their actions. This can lead to a more complicated codebase and a debt to pay back later on.

Provisioning tools such as Terraform, Cloudformation and Vagrant support rollback out of the box.

Use the right tools.

If you’re staring at a behemoth of a playbook for provisioning your stack or looking to make the move away from chef-provisioning, take a look at XebiaLabs awesome list of tools that make provisioning less complicated. CloudFormation is awesome at provisioning AWS infrastructure (unless you dislike JSON, in which case it is far from it), Vagrant is great at doing the same for physical infrastructure and Packer does a great job of building images as code.

Good luck!

About Me

20160408

I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Config management and cloud provisioning: There be dragons

So I’ve tried using configuration management to deploy infrastructure to two different clouds and learned this: whenever you think “it would be great if we could deploy to EC2 with Chef,” use CloudFormation or Terraform instead.

Why? Here are a few reasons that come to mind:

  • CloudFormation/Terraform is easier. Terraform YAML is nicer than CloudFormation JSON, but both are *way* easier than trying to shoehorn Jinja2 (Ansible) or chef-provisioning Ruby to do what you want. Like, hundreds of lines easier.

    I once tried to use Ansible to automate provisioning of Active Directory forests onto EC2. I had to create my own roles for handling AMI selection, security group CRUD operations, EBS provisioning, etc. The 2000+ lines of YAML I wrote to uphold all that bass ultimately became about 200 lines of ugly, yet functional, CloudFormation JSON.

    Yeah.
  • Built-in rollback is awesome. CloudFormation and Terraform both support some kind of rollback. Chef provisioning does as well with the :rollback action (I don’t think Ansible does; at least it didn’t when I used the EC2 plugin), but it’s not guaranteed.
  • I really liked the CloudFormation API. I haven’t tried Terraform’s CLI yet, but I would imagine that it’s just as awesome. aws cloudformation provides a lot of useful information that’s easy to action upon in a Chef recipe or Ansible play, especially given that both platforms have support for CloudFormation “built-in.” What’s better, the AWS SDKs have full support for CloudFormation as well, which means…
  • You’re not locked into anything. This was the biggest takeaway from my experiences using chef-provisioning or ansible-ec2. If you ever decide to move away from Chef or Ansible, you’ll need to port over your deployment code with it. Depending on the platform, this could take anywhere from hours to weeks.

    Not a problem with CloudFormation or Terraform. Perhaps you’ll need to change how your Chef shell resource behaves, but that’s a lot easier to deal with, in my opinion.

Using your config management solution to do it all is really attractive. It’s usually not a bad idea either. However, when it comes to cloud, tread carefully!

About Me

Carlos Nunez is a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. He loves everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel.

Follow him on Twitter! @easiestnameever.