A Few Gotchas About Going Multi-Cloud with AWS, Microsoft Azure and HashiCorp tools.

One of the more interesting types of work we do at Contino is helping our clients make sense of the differences between AWS and Microsoft Azure. While the HashiCorp toolchain (Packer, Terraform, Vault, Vagrant, Consul and Nomad) has made provisioning infrastructure a breeze compared to writing hundreds of lines of Python, it almost makes a multi-cloud infrastructure deployment seem too easy.

This post outlines some of the differences I’ve observed when using these tools against both cloud platforms. Since I used the word “multi-cloud” in the first paragraph, I’ll also close with some general points to consider before embarking on a multi-cloud journey.

Azure and ARM Are Inseparable

One of the core features that make Terraform and Packer tick are providers and builders, respectively. These allow third parties to write their own “glue” code that tells Terraform how to create VMs or Packer how to create machine images. This way, Terraform and Packer simply become “thin clients” for your desired platform. HashiCorp’s decision to move provider code out of the Terraform binary in version 0.10 emphasizes this.

In reality, when you create VMs with Terraform or machine images with Packer, you’re really asking the AWS Go SDK to do those things. This is mostly the case with Azure as well, with one big exception: the Azure Resource Manager, or ARM.

ARM is more-or-less like AWS CloudFormation. You create a JSON template of the resources that you’d like to deploy into a single resource group along with the relationships that should exist between those resources and submit that into ARM as a deployment. It’s pretty nifty stuff.
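
For context, here’s roughly what a bare ARM template looks like: just a JSON document with a handful of well-known sections. (This skeleton is illustrative; real templates fill in parameters, resources and outputs.)

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {},
  "variables": {},
  "resources": [],
  "outputs": {}
}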

However, instead of Terraform or Packer using the Azure Go SDK directly to create these resources, they both rely on ARM (via the Azure Go SDK) to do that job for them. I’m guessing that HashiCorp chose to do it this way to avoid rework (i.e. “why create a resource object in our provider or builder when ARM already does most of that work?”). While this doesn’t change much about how you actually use these tools against Azure, there are some notable differences in what happens at runtime.

Azure Deployments Are Slower

My experience has shown me that the Azure ARM Terraform provider and Packer builder take noticeably more time to “get going” than their AWS counterparts, especially when using Standard_A-class VMs. This can make testing code changes quite tedious.

Consider the template below. This uses a t2.micro instance to provision a Red Hat image with no customizations.

{
  "description": "Basic RHEL image.",
  "variables": {
    "access_key": null,
    "secret_key": null
  },
  "builders": [
    {
      "type": "amazon-ebs",
      "access_key": "{{ user `access_key` }}",
      "secret_key": "{{ user `secret_key` }}",
      "region": "us-east-1",
      "instance_type": "t2.micro",
      "source_ami": "ami-c998b6b2",
      "ami_name": "test_ami",
      "ssh_username": "ec2-user",
      "vpc_id": "vpc-8a2dbbf2",
      "subnet_id": "subnet-306b673c"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "#This is required to allow us to use `sudo` from our Packer provisioner.",
        "#This is enabled by default on all RHEL images for \"security.\"",
        "sudo sed -i.bak -e '/Defaults.*requiretty/s/^/#/' /etc/sudoers"
      ]
    },
    {
      "type": "shell",
      "inline": ["echo Hey there"]
    }
  ]
}

Even on a fairly modest internet connection (I did this test on a ~6 Mbit connection), it doesn’t take too much time for Packer to generate an AMI for us.

$ time packer build -var 'access_key=REDACTED' -var 'secret_key=REDACTED' aws.json
==> amazon-ebs: Creating temporary security group for this instance: packer_5a136414-1ba5-7c7d-890c-697a8563d4be
==> amazon-ebs: Authorizing access to port 22 from 0.0.0.0/0 in the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
amazon-ebs: Adding tag: "Name": "Packer Builder"
...
amazon-ebs: Hey there
==> amazon-ebs: Stopping the source instance...
amazon-ebs: Stopping instance, attempt 1
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating the AMI: test_ami
amazon-ebs: AMI: ami-20ff765a
...
Build 'amazon-ebs' finished.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
us-east-1: ami-20ff765a

real 1m50.900s
user 0m0.020s
sys 0m0.008s

Let’s repeat this exercise with Azure. Here’s that template again, but Azure-ified:

{
  "description": "Basic RHEL image.",
  "variables": {
    "client_id": null,
    "client_secret": null,
    "subscription_id": null,
    "tenant_id": null,
    "location": null,
    "resource_group": null
  },
  "builders": [
    {
      "type": "azure-arm",
      "communicator": "ssh",
      "ssh_pty": true,
      "managed_image_name": "IMAGE_NAME",
      "managed_image_resource_group_name": "{{ user `resource_group` }}",
      "os_type": "Linux",
      "vm_size": "Basic_A0",
      "client_id": "{{ user `client_id` }}",
      "client_secret": "{{ user `client_secret` }}",
      "subscription_id": "{{ user `subscription_id` }}",
      "tenant_id": "{{ user `tenant_id` }}",
      "location": "{{ user `location` }}",
      "image_publisher": "RedHat",
      "image_offer": "RHEL",
      "image_sku": "7.3",
      "image_version": "latest"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "#This is required to allow us to use `sudo` from our Packer provisioner.",
        "#This is enabled by default on all RHEL images for \"security.\"",
        "sudo sed -i.bak -e '/Defaults.*requiretty/s/^/#/' /etc/sudoers"
      ]
    },
    {
      "type": "shell",
      "inline": ["echo Hey there"]
    }
  ]
}

And here’s us running this Packer build. I decided to use a Basic_A0 instance size, as that is the closest thing that Azure has to a t2.micro instance that was available for my subscription. (The Standard_B series is what I originally intended to use, as, like the t2 line, those are burstable.)

Notice that it takes more than FIVE times as long with the same Linux distribution and a similarly sized instance!

$ time packer build -var 'client_id=REDACTED' -var 'client_secret=REDACTED' -var 'subscription_id=REDACTED' -var 'tenant_id=REDACTED' -var 'resource_group=REDACTED' -var 'location=East US' azure.json
azure-arm output will be in this color.

==> azure-arm: Running builder ...
azure-arm: Creating Azure Resource Manager (ARM) client ...
==> azure-arm: Creating resource group ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> Location : 'East US'
...
azure-arm: Hey there
==> azure-arm: Querying the machine's properties ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> ComputeName : 'pkrvms6sj74tdvk'
==> azure-arm: -> Managed OS Disk : '/subscriptions/8bbbc92b-6d16-4eb2-8f95-7a0769748c8d/resourceGroups/packer-Resource-Group-s6sj74tdvk/providers/Microsoft.Compute/disks/osdisk'
==> azure-arm: Powering off machine ...
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> ComputeName : 'pkrvms6sj74tdvk'
==> azure-arm: Capturing image ...
==> azure-arm: -> Compute ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: -> Compute Name : 'pkrvms6sj74tdvk'
==> azure-arm: -> Compute Location : 'East US'
==> azure-arm: -> Image ResourceGroupName : 'REDACTED'
==> azure-arm: -> Image Name : 'IMAGE_NAME'
==> azure-arm: -> Image Location : 'eastus'
**==> azure-arm: Deleting resource group ...**
==> azure-arm: -> ResourceGroupName : 'packer-Resource-Group-s6sj74tdvk'
==> azure-arm: Deleting the temporary OS disk ...
==> azure-arm: -> OS Disk : skipping, managed disk was used...
Build 'azure-arm' finished.

==> Builds finished. The artifacts of successful builds are:
--> azure-arm: Azure.ResourceManagement.VMImage:

ManagedImageResourceGroupName: REDACTED
ManagedImageName: IMAGE_NAME
ManagedImageLocation: eastus

**real 10m27.036s**
**user 0m0.056s**
**sys 0m0.020s**

The worst part about this is that it takes this long even when it fails!

Notice the “Deleting resource group…” line I highlighted. You’ll likely spend a lot of time looking at that line. For some reason, cleanup after an ARM deployment can take a while. I’m guessing that this is due to three things:

  1. Azure creating intermediate resources, such as virtual networks (VNets), subnets and compute, all of which can take time,
  2. ARM waiting for downstream SDKs to finish deleting resources and/or any associated metadata, and
  3. Packer issuing asynchronous operations to the Azure ARM service, which requires polling the operationResult endpoint every so often to see how things played out.

Pro-Tip: Use the az Python CLI before running things!

Since recovering from Packer failures can be quite time-consuming, consider using the Azure command-line client to verify that the inputs to your Packer templates are correct before running a build. Here’s a quick example: if you want to confirm that the service principal client_id and client_secret are correct, you might add something like this to your pipeline:

#!/usr/bin/env bash
client_id=$1
client_secret=$2
tenant_id=$3

if ! az login --service-principal -u "$client_id" -p "$client_secret" --tenant "$tenant_id"
then
  echo "ERROR: Invalid credentials." >&2
  exit 1
fi
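
You can extend the same idea to other inputs. A check like the one below (a sketch; the resource_group variable is an assumption about how your pipeline passes that value in) confirms that the target resource group actually exists before Packer spends ten minutes discovering that it doesn’t:

# Assumes the `az login` above has already succeeded.
if ! az group show --name "$resource_group" > /dev/null
then
  echo "ERROR: Resource group $resource_group does not exist." >&2
  exit 1
fi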

This will save you at least three minutes during execution…as well as something else that’s a little more frustrating.

The AWS provider and builder are more actively consumed

Both the AWS and Azure Terraform providers and Packer builders are mostly maintained by HashiCorp. However, after using the Azure ARM provider and builder for a short while, you’ll find that community usage of them pales in comparison to their AWS equivalents.

I ran into an issue with the azure-arm builder whereby it failed to find a resource group that I had created for an image I was trying to build. Locating that resource group with az group list and the same client_id and secret worked fine, and I could see the resource group in the portal. I had also given the service principal the “Owner” role, so there were no access limitations preventing it from finding this resource group.

After spending some time digging into the builder source code and firing up Charles Web Proxy, it turned out that my error had nothing to do with resource groups at all: the credentials I was passing into Packer from my Makefile were incorrect.

What was more frustrating is that I couldn’t find anything on the web about this problem. One would think that someone else using this builder would have hit it before I did, especially since the builder had been available for at least six months as of this writing.

It also seems that there are, by far, more commits and contributors to the Amazon builders than to the Azure ones, which appear to be maintained largely by Microsoft folks. Despite this disparity, the Azure contributors are quite fast and very responsive (or at least they were to me!).

Getting Started Is Slightly More Involved on Azure

In the early days of cloud computing, Amazon’s EC2 service focused entirely on VMs. Their MVP at the time was simple: make creating, maintaining and destroying VMs fast, easy and painless. Aside from subnets and some routing details, much of the networking overhead was abstracted away. Most of the self-service offerings that Amazon has today weren’t around yet. Deploying an app onto AWS still required knowing how to set up EC2 instances and deploy onto them, which allowed companies like DigitalOcean and Heroku to rise to prominence. Over time this premise has held up, as most of AWS’s other offerings revolve heavily around EC2 in various forms.

Microsoft took the opposite approach with Azure. Azure’s mission was to let you deploy apps onto the cloud as quickly as possible without having to worry about the details. This is still largely the case, especially if you’re deploying an application from Visual Studio. Infrastructure-as-a-Service was more or less an afterthought, which led to some industry confusion over where Azure “fit” in the cloud computing spectrum. Consequently, while Microsoft has added and expanded its infrastructure offerings over time, the abstractions that have long been taken for granted in AWS haven’t been “ported over” as quickly.

This is most evident when one is just getting started with AWS and the HashiCorp suite for the first time versus starting up on Azure. These are the steps that one needs to take in order to get a working Packer image into AWS:

  1. Sign up for AWS.
  2. Log into AWS.
  3. Go to IAM and create a new user.
  4. Download the access and secret keys that Amazon gives you.
  5. Assign that user Admin privileges over all AWS services.
  6. Download the AWS CLI (or install Docker and use the anigeo/awscli image)
  7. Configure your client: aws configure
  8. Create a VPC: aws ec2 create-vpc --cidr-block 10.0.0.0/16
  9. Create an Internet Gateway: aws ec2 create-internet-gateway
  10. Attach the gateway to your VPC so that your machines can Internet: aws ec2 attach-internet-gateway --internet-gateway-id $id_from_step_9 --vpc-id $vpc_id_from_step_8
  11. Create a subnet: aws ec2 create-subnet --vpc-id $vpc_id_from_step_8 --cidr-block 10.0.1.0/24
  12. Update that subnet so that it can issue publicly accessible IP addresses to VMs created within it: aws ec2 modify-subnet-attribute --subnet-id $subnet_id_from_step_11 --map-public-ip-on-launch (steps 8-12 are consolidated into a sketch after this list)
  13. Download Packer (or use the hashicorp/packer Docker image)
  14. Create a Packer template for Amazon EBS.
  15. Deploy! packer build -var 'access_key=$access_key' -var 'secret_key=$secret_key' your_template.json
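
If you’re scripting steps 8 through 12, a rough consolidation might look like the sketch below. The --query flags just capture the IDs that the later steps need; treat it as a starting point rather than a hardened script.

#!/usr/bin/env bash
# Rough consolidation of steps 8-12; assumes `aws configure` has already been run.
set -e

vpc_id=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query 'Vpc.VpcId' --output text)
igw_id=$(aws ec2 create-internet-gateway --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id "$igw_id" --vpc-id "$vpc_id"

subnet_id=$(aws ec2 create-subnet --vpc-id "$vpc_id" --cidr-block 10.0.1.0/24 --query 'Subnet.SubnetId' --output text)
aws ec2 modify-subnet-attribute --subnet-id "$subnet_id" --map-public-ip-on-launch

echo "VPC: $vpc_id, subnet: $subnet_id"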

If you want to understand why an AWS VPC requires an internet gateway or how IAM works, finding whitepapers on these topics is a fairly straightforward Google search.

Getting started on Azure, on the other hand, is slightly more laborious as documented here. Finding in-depth answers about Azure primitives has also been slightly more difficult, in my experience. Most of what’s available are Microsoft Docs entries about how to do certain things and non-technical whitepapers. Finding a Developer Guide like those available in AWS was difficult.

In Conclusion

Using multiple cloud providers is a smart way of leveraging different pricing schemes between providers. It’s also an interesting way of adding more DR capability than a single cloud provider can offer alone (which is something of a farce, as AWS alone spans dozens of datacenters across the world, many of which are in the US, though region-wide outages have happened before, albeit rarely).

HashiCorp tools like Terraform and Packer make managing this sort of infrastructure much easier. However, the two providers aren’t created equal, and the AWS support is, at this time of writing, significantly more extensive. While this certainly doesn’t make using Azure with Terraform or Packer impossible, you might find yourself doing more homework than you initially expected!

About Me


I’m a Technical Principal for Contino. We specialize in helping large and heavily-regulated enterprises make cloud adoption and DevOps culture a reality. I’m passionate about bringing DevOps to the enterprise. I’m also passionate about bikes, brews and travel!


Some Terraform gotchas.

So you’ve got a bacon delivery service repository with Terraform configuration files at the ready, and it looks something like this:


$> tree
.
├── main.tf
├── providers.tf
└── variables.tf

0 directories, 3 files

terraform is applying your configurations and saving them in tfstate like you’d expect. Awesome.

Eventually, your infrastructure scales just large enough to necessitate a directory structure. You want to express your Terraform configurations in a way that (a) makes it easy to see what’s in which environment, (b) makes it easy to modify those environments without affecting other environments and (c) prevents your HCL from becoming the kind of mess you’d end up with if you did this in Puppet or Chef.

Fortunately, Terraform makes this pretty easy to do…but not without some gotchas.

One suggestion: Use modules!

Modules give you the ability to reuse Terraform resources throughout your codebase. This way, instead of having a bunch of aws_instances lying around in your main.tf, you can express them in ways that make more sense:


module "sandbox-web-servers" {
  source = "../modules/aws/sandbox"
  provider = "aws.us-west-1"
  environment = "sandbox"
  tier = "web"
  count = 10
}

When you do this, you need to populate Terraform’s module cache by running terraform get from the directory that contains your configuration.

Gotcha #1: Self variable interpolation isn’t a thing yet.

As you may have noticed, the example above references “sandbox” quite a lot. This is because, unfortunately, Terraform modules (and resources, I believe) do not yet support self-referencing variables. What I mean is this:


module "sandbox-web-server" {
  environment = "sandbox"
  source = "../modules/${var.self.environment}"
  ...
}

Given that everything in Terraform is a directed graph, the complexity in doing this makes sense. How do you resolve a reference to a variable that hasn’t been defined yet?

This was tracked here, but it looks like a blue-sky feature right now.

Gotcha #2: Module source paths are relative to the module.

Let’s say you had a module definition that looked like this:


module "sandbox-web-servers" {
  source = "modules/aws/sandbox"
}

and a directory structure that looked like this:


$> tree
.
├── infrastructure
│   └── sandbox
│       └── web_servers.tf
└── modules
    └── aws
        └── sandbox
            └── main.tf

5 directories, 2 files

Upon running terraform apply, you’d get an awesome error saying that modules/aws/sandbox couldn’t be located, even if you ran it from the repository root. You’d wonder why, given that Terraform is supposed to resolve everything relative to the directory from which it was invoked.

It turns out that modules don’t work that way. When modules are loaded with terraform get, their dependencies are sourced from the location of the module. I haven’t looked too deeply into this, but this is likely due to the way in which Terraform populates its graphs.

To fix this, you’ll need to either (a) create symlinks in all of your modules pointing to your module source, or (b) fix your sources to use relative paths relative to the location of the module, like this:


module "sandbox-web-servers" {
  source = "../../modules/aws/sandbox"
  ...
}

Gotcha #3: Providers must co-exist with your infrastructure!

This one took me a few hours to reason about. Let’s go back to the directory structure referenced above (which I’ve included again below for your convenience):


$> tree
.
├── infrastructure
│   └── sandbox
│       └── web_servers.tf
└── modules
    └── aws
        └── sandbox
            └── main.tf

5 directories, 2 files

Since you deploy to multiple platforms (nit pick: nearly every Terraform example I’ve seen assumes you’re using AWS!), you want to create a providers folder to express this. Additionally, since your infrastructure might be defined differently per environment and you want the thing that’s actually calling terraform to assume as little about your infrastructure as possible, you want to break it down by environment. When I tried this, it looked like this:


.
├── infrastructure
│   └── sandbox
│       └── web_servers.tf
├── modules
│   └── aws
│       └── sandbox
│           └── main.tf
└── providers
    ├── openstack
    ├── colos
    ├── gce
    └── aws
        ├── dev
        │   ├── main.tf
        │   └── variables.tf
        ├── pre-prod
        │   ├── main.tf
        │   └── variables.tf
        ├── prod
        │   ├── main.tf
        │   └── variables.tf
        └── sandbox
            ├── main.tf
            └── variables.tf

14 directories, 10 files

You now want to reference this in your modules:


# infrastructure/sandbox/aws_web_servers.tf
module "sandbox-web-servers" {
  source = "../../modules/aws/sandbox"
  provider = "aws.sandbox.us-west-1" # using a provider alias
  ...
}

and are in for a pleasant surprise when you discover that Terraform fails because it can’t locate the “aws.sandbox.us-west-1” provider.

I initially assumed that when Terraform looked for the nearest provider, it would search the entire repository for a suitable one; in other words, that it would follow a search path like this:


- ./infrastructure/sandbox
- ./infrastructure
- .
- ./modules
- ./modules/aws
- ./modules/aws/sandbox
- .
- ./providers
- ./providers/aws
- ./providers/aws/sandbox <-- here

But that’s not what happens. Instead, it looks for its providers in the same location as the module being referenced. This meant that I had to put providers.tf in the same place as aws_web_servers.tf.

I couldn’t even get away with putting it in the directory for its requisite environment above it (i.e. ./infrastructure/aws/sandbox) because Terraform doesn’t currently support object inheritance.

Instead of re-defining my providers in every directory, I created my providers.tf in every infrastructure environment folder I had (which is just sandbox at the moment) and symlinked it in every folder underneath it. In other words:


carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ ln -s ../providers.tf infrastructure/sandbox/aws/providers.tf
carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ ls -lart infrastructure/sandbox/aws/
total 0
-rw-rw-rw- 1 carlosonunez carlosonunez  0 Dec  6 23:52 web_servers.tf
drwxrwxrwx 2 carlosonunez carlosonunez  0 Dec  7 00:14 ..
drwxrwxrwx 2 carlosonunez carlosonunez  0 Dec  7 00:14 .
lrwxrwxrwx 1 carlosonunez carlosonunez 15 Dec  7 00:14 providers.tf -> ../providers.tf
carlosonunez@DESKTOP-DSKP2VT:/tmp/terraform$ tree
.
├── infrastructure
│   └── sandbox
│       ├── aws
│       │   ├── providers.tf -> ../providers.tf
│       │   └── web_servers.tf
│       └── providers.tf
├── modules
│   └── aws
│       └── sandbox
│           └── main.tf
└── providers
    ├── aws
    │   ├── dev
    │   │   ├── main.tf
    │   │   └── variables.tf
    │   ├── pre-prod
    │   │   ├── main.tf
    │   │   └── variables.tf
    │   ├── prod
    │   │   ├── main.tf
    │   │   └── variables.tf
    │   └── sandbox
    │       ├── main.tf
    │       └── variables.tf
    ├── colos
    ├── gce
    └── openstack

15 directories, 12 files

It’s not great, but it’s a lot better than re-defining my providers everywhere.

Gotcha #4: Unset your provider env vars!

So the thing in Gotcha #3 never happened to you. It seemed to deploy just fine. That is until you realized you were deploying to the production account instead of the dev, which you were abruptly informed of by Finance when they were wondering why you spun up $15,000 worth of compute. Oops.

This is because of a thoughtful-yet-conveniently-unfortunate side effect of providers whereby (a) most of them support using environment variables to define their behavior, and (b) Terraform has no way of turning this off (an issue I recently raised).

For now, unset the environment variables used by boto, the OpenStack CLI, gcloud or whatever provider CLI tools you might be using before running terraform commands. That, or run Terraform in a clean shell using /bin/sh.
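
For reference, these are the sorts of variables worth unsetting. The exact list depends on which providers and CLI tools you use; treat this as a sketch, not an exhaustive inventory.

# AWS (used by boto, the AWS CLI and the Terraform AWS provider)
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_PROFILE AWS_DEFAULT_REGION

# OpenStack
unset OS_USERNAME OS_PASSWORD OS_TENANT_NAME OS_AUTH_URL

# Google Cloud
unset GOOGLE_CREDENTIALS GOOGLE_APPLICATION_CREDENTIALS GOOGLE_PROJECT

# ...or start from a clean environment entirely
env -i /bin/sh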

That’s it!

I’m really enjoying Terraform. I hope you are too! Do you have any other gotchas? Want to leave some feedback? Throw in a comment below!

About Me


I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Enable Linux on Windows the fast way.

Do you have a Windows machine running Windows 10 Anniversary Edition? Do you want to install Ubuntu on that machine so you can have a real Terminal and do real Linux things (Something something DOCKER DOCKER DOCKER something something)? Do you want to do this all through Powershell?

Say no more. I got you.

Start an elevated Powershell session. (Click on the Start button. Type “powershell” into the Search bar. Hit Ctrl+Shift+Enter. Click “Yes” at the User Account Control prompt.) Copy and paste this into it. Restart your machine. Enjoy Linux on Windows. What a time to be alive.

# Create AppModelUnlock if it doesn't exist, required for enabling Developer Mode
 $RegistryKeyPath = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\AppModelUnlock"
 if (-not(Test-Path -Path $RegistryKeyPath)) {
 New-Item -Path $RegistryKeyPath -ItemType Directory -Force
 }

# Add registry value to enable Developer Mode
 New-ItemProperty -Path $RegistryKeyPath -Name AllowDevelopmentWithoutDevLicense -PropertyType DWORD -Value 1

# Enable the Linux subsystem
 Get-WindowsOptionalFeature -Online | ?{$_.FeatureName -match "Linux"} | %{ Enable-WindowsOptionalFeature -Online -FeatureName $_.FeatureName}
 Restart-Computer -Force

# Install Ubuntu
 # Start an elevated Powershell session first
 lxrun /install /y
 lxrun /setdefaultuser <username that you want>

# Start it!
 bash

Suggestions

  • Install Chocolatey. It’s a package manager for Windows. It’s damn good. You can write your own packages too.
  • Install ConsoleZ: choco install consolez. It’s the best.
  • Install gvim: choco install gvim.
  • Install vcxsrv (the new xming, now with an even more abstract name!): choco install vcxsrv
  • Put Set-PSReadLineOption -EditMode Emacs into your profile: vim $PROFILE. Enjoy emacs keybindings for your Powershell session.
  • You can forward X11 applications to Windows! Prefix your application with DISPLAY=:0 (e.g. DISPLAY=:0 gvim) after installing and starting vcxsrv. Speed is fine; it’s a lot faster than doing it over SSH (as expected, since Ubuntu is running under a Windows subsystem and these syscalls are abstracted by Windows syscalls).

About Me

I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Winning at Ansible: How to manipulate items in a list!

The Problem

Ansible is a great configuration management platform with a very, very extensible language for expressing your infrastructure as code. It works really well for common workflows (deploying files, adding authorized_keys, creating new EC2 instances, etc.), but its limitations become readily apparent as you begin embarking on more custom and complex plays.

Here’s a quick example. Let’s say you have a playbook that uses a variable (or var in Ansible-speak) that contains a list of tables, like this:

important_files:
  - file_name: ssh_config
    file_path: /usr/shared/ssh_keys
    file_purpose: Shared SSH config for all mapped users.
  - file_name: bash_profile
    file_path: /usr/shared/bash_profile
    file_purpose: Shared .bash_profile for all mapped users.

(You probably wouldn’t manage files in Ansible this way, as it already comes with a fleshed-out module for doing things with files; I just wanted to pick something that was easy to work with for this post.)

If you wanted to get a list of file_names from this var, you can do so pretty easily with set_fact and map:

- name: "Get file_names."
  set_fact:
    file_names: "{{ important_files | map(attribute='file_name') | list }}"

This should return:

[ u'ssh_config', u'bash_profile' ]

However, what if you wanted to modify every file name to add some sort of identifier, like this:

[ u'ssh_config_12345', u'bash_profile_12345' ]

The answer isn’t as clear. One of the top answers for this approach suggested extending upon the map Jinja2 filter to make this happen, but (a) I’m too lazy for that, and (b) I don’t want to depend on code that might not be on an actual production Ansible management host.

The solution

It turns out that the solution for this is more straightforward than it seems:

- name: "Set file suffix"
  set_fact:
    file_suffix: "12345"

- name: "Get and modify file_names."
  set_fact:
    file_names: "{{ important_files | map(attribute='file_name') | list | map('regex_replace','(.*)','\\1_{{ file_suffix }}') | list }}"

Let’s break this down and explain why (I think) this works:

  • map(attribute='file_name') pulls the value of file_name out of every item in the list.
  • list casts the generator that map produces back into a list (I’ll explain this below).
  • map('regex_replace', pattern, replacement) runs the substitution against every string in the list. This is what actually does what you want.
  • list casts the results back down to a list again.
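
If you want to sanity-check the result, a quick debug task (reusing the variable name from above) will print the final list:

- name: "Show the modified file names."
  debug:
    var: file_names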

The thing that’s important to note about this (and the thing that had me hung up on this for a while) is that every call to map (or most other Jinja2 filters) returns the raw Python objects, NOT the objects that they point to!

What this means is that if you did this:

- name: "Set file suffix"
  set_fact:
    file_suffix: "12345"

- name: "Get and modify file_names."
  set_fact:
    file_names: "{{ important_files | map(attribute='file_name') | map('regex_replace','(.*)','\\1_{{ file_suffix }}') }}"

You might not get what you were expecting:

ok: [localhost] => {
    "msg": "Test - <generator object do_map at 0x7f9c15982e10>."
}

This is sort-of, kind-of explained in this bug post, but it’s not very well documented.

Conclusion

This is the first of a few blog posts on my experiences of using and failing at Ansible in real life. I hope that these save someone a few hours!

About Me

Carlos Nunez is a site reliability engineer for Namely, a modern take on human capital management, benefits and payroll. He loves bikes, brews and all things Windows DevOps and occasionally helps companies plan and execute their technology strategies.

Concurrency is a terrible name.

I was discussing the power of Goroutines a few days ago with a fellow co-worker. Naturally, the topic of “doing things at the same time in fancy ways” came up. In code, this is usually expressed by the async or await keywords, depending on your language of choice. I told him that I really liked how Goroutines abstract away much of the grunt work involved in sharing state across multiple threads. As nicely as he possibly could, he responded with:

You know nothing! Goroutines don’t fork threads!

This sounded ludicrous to me. I (mistakenly) thought that concurrency == parallelism because doing things “concurrently” usually means doing them at the same time, i.e. what is typically described as being run in parallel. Nobody ever says “I made a grilled cheese sandwich in parallel to waiting for x.” So I argued that concurrency is all about multithreading while he argued that concurrency is all about context switching. This small but friendly argument drew in a few co-workers surrounding us, and much ado about event pumps was made.

After a few minutes of me being proven deeply wrong, one of our nearby coworkers mentioned this tidbit of knowledge:

Concurrency is a terrible name for this.

I couldn’t agree more, and my small post will talk about why.

In computer science, concurrency is the term used to describe the state in which multiple things are done at the same time within the same “thread” of execution. In contrast, parallelism is used to describe the state in which multiple things are done at the same time across multiple “threads” of execution. The biggest difference between the two is being able to do multiple units of work simultaneously across multiple processors.

“What about multithreading,” you might ask. “I thought that the whole point of doing things across multiple threads was to do multiple things at once!”

Here’s the thing: today’s processors can only do things one instruction at a time. The massive amount of engineering, silicon and transistors that they have are built to execute one instruction at a time really really really quickly and accurately. What gets executed and when is up to the operating system queueing up work for the processor to do. Operating systems deal with this by giving every process (and their threads) a pre-defined amount of time with the processor called a time slice or quantum.

The processor is even processing instructions when the operating system has nothing for it to do; these instructions are called NOOPs in x86 assembly. (Fun fact: whenever you open up Task Manager or Activity Monitor and see the % of CPU being used, what you’re actually looking at is the ratio of instructions being executed to NOOPs.) Process scheduling is quite the loaded topic that I’m almost certain that I’m not doing justice to; if you’re interested in learning more about it, these slides from an operating systems course from UC Davis describe this really well.

Even though operating systems typically schedule work from processes to be done serially on one processor, the programmer can tell them to divide the work amongst multiple or all processors on the system. So instead of work from this process being done one instruction at a time, it can be done n instructions at a time, where n is the number of processors installed on a system. What’s more, since most operating systems typically slam the first processor for everything, processes that take advantage of this can typically get more done faster, since they are not competing for as much time on the main processor. This approach is called symmetric multiprocessing, or SMP, and Windows has supported it since Windows NT and Linux since 2.4. In other words, this is nothing new.

To make matters more confusing, these days operating systems will often schedule threads across multiple processors automatically if the application uses multiple threads, so for practicality’s sake, concurrent programming == parallel programming.
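
To make that concrete with the language that started this argument, here’s a small Go sketch (illustrative only): with GOMAXPROCS set to 1, the goroutines below are concurrent but never parallel, since the Go scheduler interleaves them on a single OS thread; swap in runtime.NumCPU() and the very same code is free to run in parallel.

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// 1 = concurrent only; runtime.NumCPU() = concurrent and (potentially) parallel.
	runtime.GOMAXPROCS(1)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for step := 0; step < 3; step++ {
				fmt.Printf("goroutine %d, step %d\n", id, step)
			}
		}(i)
	}
	wg.Wait()
}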

TL;DR

Concurrency and parallelism aren’t the same, except when they are. Sort of.

About Me

Carlos Nunez is a site reliability engineer for Namely, a human capital management and payroll solution made for humans. He loves bikes, brews and all things Windows DevOps and occasionally helps companies plan and execute their technology strategies.

for vs foreach vs “foreach”

Many developers and sysadmins starting out with Powershell will assume that this:

$arr = 1..10
$arr2 = @()
foreach ($num in $arr) { $arr2 += $num + 1 }
write-output $arr2

is the same as this:

$arr = 1..10
$arr2 = @()
for ($i = 0; $i -lt $arr.length; $i++) { $arr2 += $arr[$i] + 1 }
write-output $arr2

or this:

$arr = 1..10
$arr2 = @()
$arr | foreach { $arr2 += $_ + 1 }

Just like those Farmers Insurance commercials demonstrate, they are not the same. It’s not as critical an error as, say, mixing up Write-Output with Write-Host (which I’ll explain in another post), but knowing the differences between them might help your scripts perform better and give you more flexibility in how you do certain things within them.

You’ll also get some neat street cred. You can never get enough street cred.

for is a keyword. foreach is an alias…until it’s not.

Developers coming from other languages might assume that foreach is native to the interpreter. Unfortunately, this is not the case when it’s used in the pipeline. There, foreach is an alias for the ForEach-Object cmdlet, a cmdlet that iterates over a collection passed into the pipeline while keeping an enumerator internally (much like how foreach works in other languages). Every cmdlet invocation incurs a small performance penalty relative to interpreter keywords, as does reading from the pipeline, so if script performance is critical, you might be better off with a traditional loop.
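
You can confirm this yourself from any PowerShell prompt:

# Both 'foreach' and '%' show up as aliases for the ForEach-Object cmdlet
Get-Alias -Definition ForEach-Object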

To see what I mean, consider the amount of time it takes foreach and for to perform 100k loops (in milliseconds):

PS C:\> $st = get-date ; 1..100000 | foreach { } ; $et = get-date ; ($et-$st).TotalMilliseconds
2761.4339

PS C:\> $st = get-date ; for ($i = 0 ; $i -lt 100000; $i++) {} ; $et = get-date ; ($et-$st).TotalMilliseconds
**279.2439**

PS C:\> $st = get-date ; foreach ($i in (1..100000)) { } ; $et = get-date ; ($et-$st).TotalMilliseconds
**128.1159**

for was almost 10x faster, and the foreach keyword was 2x as fast as for! Words do matter!

foreach (the alias) supports BEGIN, PROCESS, and END

If you look at the help documentation for ForEach-Object, you’ll see that it accepts -Begin, -Process and -End script blocks as parameters. These give you the ability to run code at the beginning and end of pipeline input, so instead of having to manually check your start condition at the beginning of every iteration, you can run it once and be done with it.

For example, let’s say you wanted to write something to the console at the beginning and end of your loop. With a for statement, you would do it like this:

$maxNumber = 100
for ($i=0; $i -lt $maxNumber; $i++) {
if ($i -eq 0) {
write-host "We're starting!"
}
elseif ($i -eq $maxNumber-1) {
write-host "We're ending!"
}
# do stuff here
}

This will have the interpreter check the value of $i and compare it against $maxNumber twice before doing anything. This isn’t wrong per se but it does make your code a little less readable and is subject to bugs if the value of $i is messed with within the loop somewhere.

Now, compare that to this:

1..100 | foreach `
-Begin { write-host "We're starting now" } `
-Process { <# do stuff here #> } `
-End { write-host "We're ending!" }

Not only is this much cleaner and easier to read (in my opinion), it also removes the risk of the initialization and termination code running prematurely since BEGIN and END always execute at the beginning or end of the pipeline.

Notice how you can’t do this with the foreach keyword:

PS C:\> foreach ($i in 1..10) -Begin {} -Process {echo $_} -End {}
At line:1 char:22
+ foreach ($i in 1..10) -Begin {} -Process {echo $_} -End {}
+ ~
Missing statement body in foreach loop.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : MissingForeachStatement

In this case, foreach has no concept of BEGIN, PROCESS or END; it’s just like the foreach you’re used to using with other languages.

About Me

I’m the founder of caranna.works, an IT engineering firm in Brooklyn that builds smarter and cost-effective IT solutions that help new and growing companies grow fast. Sign up for your free consultation to find out how. http://caranna.works.

Technical Thursdays: DNS, or why using the Internet is kind of like going to Starbucks

This Thursday, we’ll talk about a system that has been extremely critical (and extremely taken for granted) for shaping the Internet as we know it: the domain name system, or DNS for short.

Before I explain what DNS is, I’ll talk about something I try really hard to hate but ultimately can’t: Starbucks.

I go to Starbucks at least once a day. Given that Google has more coffee machines (and baristas!) sitting idle than my handy downstairs Starbucks does on even their busiest days, this is slightly embarrassing to admit. I love their drinks, but as a recovering coffee snob, I passive-aggressively hate that I love their drinks. My relationship with that Seattle staple is kind of like how a lot of people feel about Taylor Swift: they’ll hate on her forever but will never admit to playing 1989 on repeat.

Wait, that’s just me?

Okay. I can live with that.

Anyway, what I find fascinating about Starbucks aside from their many variants of non-coffee coffee drinks (that are so good but so bad) is how baristas communicate drinks to each other. Somehow, someway, your order for a tall caramel-flavored latte with soy milk, whip cream and a double-shot of espresso is always a tall caramel whip redeye latte to every Starbucks barista on the planet, but trying that on a barista at Cafe Grumpy will usually get you banned for life.

What’s even more fascinating about this is that DNS works “exactly” the same way when you go to BuzzFeed.com on your phone or computer to endlessly browse lists of cat pictures and gifs of people doing funny things.

(Don’t pretend like you don’t.)

You probably know that underneath the lists and relationship videos, BuzzFeed is really a ton of servers doing lots of hard work to deliver this quality content, and buzzfeed.com is just the name of one of the servers that shows it to you.

What you might not know is that the name of that server isn’t really buzzfeed.com; it’s actually 54.241.35.79. That’s its IP address.

If you type in those four (or eight) numbers into Chrome (or whatever your browser of choice is; I use Safari for reasons that won’t be discussed here to avoid an intense holy war), it’ll take you right to BuzzFeed.

How does your computer know that these two things go to the same place? The answer is DNS.

What Is This DNS Magic That You Speak Of?

DNS is a system that maps names like buzzfeed.com or Wikipedia.org to IP addresses. It was created in the early 1980s when the Internet was much much MUCH smaller and has been iterated and improved upon significantly since then. Here’s the original RFC that describes how it works, and surprisingly, a lot of it has held up over time!

These mappings are stored in records. There are several kinds of them. The name-to-IP mapping that I described earlier is stored in an A record, but a DNS zone can also have records for other mappings, like shortcuts to A records (CNAME records), the mail servers for that domain (MX records) or arbitrary text (TXT records).

When your computer attempts to find the IP address for a web site, its DNS client (also called a resolver) performs a DNS query. The response it gets back is the DNS response.

So original, I know.
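
If you want to watch a query and response happen yourself, dig (assuming you have it installed; it ships with most Linux distributions and macOS, and nslookup does the same job on Windows) will run one from the command line:

# Ask for buzzfeed.com's A records; +short trims the answer down to just the IPs
dig +short buzzfeed.com A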

Dots and zones

The dots in a website URL are very important. Every word behind each dot is called a DNS domain, and every one of those words maps to something.

The last word in the URL, i.e. the .com, .org and .football, is called a top-level domain, or TLD. Every single one is coordinated by the Internet Assigned Numbers Authority, or the IANA. In the early days of a simpler Internet, this used to give you an idea of what the website was for: .coms were for commercial use or companies, .orgs were for non-profits and foundations, .nets were for network providers and country-specific TLDs like .us or .it were for websites based in those countries.

However, like most things from that time period, that’s gone completely out the window (do you think bit.ly is in Libya?).

Records within the DNS are broken up into zones, and servers within the DNS are responsible for serving their zone. These zones are usually HUGE text files that get stored completely within that server’s memory for really fast access. When your computer sends a DNS query, the DNS server you’re configured to use will ask other servers for the record if it doesn’t have it stored anywhere. It does this by asking for a special record called the Start of Authority, or SOA, which tells it where to go next in its search.

DNS is so hot right now

Almost every single web site you’ve visited within the last 20 years or so has likely taken advantage of DNS. If you’re like me, that’s probably a lot of websites! Furthermore, many of the assets on those web sites (think: images and code for all of those fancy site effects) are referred to by name and resolved by DNS.

The Internet as we know it would not function without DNS. As of yesterday, the size of the entire Internet was just over 1 BILLION unique web sites (and growing! exponentially!) and used by over 3 BILLION people.

Now imagine all of that traffic being handled by a single Dell server somewhere in this vast sea of Internet.

You can’t? Good. Me neither.

DNS at WEB SCALE

So how does DNS manage to work for all of these people for all of these web sites? When it comes to matters of scale, the answer is usually: throw a metric crap ton of servers at it.

DNS is no exception.

The Root

There are a few layers of servers involved in your typical DNS query. The first and top-most layer starts at the DNS root servers. These servers are run by the InterNIC and are used to tell you which servers own what TLDs (see below).

There are 13 root servers throughout the world, {A through M}.root-servers.net. As you can imagine, they are very, very, very powerful clusters of servers.

The TLD companies

Every TLD is managed by a company or registry. The DNS servers run by these organizations contain the records for every website that uses those TLDs. In the case of bit.ly, for example, the records for bit.ly will live on a DNS server managed by the registry that runs .ly, whereas the records for stupidsiteabout.football will be managed by Donuts.

Whenever you buy a domain with GoDaddy, (a) you are doing yourself a disservice and need to get on Gandi or Hover right now, and (b) your payment gives you the ability to create records that eventually land up on these servers.

The Public Servers

The next layer of servers in the query are the public DNS servers. These are usually hosted by either your ISP, Google or DNS companies like Dyn or OpenDNS, but there are MANY DNS servers available out there. These are almost always the DNS servers that you use on a daily basis.

While they cache huge numbers of records, they’ll refer to the root servers above if they’re missing anything. Also, because they are used more frequently than the root servers, they are more susceptible to people doing bad things, so the good DNS servers implement lots of security enhancements to prevent those things from happening. Finally, the really big DNS services usually have MANY more servers available than the root servers, so your query will almost always be answered quickly.

Your Dinky Linksys

The third layer of servers involved in most people’s queries aren’t really dedicated servers at all! Your home router most likely runs a small DNS server to help answer queries a lot faster. They don’t store a lot of records, and they are typically written pretty badly, so I often reconfigure these routers for my clients so that they use Google or OpenDNS instead.

Your job probably has DNS servers of its own to improve performance and to maintain internal and private records.

Your iPhone

The final layer of a query ends (well, starts) right at your phone or computer. Your computer’s DNS resolver will often cache responses to common queries for a short period of time so that it doesn’t have to hit DNS servers more often than necessary.

While this is often a very good thing, this often causes problems when records change. If you’ve ever tried to go onto a website and were unable to, this is often one reason why. Fortunately, fixing this is as simple as clearing your DNS cache. In Windows, you can do this by clicking Start, then typing cmd /c ipconfig /flushdns into your search bar. Use these instructions to do this on your Mac or these instructions to do this on your iPhone or iPad.

This is starting to get long and I’m in the mood for a caramel frap now, so I’m going to stop while I’m ahead here!

Did you learn something today? Did I miss something? Let me know in the comments!

Technical Tuesdays: Powershell Pipelines vs Socks on Amazon

In Powershell, a typical, run-of-the-mill pipeline looks something like this:

Get-ChildItem ~ | ?{$_.LastWriteTime -lt $(Get-Date 1/1/2015)} | Format-List

but really looks like this when written in .NET (C# in this example):

PowerShell powershellInstance = PowerShell.Create();
RunspaceConfiguration runspaceConfig = RunspaceConfiguration.Create();
Runspace runspace = RunspaceFactory.CreateRunspace(runspaceConfig);
powershellInstance.Runspace = runspace;
try {
    runspace.Open();

    Command getChildItem = new Command("Get-ChildItem");
    getChildItem.Parameters.Add("Path", "~");

    Command whereObjectWithFilter = new Command("Where-Object");
    ScriptBlock whereObjectFilterScript = ScriptBlock.Create("$_.LastWriteTime -lt $(Get-Date 1/1/2015)");
    whereObjectWithFilter.Parameters.Add("FilterScript", whereObjectFilterScript);

    Command formatList = new Command("Format-List");

    Pipeline pipeline = runspace.CreatePipeline();
    pipeline.Commands.Add(getChildItem);
    pipeline.Commands.Add(whereObjectWithFilter);
    pipeline.Commands.Add(formatList);

    Collection<PSObject> results = pipeline.Invoke();
    Collection<object> errors = pipeline.Error.ReadToEnd();

    foreach (PSObject result in results) {
        // Format-List emits formatting records, so fall back to ToString()
        // when a FullName property isn't present on the output object.
        PSPropertyInfo fullName = result.Properties["FullName"];
        Console.WriteLine(fullName != null ? fullName.Value.ToString() : result.ToString());
    }

    foreach (object error in errors) {
        PSObject perror = error as PSObject;
        if (perror != null) {
            ErrorRecord record = perror.BaseObject as ErrorRecord;
            if (record != null) {
                Console.WriteLine(record.Exception.Message);
                Console.WriteLine(record.FullyQualifiedErrorId);
            }
        }
    }
} catch (RuntimeException pipelineError) {
    Console.WriteLine(pipelineError.Message);
}

Was your reaction something like:

WUT

Yeah, mine was too.

Let’s try to break down what’s happening here in a few tweets.

Running commands in Powershell is very much like buying stuff from Amazon. At a really high level, you can think of the life of a command in Powershell like this:

  • You’re in the mood for fancy socks and go to Amazon.com. (This would be equivalent to the runspace in which Powershell commands are run.)

  • You find a few pairs that you like (most of them fuzzy and warm) and order them. (This would be the cmdlet that you type into your Powershell host (command prompt).)

  • Amazon finds those socks in their massive warehouse and begins packaging them. (This is akin to finding the definition of Get-ChildItem in a .NET library loaded into your runspace and, when found, wrapping it into a Command object, with the fuzziness and color of those socks being its Parameter properties.)

  • Amazon then puts that package into a queue in preparation for shipment. (In Powershell, this would be like adding the Command into a Pipeline.)

  • Amazon ships your super fuzzy socks when ready. (Pipeline.Invoke()).

  • You open the box the next day (you DO have Prime, right?!) and enjoy your snazzy feet gloves. (The results of the Pipeline get written to the host attached to its runspace, which in this case would be the Powershell host/command prompt.)

  • If Amazon had issues getting the socks to you, you would have gotten an email of some sort with a refund + free money and an explanation of what happened (In Powershell, this is known as an ErrorRecord.)

And that’s how Microsoft put the power of Amazon on your desktop!

Has the Powershell pipeline ever saved your life? Have you ever had to roll your own runspaces and lived to talk about it? (Did you know you can use runspaces to make multithreaded Powershell scripts? Not saying that you would…) Let’s talk about it in the comments below!

About Me

I’m the founder of caranna.works, an IT engineering firm in Brooklyn, NY that employs time-tested and proven solutions that help companies save lots of money on their IT costs. Sign up for your free consultation to find out how. http://caranna.works.