Provisioning VMware Workstation Machines from Artifactory with Vagrant

I wrote a small
Vagrantfile
and helper library for provisioning VMware VMs from boxes hosted on Artifactory. I put this together with the intent of helping us easily provision our Rancher/Cattle/Docker-based platform wholesale on our machines to test changes before pushing them up.

Here it is: https://github.com/carlosonunez/vagrant_vmware_artifactory_example

Tests are to be added soon! I’m thinking Cucumber integration tests with unit tests on the helper methods and Vagrantfile correctness.

I also tried to emphasize small, isolated and easily readable methods with short call chains and zero side effects.

The pipeline would look roughly like this:

  • Clone repo containing our Terraform configurations, cookbooks and this Vagrantfile
  • Make changes
  • Do unit tests (syntax, linting, coverage, etc)
  • Integrate by spinning up a mock Rancher/Cattle/whatever environment with Vagrant
  • Run integration tests (do lb’s work, are services reachable, etc)
  • Vagrant destroy for teardown
  • Terraform apply to push changes to production

We haven’t gotten this far yet, but this Vagrantfile is a good starting point.

Configuration management and provisioning are different.

Configuration management tools are used to repeatably and consistently system and application uniformity across clusters of systems at scale. Many of these tools achieve this in three ways: an intuitive command line interface, a lightweight and easily-readable domain-specific language and a comprehensive REST-based API to lower the barrier-to-entry for integrations with other tools. While open-source configuration management tools such as Chef, Ansible, Puppet and Salt have been increasing in popularity over the years, there are also enterprise-grade and regulator-friendly offerings available from vendors such as Dell, Microsoft, HP and BMC.

Configuration management tools are great at keeping a running inventory of existing systems and applications up-to-date. These tools are so good at this, in fact, that many systems administrators and engineers grow tempted into using them to deploy swaths of new systems and configure them shortly thereafter.

I’ve seen this play out at many companies that I’ve worked at. This would usually manifest into an Ansible deployment playbook or a Chef cookbook that eventually became “the” cookbook. The result has always been the same, and if I had to sum this pattern up into a picture, it would look something like this:

i-swear-this-worked-on-my-machine

Let me explain.

Complexity in simplicity.

One of the darling features of modern configuration management tools is its ability to express complex configuration states in an easily-readable way. This works well for creating an Ansible playbook to configure, say, an nginx instance, but begins to fall apart when trying to, say, provision the instances on which those nginx instances will be hosted and gets really ugly when you attempt to create relationships to deploy application servers with those web servers in trying to stage an environment.

Creating common templates for security groups or firewall templates, instances, storage and the like with configuration management tools outside of what they provide out of the box usually involves writing a lot of boilerplate beforehand. (For example, Ansible has a plugin for staging ebs volumes, but what if you want an EBS resource with specific defaults for certain web or application servers in certain regions? Prepare for a lot of if blocks.) Problems usually also crop up in passing metadata like instance IDs between resources. Because most of the actions done by configuration management tools are idempotent executions, the simple languages they use to to describe configurations don’t natively support variables. Storing metadata and using loops are usually done by breaking out of the DSL and into its underlying language. This breaks readability and makes troubleshooting more complicated.

Provisioning is hard.

Provisioning infrastructure is usually more complicated than configuring the software underlying that infrastructure in two ways.

The first complication arises from the complicated relationships between pieces of infrastructure. Expressing specific environmental nuances of a postgres installation is usually done by way of Chef cookbook attributes and flags. An example of this can be found here. Expressing the three different regions that the databases backing your web app need to be deployed to in a particular environment and the operating system images that those instances need to have will likely require separate layers of attributes: one for image mappings, another for instance sizing and yet another for region mapping. Expressing this in cookbooks gets challenging.

The second complication comes from gathering state. Unless your infrastructure is completely immutable (which is ideal), some state of your infrastructure in the status quo is required before deploying anything. Otherwise, you’ll be in for a surprise or two after you deploy the servers for that environment you thought didn’t exist. Tools like Terraform and AWS CloudFormation keep track of this state to prevent these situations from happening. Chef or Puppet, for example, do not. You can use built-in resources to capture this data and make decisions based on those results, but that puts you back into manipulating their DSLs to do things they weren’t intended to do.

Rollbacks are harder.

chef-provisioning and Ansible provisioning plugins do not support rolling back changes if something fails. This is problematic for three reasons:

  1. Inconsistent environments lead to increased overhead and (usually manual) sysadmin toil. Toil leads to technical debt, and debt leads to slower releases and grumpy teams. Nobody wants to have a grumpy team.
  2. In the cloud, nearly everything costs money. Resources that you deployed during tests that weren’t destroyed afterwards can add up to hefty surprises at the end of your billing cycle.
  3. Cookbook recipes and playbooks will need to account for these stale resources when executing their actions. This can lead to a more complicated codebase and a debt to pay back later on.

Provisioning tools such as Terraform, Cloudformation and Vagrant support rollback out of the box.

Use the right tools.

If you’re staring at a behemoth of a playbook for provisioning your stack or looking to make the move away from chef-provisioning, take a look at XebiaLabs awesome list of tools that make provisioning less complicated. CloudFormation is awesome at provisioning AWS infrastructure (unless you dislike JSON, in which case it is far from it), Vagrant is great at doing the same for physical infrastructure and Packer does a great job of building images as code.

Good luck!

About Me

20160408

I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Making sense of this ChatOps thing

So I’m still not entirely sold on the urgency or importance of “chatops.”

I’m a huge fan of Google Assistant neé Now. I wish that I could replace Siri with it daily. It can answer nearly any question you throw at it, and it is smart enough to do contextual things that resemble conversations. For fun, I just asked Siri to navigate me to my favorite winery from Lewisville, TX to Grapevine, TX, Messina Hof while away. Here’s what it came back with:

siri-fail

Not very useful. What’s a Messina?

Google Assistant, on the other hand, knows what’s up…kind of:

google-win

It didn’t get me to the Grapevine location my fiancée and I always go to, but it (a) knew I was talking about Messina Hof, and (b) navigated me to their biggest vineyard in Bryan, TX (a.k.a Aggieland, opinions notwithstanding).

Here’s the thing, though: in almost every case, I will probably open Google Maps and search for the location there. I’m sure that, in the near future, Assistant will be knowledgable enough to know the exact location I want and whether I should stop for gas and a coffee on the way there (Google’s awesome new phone will probably help accelerate that). In the present, however, it’s a lot faster to do all of that from the app.

Which kind of explains my issue with chatops.

What’s ChatOps?

PagerDuty (awesome on-call management app, highly recommend) explains that, holistically, chatops:

…is all about conversation-driven development. By bringing your tools into your conversations and using a chat bot modified to work with key plugins and scripts, teams can automate tasks and collaborate, working better, cheaper and faster.

Since this is DevOps and that definition wouldn’t be complete without referring to tooling of some sort, remember this?

aol_bots

Think that, but with your infrastructure, more Slack, more modern Web and fewer early 2000s nostalgia:

original

The overall goal of chatops is to use communication mediums that we take advantage of on a daily basis to manage workflows and infrastructure more seamlessly. (To me, email automation would not only squarely fit in with this design pedagogy, but, as discussed later, would also probably be the most compatible and far-reaching solution for people.)

I’m not saying ChatOps isn’t awesome.

There are several frameworks out there that enable companies and teams to start playing around. Hubot, by Github, is the most well-known one. It works with just about every messaging platform out there, including Lync if you have a XMPP gateway set up. Slack integrations and webhooks are also very popular for companies using that product. When implemented correctly, chatops can be quite powerful.

Being able to say phrases like /deploybot deploy master of <project> to preprod or /beachbot create a sandbox environment for myawesometool from carlosnunez’s fork on Slack or Jabber and action on them would be incredibly neat, not to mention incredibly fast. This can be immensely valuable in several high-touch situations such as troubleshooting unexpected issues with infrastructure or automating product releases from a common tool.

More mature implementations can go much, much deeper than that.

44-livingston-blog-post-image-20160601174608

I listened to an extremely interesting episode of Planet Money recently that explained an interesting period of growth for Subaru in the late 1990s to early 2000s. Subaru was struggling to compete with booming Japanese automakers at the time. They were producing cheaper cars faster and were successful in aggressively targetting the mid-market that Subaru classically did well in. Growth eventually went negative, and morales plummeted with it.

In the late 1990s, they made a discovery while trying to find a modicum of success with what they currently had. They discovered that out of their entire lineup of products, only one was selling consistently: the Impreza. They sought to find out why.

What they found was surprising. They saw that this car, and only this car, had a strong positive correlation with female buyers, specifically females that lived together. So they, with the help of Mulryan/Nash, their ad agency, tried something rash: they aimed to exclusively target homosexual couples in almost all of their ad campaigns.

Their sales soared. In fact, they were the only auto manufacturer to generate revenue during the 2008 Global Financial Crisis.

(Check out the full story here if you’re interested in learning more!)

Wouldn’t it have been awesome if they had bots that scoured sales demographics data from their network of dealerships and turn the identified trends covered within into emails or chats that marketing or sales managers can parse and make these same decisions on? How much faster do you think they would have been able to identify this and action on it? How many other trends could they have uncovered and made potential sales on?

That’s what I think when I hear about ChatOps. But let’s get back to reality.

I’m saying that it’s just not that crucial.

There are a lot of things that have to be done “right” before chatops can work. Monitoring and alerting have to be on point, especially for implementing things like automated alert or alarm bots. Creating new development environments have to be automated or at least have a consistent process from which automation can occur. Configuration management has to exist and has to be consistent for deployment bots to work. The list goes on.

Here in lies the rub: for engineers, accomplishing these things from a command-line tool is just as simple, and developers and engineers tend to spend just as much time with their tools as their IM client. Furthermore, implementing new systems introduces complexity, so introducing chatops to an organization when their tooling needs improvement will usually lead to my Messina-that-isn’t-Messina Hof situation from before where the quality of both toolsets ultimately suffers. So if the goal of implementing chatops is to make engineering’s life easier (or to make it easier for non-technical people to gain more understandable views into their tech), there might be easier and more important wins to be had first.

It’s not the end-all-be-all…yet.

Financial companies, tech-friendly law firms and news organizations use chatops to help model the state of markets, find trends in big law to identify new opportunities and uncover breaking news to broadcast around the world. The intrinsic value of ChatOps is definitely apparent.

That said, the foundation of the house comes first. Infrastructure, process and culture have to be solid and at least somewhat automated before chatops can make sense.

About Me

20160408

I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Driving technical change isn’t always technical

Paperful office

Locked rooms full of potential secrets was nothing new for a multinational enterprise that a colleague of mine consulted for a few years ago. A new employee stumbling upon one of these rooms, however, was.

What that employee found in his accidental discovery was a bit unusual: a room full of boxes, all of which were full of neatly-filed printouts of what seemed like meeting minutes. Curious about his new find, he asked his coworkers if they knew anyting about this room.

None did.

It took him weeks to find the one person that had a clue about this mysterious room. According to her, one team was asked to summarize their updates every week, and every week, someone printed them out, shipped it to the papers-to-the-metaphoric-ceiling room and categorized it.

Seems strange? This fresh employee thought so. He sought to find out why.

After a few weeks of semi-serious digging, he excavated the history behind this process. Many, many years ago (I’m talking about bring-your-family-into-security-at-the-airport days), an executive was on his way to a far-away meeting and remembered along the way that he forgot to bring a summary of updates for an important team that was to come up in discussion. Panicked, he asked his executive assistant to print it out and bring it to him post haste. She did.

To prevent this from happening again, she printed and filed this update out every week in the room that eventually became the paper jungle gym. She trained her replacement to do this, her replacement trained her replacement; I think you see where this is headed. The convenience eventually became a “rule,” and because we tend to be conformant in social situations, this rule was never contested.

None of those printed updates in that room were ever used.


This has nothing to do with DevOps.

Keep reading.

I’m not sure of what became of that rule (and neither does my colleague). There is one thing I’m sure of, though: tens of thousands of long-lived companies of all sizes have processes like these. Perhaps your company’s deployments to production depend on an approval from some business unit that’s no longer involved with the frontend. Perhaps your company requires a thorough and tedious approval process for new software regardless of its triviality or use. Perhaps your team’s laptops and workstations are locked down as much as a business analyst who only uses their computers for Excel, Word and PowerPoint. (It’s incredible what they can do. Excel itself is a damn operating system; it even includes its own memory manager.)

Some of the simplest technology changes you can make to help your company go faster to market don’t involve technology at all. If you notice a rule or process that doesn’t make sense, it might be worth your while to do your own digging and question it. More people might agree with you than you think.

About Me

I’m a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. I love everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel. I’m on twitter @easiestnameever and LinkedIn at @carlosindfw.

Config management and cloud provisioning: There be dragons

So I’ve tried using configuration management to deploy infrastructure to two different clouds and learned this: whenever you think “it would be great if we could deploy to EC2 with Chef,” use CloudFormation or Terraform instead.

Why? Here are a few reasons that come to mind:

  • CloudFormation/Terraform is easier. Terraform YAML is nicer than CloudFormation JSON, but both are *way* easier than trying to shoehorn Jinja2 (Ansible) or chef-provisioning Ruby to do what you want. Like, hundreds of lines easier.

    I once tried to use Ansible to automate provisioning of Active Directory forests onto EC2. I had to create my own roles for handling AMI selection, security group CRUD operations, EBS provisioning, etc. The 2000+ lines of YAML I wrote to uphold all that bass ultimately became about 200 lines of ugly, yet functional, CloudFormation JSON.

    Yeah.
  • Built-in rollback is awesome. CloudFormation and Terraform both support some kind of rollback. Chef provisioning does as well with the :rollback action (I don’t think Ansible does; at least it didn’t when I used the EC2 plugin), but it’s not guaranteed.
  • I really liked the CloudFormation API. I haven’t tried Terraform’s CLI yet, but I would imagine that it’s just as awesome. aws cloudformation provides a lot of useful information that’s easy to action upon in a Chef recipe or Ansible play, especially given that both platforms have support for CloudFormation “built-in.” What’s better, the AWS SDKs have full support for CloudFormation as well, which means…
  • You’re not locked into anything. This was the biggest takeaway from my experiences using chef-provisioning or ansible-ec2. If you ever decide to move away from Chef or Ansible, you’ll need to port over your deployment code with it. Depending on the platform, this could take anywhere from hours to weeks.

    Not a problem with CloudFormation or Terraform. Perhaps you’ll need to change how your Chef shell resource behaves, but that’s a lot easier to deal with, in my opinion.

Using your config management solution to do it all is really attractive. It’s usually not a bad idea either. However, when it comes to cloud, tread carefully!

About Me

Carlos Nunez is a DevOps consultant for ThoughtWorks, a software company striving for engineering excellence and a better world for our next generation of thinkers and leaders. He loves everything DevOps, Windows, and Powershell, along with a bit of burgers, beer and plenty of travel.

Follow him on Twitter! @easiestnameever.