May 15, 2015 Our CFSummit 2015 talk & highlights - How we deploy all things BOSH
Let's skip 5 minutes of warmup and jump in at "The Problem: The next 59 months"
Everyone gets very excited about deploying something, at all. Like Docker people. "Look it comes up in 30 milliseconds!" You've never created a production problem so quickly.
If a system lasts for 5 years or 60 months then I find the next 59 months the most fascinating. In part because BOSH can do it and no one else's infrastructure orchestration tool can.
I then discuss one of the challenges of looking after long-running systems: continuously upgrading them. Cloud Foundry had cut 12 releases in the first 4 months of 2015.
One of the reasons I became fascinated by Cloud Foundry... even from when I discovered it 3 years ago, they gave us BOSH! With somewhat of a Rubik's Cube-type challenge of "I bet you can't make it work". :) But there was this vague idea that they did care that you should be able to run it yourself.
A year and a half ago I joked I was the only one who had BOSH running Cloud Foundry, and now everyone can do it.
The Cloud Foundry runtime team now includes great changelogs for each release (e.g. v207), which include links back into their own backlog tool Pivotal Tracker.
You cannot be short of information; and obviously the source is open. You can learn all of it. I mean, you know, that'd be your full time job keeping track of it.
I included a couple of git commands that help us determine what changed, and what the scope of the change might be.
The following command will show the git log for only those commits that affected the spec files in job templates between two release tags:
git log v195..v207 -- jobs/*/spec
Similarly, you can use git diff to see the diffs for those files.
This might give you more clues about any fundamental BOSH manifest changes you might need to make, warnings to give to your users, and changing defaults.
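As a rough sketch of how those two commands could be wrapped for repeated use, assuming you run them inside a checkout of the release repository: the function names spec_changes and job_spec_diff are hypothetical helpers for illustration, not part of git or cf-release.

```shell
# Hypothetical helper: list which job spec files changed between two tags,
# with a per-file count of added/removed lines.
# Usage (inside the release repo): spec_changes v195 v207
spec_changes() {
  from="$1"
  to="$2"
  # The pathspec is quoted so that git, not the shell, expands the glob.
  # That way spec files that were deleted since the older tag still match.
  git diff --stat "$from..$to" -- 'jobs/*/spec'
}

# Hypothetical helper: show the full diff of a single job's spec file.
# Usage: job_spec_diff v195 v207 <job-name>
job_spec_diff() {
  from="$1"
  to="$2"
  job="$3"
  git diff "$from..$to" -- "jobs/$job/spec"
}
```

The summary view is a quick way to decide which jobs deserve a closer look before you read every changed property and default.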
And whilst you're struggling to figure this all out literally none of your users care about what you're going through. They just want it to work. It's a platform; it's just supposed to run forever. And we're struggling with the premise that a new version comes out every week or two.
I guess vendors trying to make living with Cloud Foundry substantially simpler is a good thing! Such as Pivotal Cloud Foundry.
So this is what users care about, the developers. They've got a test app, they're happy about that. This is about as complex as their life gets. Deploying their app twice.
Sidenote: Good time to remind Cloud Foundry users about cf copy-source - if your application source works in a test app, then you do not need to cf push to redeploy - you can cf copy-source from the test app to the production app. Hopefully in future we also have cf copy-droplet so there is no buildpack staging; which would have the potential of behaving differently or failing.
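A minimal sketch of that promotion step, assuming the cf CLI is installed and you are already logged in and targeting the right org: promote_app is a hypothetical wrapper name, and the app and space names in the usage comments are placeholders.

```shell
# Hypothetical wrapper: copy the source of an app that already works in test
# onto the production app, instead of pushing again from a workstation.
# Usage: promote_app myapp-test myapp-prod              (same space)
#        promote_app myapp-test myapp-prod production   (different space)
promote_app() {
  test_app="$1"
  prod_app="$2"
  target_space="${3:-}"
  if [ -n "$target_space" ]; then
    # -s names the space that holds the target (production) app
    cf copy-source "$test_app" "$prod_app" -s "$target_space"
  else
    cf copy-source "$test_app" "$prod_app"
  fi
}
```

The target app is still staged by its buildpack after the copy, which is exactly the step a droplet-level copy would avoid.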
But this [the test->prod image] is about as complex as many developers will get. An app, test it, and then promote it into production.
But we know it's more complex than that [for us].
You'll be saying "I've got to stop upgrading our production Cloud Foundry and hoping it works". You have the bright idea of deploying a second Cloud Foundry.
You might want to put some test workload on it, test that it's all working for your use cases. Run the CATS, Cloud Foundry Acceptance Tests.
Now if you're running OpenStack you've probably learned the most valuable lesson about OpenStack, which is ... it doesn't work. So you might want a second [test Cloud Foundry] one of those.
So we now have three Cloud Foundries running with different BOSHes so we need a solution to deploying them in a quality controlled manner.
Fortunately Alex Suraci from Pivotal wrote a tool called Spiff, which is, how could you put it, the "best in class solution" to this problem.
Spiff is a good solution to the problem of curating reusable, sharable templates; but users will acknowledge some challenges. One of those users was the author of Spiff, Alex Suraci.
I spotted Alex in the audience. He was the guy who had pulled his hood fully over his head, and was now shrinking down below his seat.
I couldn't help myself - I pulled up Alex's own public statement on his stance on Spiff:
We use spiff, and the spiff merge command, because it is "best in class" and because upstream cf-release includes some tested spiff templates.
Work this year for BOSH will include some new features which might remove the need for spiff (see Dmitriy's bosh-notes). Though I think there will always be a need for real world deployments to maintain modified templates from upstream core Cloud Foundry runtime/Pivotal Web Services team.
When you are combining templates keep in mind:
- use the right templates for the BOSH release you are deploying (typically by checking out the matching git tag: git checkout v207)
- decide which upstream templates you want, and which bespoke templates you need to create
- merge them in the right order
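The checklist above could be sketched as a small wrapper around spiff merge. This is only an illustration: merge_manifest is a hypothetical function name and the file names in the usage comment are placeholders, not the actual cf-release templates. It assumes the usual convention of listing the top-level template first and the stubs with environment-specific values last.

```shell
# Hypothetical wrapper: merge spiff templates, in the order given, into one
# manifest file. Refuses to run if any template is missing, so a typo or the
# wrong git tag does not silently produce a half-merged manifest.
# Usage (after checking out the matching tag, e.g. git checkout v207):
#   merge_manifest manifest.yml upstream-template.yml bespoke-template.yml stub.yml
merge_manifest() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_manifest OUTPUT TEMPLATE [TEMPLATE...]" >&2
    return 1
  fi
  out="$1"
  shift
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing template: $f" >&2
      return 1
    fi
  done
  # Order matters: the arguments are passed to spiff exactly as listed.
  spiff merge "$@" > "$out"
}
```

Keeping the order in one committed script, rather than in shell history, makes it reviewable alongside the templates themselves.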
Yes it's kind of good that the templates are in the repo. I will be forever unhappy that doing anything with BOSH means git clone-ing something. Personally I would like them [spiff templates] shipped off and packaged somewhere else.
A year ago, when working with Swisscom Cloud Labs, we needed to maintain substantial changes to upstream templates for OpenStack deployments across many OpenStack installations. This led Ruben Koster and colleagues at our beloved client Swisscom to create bosh-workspace.
A BOSH workspace starts with a top-level YAML file that describes which BOSH releases, BOSH stemcell, spiff templates, and input properties to use.
As an example, cf-aws-large.yml is a large deployment of Cloud Foundry to AWS:
What we have [with bosh-workspace] is another YAML file, because we're all good with that. Nothing bad ever happened with YAML.
It's just when it gets to 3000 lines long you start asking the WTF question. You're on your 4th coffee before you get to the end. You think this has got to stop. There's no story line, no arc, no character development; yet it's longer than a novel.
So a BOSH workspace deployment file is a smaller manifest that you can read.
In the example above we can see we're using PostgreSQL instead of RDS, and HAproxy instead of an ELB, and...
NFS because you hate yourself. Apparently you're using Google where it only shows results from the 90s and you've learned about NFS for the first time thinking "that's an awesome idea".
We've evolved into having many smaller templates. See previous blog post Composing BOSH manifests with Spiff for more information on using spiff and writing smaller templates.
We also constructed a 6 VM deployment of Cloud Foundry, cf-aws-tiny.yml, rather than the upstream deployment of approximately 20 VMs:
There's a [common, mistaken] idea that Cloud Foundry needs 20 VMs. If I could take a moment to dispel that: It's just software. There's nothing about those bits that precludes them from being on the same machine. There wasn't like a blood feud. That's Ruby and that's Java, and they hate each other.
Example BOSH workspaces
There are some BOSH workspace repositories available:
- cf-boshworkspace - Cloud Foundry
- logsearch-boshworkspace - Logsearch, an ELK distribution
- docker-services-boshworkspace - a Service broker for Cloud Foundry backed by Docker daemon
Beyond BOSH workspace
I sort of fell out of love with bosh-workspace. It got complex, it's a little icky. As I started asking other people what they were doing about this problem, everyone was just using shell scripts.
bosh-workspace might be an over-solution to the problem of collecting together spiff templates and running spiff merge. It's a BOSH CLI plugin, written in and requiring Ruby, and it also modifies the BOSH CLI source itself.
But I still like the idea of all the smaller templates, built up to extend/fix the upstream BOSH release templates.
Some of the other behaviors of bosh-workspace - uploading BOSH releases & stemcells, running spiff merge, running bosh deploy - became less necessary to me because I had discovered a new platform for orchestrating this behavior: Concourse
Automation of deployment with Concourse
On the same day as my talk at CF Summit I published a blog post, Using Concourse CI to safely deploy into production, including a screencast demonstration. This is to be my replacement for bosh-workspace.
With Concourse I can orchestrate and configure new behaviors.
In the screenshot above, the Concourse pipeline does a BOSH deployment three times - on the left it tries to deploy using all the shiny latest inputs - new BOSH releases, new stemcell, new templates. The goal is to protect the "production" deployment on the right. Only if the deployments on the left & center are successful will those inputs be allowed to be applied to production.
Bringing spiff to concourse
One change was that I no longer wanted a big stub with all the overrides to the upstream templates. Instead I only wanted to pass in the inputs from the pipeline: the BOSH release & stemcell versions for example.
Every other "input" was converted into another template and git committed. The name of the deployment isn't changeable - it's permanently committed in a template, for example.
The inputs are the Concourse resources - external, immutable, versioned entities - that Concourse polls for newer versions.
If the "try-anything" deployment is successful, then it packages up the input stub and templates as the s3-candidate-assets resource (a versioned tarball stored in S3). This tarball is what might ultimately be applied to production.
Here is the subset of templates plus the stub that make up the s3-candidate-assets tarball in the demo:
I introduced a new CLI tool, makemespiffy, to help parameterize a BOSH manifest or stub file or other template, and put the extracted value in a separate template file. [github repo]
Docker docker docker
Concourse uses Docker images to describe resources and the work container for every task.
Concourse uses Docker, so now you know it's cool.
I have other points, but at least I got to say Docker. It's on every line! Bang, nailed it!
These pipelines are for deploying different BOSH releases, in different ways, to one or more bosh-lites. They are my experiments and some were used in the screenshots and demo videos.
- https://github.com/starkandwayne/redis-bosh-pipeline - simple Redis cluster, to three separate bosh-lites (the demo video above)
- https://github.com/starkandwayne/logstash-docker-pipeline - Logstash Service Broker via Docker
- https://github.com/starkandwayne/cf-pipeline - Cloud Foundry
- https://github.com/starkandwayne/concourse-deployment-pipeline - pipeline to use Concourse to deploy Concourse