Collocated BOSH errands: Run one-off tasks inside your instances

BOSH v263 added an exciting new feature: the ability to run one-off tasks, called errands, inside existing instances.

A deployment is the top-level, first-class concept for running things with BOSH. Typically a deployment consists of one or more long-running instances on your target cloud infrastructure. For example, you could use BOSH to deploy a 5-node ZooKeeper cluster to AWS, or much larger systems such as Cloud Foundry or Kubernetes.

A few years ago BOSH added the ability to run one-off tasks, called errands. Each errand runs inside a brand-new, temporary instance: its results (stdout and stderr) are returned from that instance, and then the instance is destroyed.

What you could not do with errands was run one-off tasks on the same instances as your running jobs and processes, so you couldn’t use them for local cleanup, health checks, or other introspection tasks.

Fortunately, this is now possible once you upgrade to BOSH v263+ and Linux 3445+ stemcells!

Dmitriy’s zookeeper-release (https://github.com/cppforlife/zookeeper-release) has been updated with a collocated status errand. Deploying and testing this errand is very simple (see the BOSH 2/AWS/ZooKeeper blog post for a lengthy getting-started tutorial for BOSH).

git clone https://github.com/cppforlife/zookeeper-release
cd zookeeper-release
bosh -d zookeeper deploy manifests/zookeeper.yml

Once the 5-node cluster is running, you can run the status errand:

bosh -d zookeeper run-errand status

The output first shows the errand starting on each instance, and then prints the aggregated results from every instance:

Task 2200 | 02:15:23 | Preparing deployment: Preparing deployment (00:00:00)
Task 2200 | 02:15:23 | Running errand: zookeeper/5d746534-e40e-445d-9e7d-7b2c4608e322 (1)
Task 2200 | 02:15:23 | Running errand: zookeeper/6e8b7728-88fd-4da1-8191-66546a80a8f6 (4)
...
Task 2200 | 02:15:24 | Fetching logs for zookeeper/6e8b7728-88fd-4da1-8191-66546a80a8f6 (4): Finding and packing log files
Task 2200 | 02:15:24 | Fetching logs for zookeeper/e06dcbb1-5fcd-4bf7-874b-0bdc07c31620 (0): Finding and packing log files
...
Instance   zookeeper/5d746534-e40e-445d-9e7d-7b2c4608e322
Exit Code  0
Stdout     Mode: follower
Stderr     ZooKeeper JMX enabled by default
           Using config: /var/vcap/jobs/zookeeper/config/zoo.cfg
Instance   zookeeper/b5923bcf-3106-4b7f-8eb4-212b5877ef81
Exit Code  0
Stdout     Mode: leader
Stderr     ZooKeeper JMX enabled by default
           Using config: /var/vcap/jobs/zookeeper/config/zoo.cfg
...
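If you want a script to act on the aggregated output rather than reading it by eye, you can pull out each instance’s Mode line. The sketch below is hypothetical: the function names zk_modes and zk_check_single_leader are mine, and it assumes the Instance/Stdout table layout shown above, which is human-oriented output rather than a stable interface, so it may need adjusting for your CLI version.

```shell
# Hypothetical helpers for the aggregated "bosh run-errand status" output.
# They assume the "Instance" / "Stdout" row layout shown above, which is
# human-readable output and may change between CLI versions.

# Read errand output on stdin; print "<instance> <mode>" for every
# instance that reported a "Mode:" line on stdout.
zk_modes() {
  awk '
    $1 == "Instance"                { instance = $2 }
    $1 == "Stdout" && $2 == "Mode:" { print instance, $3 }
  '
}

# Read errand output on stdin; succeed only if exactly one instance
# reports "leader" (awk exits 0 when n == 1, otherwise 1).
zk_check_single_leader() {
  zk_modes | awk '$2 == "leader" { n++ } END { exit !(n == 1) }'
}
```

For example, piping "bosh -d zookeeper run-errand status" into zk_check_single_leader gives a quick yes/no answer to "does the ensemble have exactly one leader?".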

Implementation

Within your BOSH release, collocated errands are implemented exactly like traditional errands: the only requirement is that the job has a bin/run script that exits 0 on success (and non-zero on failure).
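As a rough illustration, a bin/run for a status errand like this one could simply call ZooKeeper’s zkServer.sh status against the local config, which would produce the Mode, JMX and "Using config" lines seen in the output above. This is a hypothetical sketch, not the script from zookeeper-release: the zkServer.sh location under /var/vcap/packages is an assumption, and the real job may also need to put Java on the PATH. In the job’s spec, a template such as run.sh would be mapped to bin/run (templates: {run.sh: bin/run}).

```shell
#!/bin/bash
# Hypothetical sketch of a collocated "status" errand script, rendered by BOSH
# to /var/vcap/jobs/status/bin/run. Not the actual zookeeper-release script.
set -eu

# Assumed package path for zkServer.sh; the config path matches the errand
# output shown earlier. Both can be overridden from the environment.
ZK_SERVER_SH="${ZK_SERVER_SH:-/var/vcap/packages/zookeeper/bin/zkServer.sh}"
ZK_CONFIG="${ZK_CONFIG:-/var/vcap/jobs/zookeeper/config/zoo.cfg}"

zk_status() {
  # Ask the local ZooKeeper server for its mode (leader/follower/...).
  # The command's exit code becomes the script's exit code, which BOSH
  # reports as the errand's "Exit Code".
  "$ZK_SERVER_SH" status "$ZK_CONFIG"
}

# Only run when executed as bin/run; sourcing the file just defines zk_status.
if [ "$(basename "$0")" = "run" ]; then
  zk_status
fi
```

The basename guard is a convenience for local testing: when the agent executes /var/vcap/jobs/status/bin/run the check runs, while sourcing the file elsewhere only defines the function so it can be exercised against a stub zkServer.sh.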

To add a collocated errand to your deployment manifest, add its job to an existing instance group, just as you would any other job. In the zookeeper.yml example, the status job is added alongside the zookeeper job:

instance_groups:
- name: zookeeper
  instances: 5
  jobs:
  - name: zookeeper
    release: zookeeper
    properties: {}
  - name: status
    release: zookeeper
    properties: {}
  ...

Debugging errands

When a standalone errand fails, it is difficult to debug because its temporary instance has already been deleted by the time you learn of the failure. You need to re-run the errand with the --keep-alive flag so that the instance is kept, and then SSH in to isolate the issue.

bosh run-errand sanity-test --keep-alive
bosh ssh sanity-test

Collocated errands are much easier to debug. The errand script is a short-lived process on a long-running instance, so you can SSH into the instance before, during, or after the errand runs to observe it, watch the logs, and so on. For example, SSH in from one terminal and run the errand from another:

bosh -d zookeeper ssh zookeeper/0
bosh -d zookeeper run-errand status
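Two variations can help when narrowing a problem down. The helpers below are a sketch and their names are hypothetical. The first uses run-errand’s --instance option, which the bosh CLI v2 offers for collocated errands (check bosh run-errand --help on your CLI version), to run the errand on a single instance. The second bypasses the director and executes the errand script directly with bosh ssh -c, assuming the usual /var/vcap/jobs/<job>/bin/run layout.

```shell
# Hypothetical debugging helpers for the zookeeper deployment in this post.
ZK_DEPLOYMENT="${ZK_DEPLOYMENT:-zookeeper}"

# Run the collocated "status" errand on one instance (or instance group)
# only, e.g. run_status_on zookeeper/0
run_status_on() {
  bosh -d "$ZK_DEPLOYMENT" run-errand status --instance "$1"
}

# Bypass the director and execute the errand script directly on an
# instance, printing the script's exit code afterwards.
run_status_script_on() {
  bosh -d "$ZK_DEPLOYMENT" ssh "$1" \
    -c 'sudo /var/vcap/jobs/status/bin/run; echo "exit code: $?"'
}
```

Running bin/run over SSH does not go through the director, so it is useful for watching the script itself but is not a substitute for run-errand.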

Errors you might see

I hit a few bumps along the way, so I’ve documented them here to save you from hitting the same ones.

Errand doesn’t exist

If you see the following error, your BOSH director has not yet been upgraded to v263+. You might feel silly at first, but I didn’t have to guess what this error looks like: I too forgot to check that my BOSH had been upgraded first.

Task 313 | 00:39:33 | Preparing deployment: Preparing deployment (00:00:01)
                    L Error: Errand 'status' doesn't exist
Task 313 | 00:39:34 | Error: Errand 'status' doesn't exist

Older stemcell

Your deployment will need to be running on Linux 3445+ stemcells for collocated errands to work. If you have not yet upgraded your stemcells, you might see this error:

Task 2182 | 01:41:29 | Preparing deployment: Preparing deployment (00:00:00)
Task 2182 | 01:41:30 | Error: Multiple jobs are configured on an older stemcell, and "status" is not the first job
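Both errors come down to versions, so a quick preflight check before deploying can save a round trip. This is a hypothetical helper: it only compares the leading number of each version string, and it expects you to supply the versions yourself, for example from the Version row of bosh env and the stemcell version listed by bosh stemcells.

```shell
# Hypothetical preflight check for the two requirements in this post:
# a BOSH director at v263+ and Linux stemcells at 3445+.

# major_at_least HAVE WANT: succeed if the leading number of HAVE is >= WANT.
# "263.0.0 (00000000)" -> 263, "3445.2*" -> 3445; HAVE must start with a digit.
major_at_least() {
  local have="${1%%.*}"        # drop everything from the first "."
  have="${have%%[!0-9]*}"      # drop any trailing non-digit text
  [ -n "$have" ] && [ "$have" -ge "$2" ]
}

# collocated_errands_supported DIRECTOR_VERSION STEMCELL_VERSION
collocated_errands_supported() {
  local director_version="$1" stemcell_version="$2"
  if ! major_at_least "$director_version" 263; then
    echo "director $director_version is older than v263" >&2
    return 1
  fi
  if ! major_at_least "$stemcell_version" 3445; then
    echo "stemcell $stemcell_version is older than 3445" >&2
    return 1
  fi
}
```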
