What process is failing in my BOSH deployment?

The result of running bosh deploy might be unfortunately failing. The output will show which jobs (VMs) did not successfully run by the end of the timeout period.

We can use the BOSH API to discover specifically which job template(s) are failing using trusty curl and jq:

Setup environment variables:

export bosh_target=
export bosh_username=admin
export bosh_password=admin

Create a helpful curl wrapper script boshcurl and place it in your $PATH:

set -e
if [[ -z $bosh_path ]]; then
  echo "USAGE: boshcurl /deployments"
  exit 1
if [[ -z ${bosh_username} || -z ${bosh_password} || -z ${bosh_target} ]]; then
  echo 'Requires ${bosh_username} ${bosh_password} and ${bosh_target}'
  exit 1
curl --location -ks -u ${bosh_username}:${bosh_password} ${bosh_target}${bosh_path}

You can now play with the BOSH API:

boshcurl /info
boshcurl /deployments
boshcurl /tasks

Combine them with jq to prettify or find information.

Find failing deployment

Setup a variable for the deployment name of interest:

export deployment=my-deployment-name

Trigger the task to fetch current VM statuses:

task_id=$(boshcurl /deployments/${deployment}/vms\?format\=full | jq -r .id)

Eventually the results will be available via:

boshcurl /tasks/${task_id}/output\?type\=result | jq .

The output is a series of JSON blobs for each job/VM. Each of these will include a processes section, which includes all the processes created by the monit files in all the job templates collocated on all the jobs.

To find the failing processes:

boshcurl /tasks/${task_id}/output\?type\=result | jq ".processes[] | select (.state != \"running\" and .state != \"unknown\")"

The output will be a series of JSON blobs for each non-working process:

  "name": "cf-containers-broker-route-registrar",
  "state": "failing",
  "uptime": {},
  "mem": {
    "percent": 0
  "cpu": {
    "total": 0

Spread the word

twitter icon facebook icon linkedin icon