After three years of using BOSH, I'm still surprised that bosh cancel task is relatively ineffective. It doesn't cancel the task immediately; it registers the request and patiently waits for the current task to reach a state where it can be cancelled safely. That is almost never what you want: you know something is wrong and you want to cancel the task NOW.

More importantly perhaps: you want to unlock the current deployment.

BOSH holds a lock on each deployment: you can only run one deploy per deployment name at a time. So if you cannot cancel a task that has locked a deployment, you cannot run any subsequent task against that deployment.

The rest of this article documents how to find and delete a lock.

Where is a BOSH lock?

It depends on how old your BOSH Director is. If you are using a version of BOSH made since 2016, the locks are stored in a Postgres database on the director. Older versions store them in Redis; directions for clearing those locks are further down.

Newer BOSH Directors

First, stop all the monit jobs on the director, then start only Postgres again:

/:~$ sudo -i
[sudo] password for vcap:  (hint: defined in your manifest, credhub or `c1oudc0w`)
/:~# monit stop all
/:~# monit start postgres

To connect to the BOSH database from the director, run:

/:~# /var/vcap/packages/postgres-9.4/bin/psql -U vcap bosh

To find the lock, query the locks table by running: SELECT * FROM locks;

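 id   | expired_at                 | name                                | uid
------+----------------------------+-------------------------------------+--------------------------------------
 1722 | 2017-07-13 17:55:14.075042 | lock:deployment:us-east-1-pr-shield | f3218bf0-ec1e-40f0-b843-4ba9b8779954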

Delete the offending row by running DELETE FROM locks WHERE id=1722; (substitute your own value for the id). The table is normally empty when no tasks are running.
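If you would rather target the lock by deployment name than copy an id, the statement can be generated. This is a minimal sketch, assuming lock names follow the lock:deployment:<name> pattern seen in the query output; the helper name lock_delete_sql is hypothetical and not part of BOSH.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that removes the lock for one deployment.
# Assumes lock names look like lock:deployment:<name>, as in the output above.
lock_delete_sql() {
  deployment="$1"
  # Reject empty names and anything outside a conservative character set,
  # so the value can be embedded in the SQL string literal safely.
  case "$deployment" in
    ''|*[!A-Za-z0-9._-]*)
      echo "invalid deployment name: $deployment" >&2
      return 1
      ;;
  esac
  printf "DELETE FROM locks WHERE name = 'lock:deployment:%s';\n" "$deployment"
}
```

On the director, the output could be piped into the same psql binary used above, for example: lock_delete_sql us-east-1-pr-shield | /var/vcap/packages/postgres-9.4/bin/psql -U vcap bosh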

Now exit psql with \q and restart the monit jobs; the lock should be clear:

monit start all  
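The whole sequence for newer directors can be sketched as one script. Treat it as an illustration under stated assumptions rather than a supported tool: the names run, clear_lock_by_id and DRY_RUN are hypothetical, the psql path is the one used in this article, and the script defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the steps above. With DRY_RUN=1 (the default) commands are only printed.
PSQL=/var/vcap/packages/postgres-9.4/bin/psql  # path from this article; may differ on your director

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: delete one row from the locks table by numeric id.
clear_lock_by_id() {
  lock_id="$1"
  case "$lock_id" in
    ''|*[!0-9]*) echo "lock id must be numeric" >&2; return 1 ;;
  esac
  # Note: monit returns before a job has actually changed state, so on a real
  # director check `monit summary` between steps before relying on this.
  run monit stop all
  run monit start postgres
  run "$PSQL" -U vcap bosh -c "DELETE FROM locks WHERE id=$lock_id;"
  run monit start all
}
```

Calling clear_lock_by_id 1722 prints the four commands in order; setting DRY_RUN=0 as root on the director would execute them.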

Older BOSH Directors

Older versions of BOSH used Redis for locks. To find the location and password for Redis, look in the Director's configuration:

$ cat /var/vcap/jobs/director/config/director.yml | grep redis -A3
  port: 25255
  password: redis
    level: info

For a single-VM BOSH, Redis will be running on the same host on port 25255.

To connect to Redis, add redis-cli to your $PATH and start it with the port and password from the configuration:

export PATH=$PATH:/var/vcap/packages/redis/bin  
redis-cli -p 25255 -a redis  

To find the lock, look up all current locks:

> keys "lock:*"
1) "lock:deployment:my-locked-deployment"  

There it is - the "deployment" lock for deployment "my-locked-deployment".

Delete it with the Redis del command:

> del lock:deployment:my-locked-deployment
(integer) 1
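The same deletion can be done non-interactively, since redis-cli accepts a command as arguments. A minimal sketch, assuming the port and password from the configuration excerpt above; lock_key, clear_redis_lock, REDIS_PORT and REDIS_PASSWORD are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the Redis key for a deployment lock,
# following the lock:deployment:<name> pattern shown by `keys "lock:*"`.
lock_key() {
  printf 'lock:deployment:%s\n' "$1"
}

# Hypothetical wrapper: delete the lock for one deployment.
# Defaults are the port and password from the director.yml excerpt above.
clear_redis_lock() {
  redis-cli -p "${REDIS_PORT:-25255}" -a "${REDIS_PASSWORD:-redis}" \
    del "$(lock_key "$1")"
}
```

For example, clear_redis_lock my-locked-deployment issues the same del shown above, provided redis-cli is already on your $PATH.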