When you are running out of Cloud Foundry resources (which results in apps failing to stage or start), you may want to list the available memory and disk on your DEA/runner VMs. In this blog post we will use the same tools as in my previous blog post, but instead of using the Cloud Controller API as our source of truth, we will use the NATS message bus.

Let's start by installing some dependencies:

sudo gem install nats --pre && sudo apt-get install socat  

We will also be using jq, which was covered in more depth here.

To connect to NATS we need connection details, which can be extracted from your Cloud Foundry BOSH deployment manifest using the following snippet (it uses yaml2json, which was introduced here).

MBUS=`cat __path_to_deployment_manifest__ | yaml2json | jq -r '.properties.nats | "nats://\(.user):\(.password)@\(.address):\(.port)"'`  
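To see what the jq filter does without touching a real manifest, here is a minimal sketch that runs it against a hand-written JSON document. The user, password, address and port values are made up for illustration; the real ones come from your own manifest.

```shell
# Hypothetical stand-in for the JSON that yaml2json emits from a manifest.
# All four values below are invented for this example.
manifest_json='{"properties":{"nats":{"user":"nats","password":"s3cret","address":"10.0.16.4","port":4222}}}'

# Same jq filter as in the snippet above: string interpolation assembles
# the NATS URL from the fields under .properties.nats.
MBUS=$(printf '%s' "$manifest_json" | jq -r '.properties.nats | "nats://\(.user):\(.password)@\(.address):\(.port)"')

echo "$MBUS"
```

Echoing $MBUS once before handing it to nats-sub is a cheap way to catch a manifest that keeps its NATS properties somewhere else, in which case the fields would show up as null.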

The message we are interested in is dea.advertise, so let's subscribe to it using nats-sub:

> nats-sub -s $MBUS dea.advertise
Listening on [dea.advertise]  
[#1] Received on [dea.advertise] : '{"id":"1-bb1caa3c08324dbebb24552b76980a1f","stacks":["lucid64","cflinuxfs2"],"available_memory":47488,"available_disk":296748,"app_id_to_count":{"844caa81-a453-40c0-a511-827329f2c9b4":1,"8c398c22-c25b-403a-b67a-123b344ddff9":1,"6e45aa45-e414-4cf6-80b3-0e89b82bcb31":1},"placement_properties":{"zone":"z1"}}'
[#2] Received on [dea.advertise] : '{"id":"3-4aa6d126ddd44d3d96aba32e5be3085c","stacks":["lucid64","cflinuxfs2"],"available_memory":45056,"available_disk":294340,"app_id_to_count":{"5d8addfa-86ab-4ed2-b3c9-c793a2a0b408":1,"af51c7fa-62dc-46b1-b78b-f89ef30c3191":1,"8cdd2bd2-79f4-4060-ac61-596c8ea794b5":1,"dd8689d3-71fc-4eee-85f7-79769867d387":1,"5fac5461-cb20-465d-b572-b28111092509":1},"placement_properties":{"zone":"z1"}}'

We can use the -r (raw) flag to easily extract the relevant information with jq. In addition to the extra flag, we will also wrap nats-sub in socat to overcome stdout buffering issues:

> socat EXEC:"nats-sub -r -s '$MBUS' 'dea.advertise'",pty,ctty STDIO | jq -r '"\(.id) disk:\(.available_disk) memory:\(.available_memory)"'
1-bb1caa3c08324dbebb24552b76980a1f disk:296748 memory:47488  
0-a4da6e64cadf49c6ba21780676a2201a disk:293136 memory:44032  
2-e9feaabb2d584b029f36a1cd2dfc6cfb disk:297592 memory:47104  

While we are at it, let's also add the number of apps per DEA using jq's reduce:

> socat EXEC:"nats-sub -r -s '$MBUS' 'dea.advertise'",pty,ctty STDIO | jq -r '"\(.id) disk:\(.available_disk) memory:\(.available_memory) apps:\(reduce .app_id_to_count[] as $i (0; . + $i))"'
1-bb1caa3c08324dbebb24552b76980a1f disk:296748 memory:47488 apps:3  
0-a4da6e64cadf49c6ba21780676a2201a disk:293136 memory:44032 apps:6  
2-e9feaabb2d584b029f36a1cd2dfc6cfb disk:297592 memory:47104 apps:2  
3-4aa6d126ddd44d3d96aba32e5be3085c disk:294340 memory:45056 apps:5  
1-bb1caa3c08324dbebb24552b76980a1f disk:296748 memory:47488 apps:3  
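The reduce expression can be tried in isolation on a single message. The sketch below uses a shortened, made-up advertisement (the app ids are placeholders) and also shows an equivalent spelling with jq's add, which some readers may find easier to scan.

```shell
# Shortened sample advertisement; "app-a" and "app-b" are placeholder ids.
msg='{"id":"1-bb1caa3c08324dbebb24552b76980a1f","available_memory":47488,"available_disk":296748,"app_id_to_count":{"app-a":1,"app-b":2}}'

# reduce starts at 0 and adds each per-app instance count in turn.
apps_reduce=$(printf '%s' "$msg" | jq -r 'reduce .app_id_to_count[] as $i (0; . + $i)')

# Equivalent: collect the counts into an array and sum it with add.
# "// 0" covers an idle DEA, where add on an empty array yields null.
apps_add=$(printf '%s' "$msg" | jq -r '[.app_id_to_count[]] | add // 0')

echo "reduce: $apps_reduce  add: $apps_add"
```

Both forms sum the instance counts, so an app with two instances on the same DEA counts twice. If you would rather see the number of distinct apps, `.app_id_to_count | length` should give that instead.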

We are not interested in a continuous stream of messages; instead we want a point-in-time snapshot of the current state of the world. So let's use timeout (to kill nats-sub) in combination with sort, uniq and column to make a nice list:

> socat EXEC:"timeout 6 nats-sub -r -s '$MBUS' 'dea.advertise'",pty,ctty STDIO | jq -r '"\(.id) disk:\(.available_disk) memory:\(.available_memory) apps:\(reduce .app_id_to_count[] as $i (0; . + $i))"' | sort -n | uniq | column -s" " -t
0-a4da6e64cadf49c6ba21780676a2201a  disk:293136  memory:44032  apps:6  
1-bb1caa3c08324dbebb24552b76980a1f  disk:296748  memory:47488  apps:3  
2-e9feaabb2d584b029f36a1cd2dfc6cfb  disk:297592  memory:47104  apps:2  
3-4aa6d126ddd44d3d96aba32e5be3085c  disk:294340  memory:45056  apps:5  

The timeout should be greater than dea_next.advertise_interval_in_seconds, which defaults to 5 seconds, so that every DEA advertises at least once within the window.
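One thing to keep in mind: with a 6 second window and a 5 second interval, a DEA can advertise twice, and uniq only drops lines that are completely identical. If the free memory changed between the two messages, that DEA shows up twice. A possible workaround, sketched here with a hypothetical helper named latest_per_dea and shortened made-up ids, is to keep only the last line seen per DEA id:

```shell
# latest_per_dea (hypothetical helper): keep only the most recent line
# for each DEA id (the first whitespace-separated field), then sort.
latest_per_dea() {
  awk '{ last[$1] = $0 } END { for (id in last) print last[id] }' | sort -n
}

# Made-up sample: DEA 1 advertised twice with different values.
snapshot=$(printf '%s\n' \
  '1-bb1c disk:296748 memory:47488 apps:3' \
  '0-a4da disk:293136 memory:44032 apps:6' \
  '1-bb1c disk:296748 memory:46464 apps:4' | latest_per_dea)

echo "$snapshot"
```

In the pipeline this would take the place of `sort -n | uniq`, right before column.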

The final watch-optimized snippet looks like this:

watch "socat EXEC:'timeout 6 nats-sub -r -s \"$MBUS\" \"dea.advertise\"',pty,ctty STDIO | jq -r '\"\(.id) disk:\(.available_disk) memory:\(.available_memory) apps:\(reduce .app_id_to_count[] as \$i (0; . + \$i))\"' | sort -n | uniq | column -s' ' -t"