Keeping an Eye on Cloud Foundry

This is the third post in the series about keeping an eye on Cloud Foundry. Click here for the previous post or here to go to the start of the series.

What have we done?

At Stark & Wayne we help our clients integrate cutting-edge technologies into their stack. Each of us is also tasked with extending these projects for the good of the community. Often we get to do both. Recently the good folks at Swisscom needed to aggregate information about Cloud Foundry into their Consul cluster. You can get more information on Consul here.

In this case Consul is being used as part of a larger health monitoring system. So our primary use case is to sync up the BOSH Monitor heartbeats for each deployed component with TTL checks in Consul. Along the way we have also added the ability to forward all events on the BOSH NATS bus to your Consul cluster.

How does it work?

The Consul plugin works by forwarding NATS heartbeat events and alerts to a Consul server or agent. The NATS messages can be forwarded as TTL checks and as events. Heartbeat messages are forwarded as TTL checks: each time a heartbeat occurs, it updates the TTL check with its status. If Consul does not receive a success message within the TTL threshold, it puts that component into a failing status.
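To make the TTL flow a little more concrete, here is a minimal sketch of how a heartbeat's job state could be turned into a call against Consul's agent check endpoints (`/v1/agent/check/pass/`, `/warn/` and `/fail/`). The helper names, the check ID layout and the state mapping are assumptions made for illustration; the plugin's real code may differ.

```ruby
# Hypothetical sketch: pick a Consul TTL check endpoint for a heartbeat.
# The endpoint paths belong to Consul's agent check API; the mapping and
# the check ID layout below are assumptions, not the plugin's actual code.

CHECK_ENDPOINTS = {
  pass: '/v1/agent/check/pass/',
  warn: '/v1/agent/check/warn/',
  fail: '/v1/agent/check/fail/'
}.freeze

# Assumed mapping from a BOSH job state to a check status.
STATE_TO_STATUS = {
  'running' => :pass,
  'failing' => :fail,
  'unknown' => :fail
}.freeze

# Any state we do not recognise is reported as a warning.
def ttl_status(job_state)
  STATE_TO_STATUS.fetch(job_state, :warn)
end

# Build the path that would be requested to update the TTL check.
def ttl_check_path(namespace, job, index, job_state)
  check_id = "#{namespace}#{job}_#{index}"
  CHECK_ENDPOINTS.fetch(ttl_status(job_state)) + check_id
end

puts ttl_check_path('ns/', 'router', 0, 'running')
puts ttl_check_path('ns/', 'router', 0, 'failing')
```

If no update arrives before the TTL expires, Consul marks the check as critical on its own, which is what makes a missing heartbeat visible.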

When a non-heartbeat event or alert occurs, it can also be forwarded to Consul as an event. In our case we are using this information for event correlation to provide an appropriate automated response.
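For readers who have not used Consul events before, the sketch below builds (but does not send) a request against Consul's `PUT /v1/event/fire/<name>` endpoint. The function name, the event naming scheme and the alert fields are hypothetical and only meant to show the shape of such a call.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build a request that would fire a Consul event.
# Only the /v1/event/fire/ path comes from Consul's HTTP API; the naming
# scheme (namespace + kind) and the payload fields are assumptions.
def build_event_request(base_url, namespace, kind, payload)
  uri = URI.join(base_url, "/v1/event/fire/#{namespace}#{kind}")
  request = Net::HTTP::Put.new(uri)
  request.body = JSON.generate(payload)
  request
end

request = build_event_request(
  'http://hostname.of.consul:8500',
  'ns/',
  'alert',
  { severity: 4, title: 'example alert' }
)

# Nothing is sent over the network here; we only inspect the request.
puts "#{request.method} #{request.path}"
puts request.body
```

Sending the request would be a matter of passing it to `Net::HTTP.start(...) { |http| http.request(request) }` against a reachable Consul agent.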

How do I use it?

Using a BOSH Monitor plugin is covered in part one of this series. The option to enable this feature is consul_event_forwarder_enabled.

    consul_event_forwarder_enabled: true
    consul_event_forwarder:
      host: hostname.of.consul
      events: true
      ttl: 600s
      namespace: ns/
      heartbeats_as_alerts: true

The options available to the plugin are as follows:

  • host: The hostname of your Consul cluster
  • namespace: A namespace to identify a single Cloud Foundry
  • port: Defaults to 8500
  • protocol: Defaults to HTTP
  • params: Can be used to pass an access token, for example "token=MYACCESSTOKEN"
  • ttl: TTL checks will be used if a TTL period is set here. Example: "120s".
  • events: If set to true, heartbeats will be forwarded as events to Consul
  • ttl_note: A note that will be passed back to Consul with a TTL check
  • heartbeats_as_alerts: If set to true, all heartbeats will also be forwarded as events
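As a rough sketch of how these options fit together, the snippet below applies the documented defaults (port 8500, HTTP) and assembles the URL a request would be sent to. The method names and the merge logic are assumptions for illustration, not the plugin's own implementation.

```ruby
# Hypothetical sketch: apply defaults to the plugin options and build a URL.
# The default values come from the option list above; the rest is assumed.

DEFAULTS = { 'port' => 8500, 'protocol' => 'http' }.freeze

# Base URL of the Consul server or agent.
def consul_base_url(options)
  opts = DEFAULTS.merge(options)
  raise ArgumentError, 'host is required' unless opts['host']
  "#{opts['protocol']}://#{opts['host']}:#{opts['port']}"
end

# Append the optional params string (for example an access token).
def consul_url(options, path)
  url = consul_base_url(options) + path
  params = options['params'].to_s
  params.empty? ? url : "#{url}?#{params}"
end

# TTL checks are only used when a TTL period has been configured.
def ttl_checks_enabled?(options)
  !options['ttl'].to_s.empty?
end

options = { 'host' => 'hostname.of.consul', 'ttl' => '600s', 'params' => 'token=MYACCESSTOKEN' }
puts consul_url(options, '/v1/agent/check/register')
puts ttl_checks_enabled?(options)
```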

BOSH Talks too much!

It turns out that when we added the event forwarding feature, many of the events were too large for Consul to handle, which resulted in errors in the Consul logs. So we had to put the NATS messages on a diet. When heartbeats are sent as alerts, the format is made more concise so that it comes in under the event payload size limit that Consul enforces:

{
  :agent  => agent_id,
  :name   => "job_name / index",
  :state  => job_state,
  :data   => {
      :cpu => [sys, user, wait],
      :dsk => {
        :eph => [inode_percent, percent],
        :sys => [inode_percent, percent]
      },
      :ld  => load,
      :mem => [kb, percent],
      :swp => [kb, percent]
  }
}
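To illustrate the diet, here is a minimal sketch that squeezes a full heartbeat into the compact layout shown above and compares the serialized sizes. The shape of the incoming heartbeat (the `vitals`, `disk`, `ephemeral` and similar keys) and the function name are assumptions for the example; only the output layout is taken from this post.

```ruby
require 'json'

# Hypothetical sketch: reduce a heartbeat to the compact format.
# The input key names are an assumed heartbeat shape, not a documented one.
def compact_heartbeat(agent_id, heartbeat)
  vitals = heartbeat.fetch('vitals', {})
  disk   = vitals.fetch('disk', {})
  {
    agent: agent_id,
    name: "#{heartbeat['job']} / #{heartbeat['index']}",
    state: heartbeat['job_state'],
    data: {
      cpu: vitals.fetch('cpu', {}).values_at('sys', 'user', 'wait'),
      dsk: {
        eph: disk.fetch('ephemeral', {}).values_at('inode_percent', 'percent'),
        sys: disk.fetch('system', {}).values_at('inode_percent', 'percent')
      },
      ld: vitals['load'],
      mem: vitals.fetch('mem', {}).values_at('kb', 'percent'),
      swp: vitals.fetch('swap', {}).values_at('kb', 'percent')
    }
  }
end

# Made-up sample values, purely for demonstration.
full = {
  'job' => 'router', 'index' => 0, 'job_state' => 'running',
  'vitals' => {
    'cpu'  => { 'sys' => '0.1', 'user' => '0.2', 'wait' => '0.3' },
    'disk' => {
      'ephemeral' => { 'inode_percent' => '1', 'percent' => '2' },
      'system'    => { 'inode_percent' => '3', 'percent' => '4' }
    },
    'load' => ['0.01', '0.02', '0.03'],
    'mem'  => { 'kb' => '1024', 'percent' => '10' },
    'swap' => { 'kb' => '0', 'percent' => '0' }
  }
}

compact = compact_heartbeat('agent-1', full)
puts "full:    #{JSON.generate(full).bytesize} bytes"
puts "compact: #{JSON.generate(compact).bytesize} bytes"
```

Most of the saving comes from replacing nested hashes with positional arrays and shortening the key names, at the cost of the reader having to know the field order.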

Whatever! Show me the code already!

This plugin is available as of BOSH Release stable-2980 and should be available in MicroBOSH instances at that version or later.