What to Do When You Can't Connect to the Log Server in Cloud Foundry

What should you do when your app times out connecting to the Cloud Foundry (CF) log server?

When I pushed my app to a CF deployment running on AWS, I got the following error:

timeout connecting to log server, no log will be shown
Starting app cf-env in org codex / space cf-app-testing as admin...
FAILED
Error restarting application: StagerError

To get more detailed logs, I ran CF_TRACE=true cf push and saw the output below, which just sat there without making any progress:

WEBSOCKET REQUEST: [2016-08-17T19:45:38Z]
GET /apps/e189be2e-770f-4d1c-94e2-d2168f2d292d/stream HTTP/1.1
Host: wss://doppler.system.staging.xiujiaogao.com:4443
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: [HIDDEN]
Origin: http://localhost
Authorization: [PRIVATE DATA HIDDEN]
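The request in the trace is an ordinary WebSocket upgrade as defined in RFC 6455. A reachable doppler would reply with HTTP 101 Switching Protocols and a Sec-WebSocket-Accept header computed from the client's Sec-WebSocket-Key; in my case no reply of any kind came back, which hints at a network problem rather than a handshake problem. Purely as an illustration, the derivation the server performs can be sketched in Python as follows (expected_accept is a hypothetical helper name; the GUID is the fixed constant from the RFC):

```python
import base64
import hashlib

# Fixed GUID defined in RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Return the Sec-WebSocket-Accept value a server must send in its
    101 response for the given Sec-WebSocket-Key: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key/answer pair taken from RFC 6455 itself.
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```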

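Since the failure happened while sending a request to the doppler server, I ran bosh vms to check whether all of the doppler servers were up. Next, I SSHed into the doppler server and ran monit summary to check whether all of its jobs were running.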

The output of monit summary was:

Process 'doppler'                   running
Process 'syslog_drain_binder'       running
Process 'metron_agent'              running
Process 'toolbelt'                  running
System 'system_localhost'           running
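If you have several doppler VMs to check, reading monit summary by eye gets tedious. Below is a small Python sketch that returns every entry whose status is not "running". It assumes the line layout shown above (the exact layout can differ between monit versions), and not_running and SAMPLE are hypothetical names used only for illustration.

```python
# The monit summary output from above, used as sample input.
SAMPLE = """\
Process 'doppler'                   running
Process 'syslog_drain_binder'       running
Process 'metron_agent'              running
Process 'toolbelt'                  running
System 'system_localhost'           running
"""


def not_running(summary):
    """Map each monit entry whose status is not 'running' to its status."""
    problems = {}
    for raw in summary.splitlines():
        line = raw.strip()
        # Skip anything that is not a Process/System status line (e.g. headers).
        if not line.startswith(("Process '", "System '")):
            continue
        # "Process 'doppler'   running" -> ["Process ", "doppler", "   running"]
        _, name, rest = line.split("'", 2)
        status = rest.strip()  # may be multi-word, e.g. "Execution failed"
        if status != "running":
            problems[name] = status
    return problems


print(not_running(SAMPLE))  # {}
```

Keep in mind that "running" only means monit saw the process up when it last checked. It does not prove the job is healthy, which is why the log files were my next stop.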

Everything looked normal, so I went digging in the actual log files. In /var/vcap/sys/log/doppler/doppler.stderr.log I found the following error:

panic: sync cluster failed
goroutine 1 [running]:
panic(0xb0d3c0, 0xc8201460f0)
        /var/vcap/data/packages/golang1.6/85a489b7c0c2584aa9e0a6dd83666db31c6fc8e8.1-0ebd71019c0365d2608a6ec83f61e3bbee68493c/src/runtime/panic.go:464 +0x3e6
main.NewStoreAdapter(0xc82004bb00, 0x3, 0x4, 0xa, 0x0, 0x0)
        /var/vcap/data/compile/doppler/loggregator/src/doppler/main.go:58 +0x185
main.main()
        /var/vcap/data/compile/doppler/loggregator/src/doppler/main.go:92 +0x4f9
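A Go program that dies from an unrecovered panic writes a line beginning with "panic:" to stderr, as in the trace above. Here is a rough Python sketch for scanning every job's stderr log on a BOSH VM at once. It assumes the /var/vcap/sys/log/<job>/<job>.stderr.log layout seen here, and find_panics is a hypothetical helper, not part of any BOSH tooling.

```python
import glob
import os


def find_panics(log_root="/var/vcap/sys/log"):
    """Map each <log_root>/<job>/*.stderr.log file to the 'panic:' lines it contains."""
    found = {}
    pattern = os.path.join(log_root, "*", "*.stderr.log")
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with open(path, errors="replace") as fh:
            messages = [line.rstrip("\n") for line in fh if line.startswith("panic:")]
        if messages:
            found[path] = messages
    return found


if __name__ == "__main__":
    for path, messages in find_panics().items():
        print(path)
        for message in messages:
            print("   ", message)
```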

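For some reason, the cluster could not sync. The panic is raised in main.NewStoreAdapter, which points at doppler's connection to its etcd store. My brilliant coworker Geoff suggested I try the HM-9000 disaster recovery procedure, which boils down to these steps: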

monit stop etcd (on all nodes in etcd cluster)
rm -rf /var/vcap/store/etcd/* (on all nodes in etcd cluster)
monit start etcd (one-by-one on each node in etcd cluster)
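The order of those steps matters: etcd is stopped on every node before any data is removed, and the data is wiped everywhere before any node comes back. The hypothetical Python helper below only builds that ordered list of (node, command) pairs so it can be reviewed before anything is run by hand. The node names in the example are made up, and the helper deliberately does not execute anything, because the rm -rf step destroys the etcd data.

```python
def etcd_recovery_plan(nodes):
    """Return (node, command) pairs in the order of the recovery steps."""
    plan = [(node, "monit stop etcd") for node in nodes]                 # 1. stop everywhere
    plan += [(node, "rm -rf /var/vcap/store/etcd/*") for node in nodes]  # 2. wipe everywhere
    # 3. Start one node at a time; wait until monit reports etcd as running
    #    on a node before moving on to the next one (not automated here).
    plan += [(node, "monit start etcd") for node in nodes]
    return plan


if __name__ == "__main__":
    # Hypothetical node names; use the ones from your own bosh vms output.
    for node, command in etcd_recovery_plan(["etcd_z1/0", "etcd_z1/1", "etcd_z2/0"]):
        print(f"{node}: {command}")
```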

Unfortunately, this did not solve my problem. I still think it is worth sharing, though, because it may well fix other, similar logging problems.

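Since everything appeared to be running fine and the HM-9000 reset had not helped, I turned to my Security Group settings and route tables. I logged in to the AWS Console, where both are listed in the left-hand column of the VPC service. I found that the Security Group associated with the log servers blocked port 4443, the port doppler uses for WebSocket traffic. Once I allowed inbound traffic on port 4443, I pushed my app successfully!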
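In hindsight, a plain TCP connection test against the doppler endpoint would have pointed at the blocked port right away. Here is a minimal Python sketch; port_open is a hypothetical helper, and the host and port in the usage note come from the CF_TRACE output above.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    False does not say why the attempt failed: a DNS error, a refused
    connection, and a silently dropped SYN (timeout) all end up here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers socket.timeout, ConnectionRefusedError, socket.gaierror
        return False
```

Run from the machine where you use the cf CLI, port_open("doppler.system.staging.xiujiaogao.com", 4443) should return False while the Security Group blocks 4443 and True once inbound traffic is allowed. Because Security Groups silently drop disallowed packets instead of rejecting them, the failing case shows up as a timeout, which matches the hanging WebSocket request in the trace.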

To read the English version, please go to What You Should Do When Your App Can Not Connect to Log Servers in CF.
