Oct 25, 2019 Forging Bare-Metal; Introducing MoltenCore
What would my cloud look like if I could start fresh and, moreover, pick any technology I wanted? This exact opportunity presented itself during the preparations for the European Cloud Foundry Summit 2019 in The Hague, when Stark & Wayne agreed to sponsor the Hands-On Labs sessions. This meant we would pay for the shared infrastructure, which included a shared Cloud Foundry.
In an effort to save costs, we took the opportunity to re-examine all the layers used to deploy a Cloud Foundry, starting all the way down at the physical infrastructure and going up to the way we perform ingress into our system. We took an MVP approach to each component in our stack, which led to the following table of functions and technical solutions.
| Function | Approach | Technical solution |
|---|---|---|
| resource allocation | static placement | BOSH multi-cpi |
| inter-host communication | overlay network | Flannel |
| ingress traffic | network address translation | Docker HostPortBinding |
All we needed now was something to host these technologies. Our ideal solution should require no maintenance, be self-updating, and work in any environment. We found it in CoreOS Container Linux, which supports Docker and Flannel out of the box (plus ETCD, a dependency of Flannel).
Phase 1: Proof Of Concept
In this phase, we created Terraform automation to create a CoreOS cluster on the Packet.com bare-metal cloud. The Terraform TLS provider was used to generate self-signed certificates for the docker daemons and Ignition was used to configure the daemons with these certificates through Systemd.
We started out using BOSH dynamic networking on top of Flannel; however, we encountered issues with this approach after reboots. The container IPs would change on reboot, which is not something BOSH templates are built for (they are not re-rendered after a reboot). We found that if we prevent the Docker daemon from using the Flannel subnet for the default bridge network, we can create a custom Docker network with the same subnet, and that network works with BOSH manual networks.
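To make the subnet trick concrete, here is a minimal sketch of how a node's Flannel subnet could be turned into both a custom Docker network and a matching BOSH manual network subnet. It is not the code we used. It assumes the subnet is read from Flannel's `KEY=VALUE` subnet file (the `FLANNEL_SUBNET` variable), and the network name `molten`, the static IP range and the `cloud_properties` layout are illustrative assumptions.

```python
import ipaddress


def parse_subnet_env(text):
    """Parse KEY=VALUE lines, the format Flannel uses for its subnet file."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def manual_network_for(flannel_subnet, network_name="molten"):
    """Derive Docker and BOSH network settings from one node's Flannel subnet.

    flannel_subnet is expected in the form "10.1.72.1/24": the address part
    is reused as the gateway, the prefix gives the network range.
    """
    iface = ipaddress.ip_interface(flannel_subnet)
    net = iface.network
    if net.num_addresses < 256:
        raise ValueError("sketch assumes a /24 or larger per-node subnet")
    gateway = iface.ip
    # Hypothetical choice: reserve .10 through .100 for BOSH static IPs.
    static = f"{net.network_address + 10}-{net.network_address + 100}"
    return {
        # Arguments for creating the custom bridge network on this node.
        "docker_args": [
            "docker", "network", "create",
            "--driver", "bridge",
            "--subnet", str(net),
            "--gateway", str(gateway),
            network_name,
        ],
        # Subnet entry for a BOSH manual network; the cloud_properties
        # key used here is an assumption about the Docker CPI.
        "bosh_subnet": {
            "range": str(net),
            "gateway": str(gateway),
            "static": [static],
            "cloud_properties": {"name": network_name},
        },
    }
```

Because the range and gateway are derived from the same Flannel lease, the addresses BOSH hands out stay routable across the overlay while remaining stable across reboots.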
The BOSH cloud and CPI configurations were generated by a shell script that extracted the Docker daemon endpoints, TLS client certificates and Flannel subnets (which we statically generated in Terraform) from the Terraform output. A helper script was also injected to start a shell inside a Docker container, from which all interaction with the deployed BUCC (BOSH, UAA, CredHub and Concourse) could be performed.
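The original was a shell script, but the idea can be sketched in a few lines of Python. The sketch below reads the JSON produced by `terraform output -json` and builds a BOSH CPI config with one CPI per node. The output names `docker_nodes` and `docker_client_tls`, the CPI naming scheme, port 2376 and the exact property layout expected by the Docker CPI are assumptions made for illustration.

```python
import json


def cpi_config_from_terraform(output_json):
    """Build a BOSH CPI config (one CPI per Docker daemon) from Terraform output.

    `terraform output -json` wraps every output as
    {"<name>": {"value": ..., "type": ..., "sensitive": ...}}.
    """
    outputs = json.loads(output_json)
    # Hypothetical output names; a real project defines its own.
    nodes = outputs["docker_nodes"]["value"]
    tls = outputs["docker_client_tls"]["value"]

    cpis = []
    for index, node in enumerate(nodes, start=1):
        cpis.append({
            "name": f"docker-z{index}",
            "type": "docker",
            # Property names are an assumption about the Docker CPI.
            "properties": {
                "host": f"tcp://{node['ip']}:2376",
                "tls": {
                    "ca": tls["ca"],
                    "certificate": tls["certificate"],
                    "private_key": tls["private_key"],
                },
            },
        })
    return {"cpis": cpis}
```

The resulting dictionary can be serialized to YAML or JSON and uploaded with `bosh update-cpi-config`.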
We successfully used the proof of concept code to keep a three-node cluster (32 GB of memory per node) running, which hosted the shared Cloud Foundry environment for the European Cloud Foundry Summit 2019 Hands-On Labs sessions. However, some corners were cut in the development of the POC, so we took our initial findings and re-developed the next version from scratch.
Phase 2: Portability
Due to the nature of Ignition configs (they are only applied once, during first boot), we had to recreate our testing environment (using Packet.com) many times. The development feedback cycle time was about 10 minutes, and the lack of a local development environment was problematic. The POC codebase was also not architected with different IaaS providers in mind; as a result, a lot of logic was embedded in Terraform templates in a non-reusable way.
Since we already had an ETCD cluster (for Flannel), we decided to move the TLS certificates for the docker daemons into ETCD, and generate them using a custom binary on node startup. We also moved the management of Systemd unit file changes into this same binary. This approach allowed us to decouple from Ignition, which created a faster feedback cycle (around 5 seconds).
The mc binary is written in Go and is responsible for converting a vanilla CoreOS cluster (with a configured ETCD cluster and Flannel) into a MoltenCore cluster. It is fully cloud agnostic and, as of now, has been tested successfully with coreos-vagrant and Packet.com. Since all Docker daemon endpoints and certificates are now available in ETCD, we can generate the BOSH cloud and CPI configs right after our BUCC is running.
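As an illustration of that last step, the sketch below groups a flat set of ETCD key/value pairs by node and emits the availability zone and network parts of a BOSH cloud config, with each node becoming its own AZ bound to its own CPI (which is how static placement maps onto BOSH multi-cpi). This is not how the mc binary is implemented: the key prefix `/moltencore/nodes/`, the field name `flannel_subnet` and the `docker-zN` CPI names are hypothetical, and the input is a plain dictionary rather than a live ETCD client.

```python
def cloud_config_from_kv(kv, prefix="/moltencore/nodes/"):
    """Turn flat key/value pairs into BOSH cloud config AZs and subnets.

    Expects keys shaped like "<prefix><node>/<field>" (hypothetical layout).
    """
    nodes = {}
    for key, value in kv.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("/")
        if len(parts) != 2:
            continue  # ignore keys that do not match <node>/<field>
        name, field = parts
        nodes.setdefault(name, {})[field] = value

    azs, subnets = [], []
    # Sort node names so AZ numbering is stable between runs.
    for index, name in enumerate(sorted(nodes), start=1):
        az = f"z{index}"
        # One AZ per node, each pinned to that node's CPI.
        azs.append({"name": az, "cpi": f"docker-{az}"})
        subnets.append({"az": az, "range": nodes[name]["flannel_subnet"]})

    return {
        "azs": azs,
        "networks": [{"name": "default", "type": "manual", "subnets": subnets}],
    }
```

Keeping the generator a pure function of the stored keys means the configs can be regenerated at any time, for example after a node joins the cluster.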
Phase 3: Auto Updates?
As it stands currently, MoltenCore is great for smaller scale Cloud Foundry or Kubernetes clusters; however, I would not yet recommend it for production use. For this, we will need to support clean node reboots (which happen as part of CoreOS during its auto update cycle). We hope to achieve this, at a later time, by taking inspiration from the container-linux-update-operator, which performs this function for k8s.
We also hope to reduce the spin-up time of a cluster by packaging BUCC in a Docker image, which would hopefully allow us to move away from `bosh create-env`. Most of the time it takes to spin up BUCC (currently around 10 minutes) is spent moving big files around (untarring and verifying BOSH releases). Faster (re)start times for BUCC would also mean faster recovery during node updates.
MoltenCore allows running containerized container platforms on bare-metal in a BOSH native way, using a highly available scale-out architecture. The source code of the mc binary is hosted in the molten-core repo, together with the scripts to create a local cluster using Vagrant. That being said, the best way to create a MoltenCore cluster is by using one of the platform-specific Terraform projects (e.g. packet-molten-core).