May 07, 2019
Air Gapping: A Moat for the 21st Century
What Is Air Gapping?
An air gapped environment is one that is not accessible from the internet and cannot access the internet.
Enterprises use air gaps for two reasons: to keep malicious external actors out of their network, and to stop their own engineers from installing arbitrary software from the internet. Any software that does come in must first pass a formal review to ensure that it meets the organization’s security standards.
An air gapped environment has its share of challenges:
- Much software assumes internet access by default.
- Operational overhead increases.
- Bureaucratic delays slow down operations.
For app and container platforms such as Cloud Foundry or Kubernetes, this type of environment brings a number of implications for both application developers and platform operators. This blog post will focus on the implications for operators.
Application developers must write their apps so that they do not rely on internet resources, either during the build process or at runtime. In Cloud Foundry, developers should use the offline versions of buildpacks so that cf push succeeds.
For operators, things get more complicated.
By default, installing a platform relies on a number of internet resources: GitHub, Docker Hub, package repositories, and Pivotal Network in the case of a PCF install. In an air gapped environment, each of these resources must instead be provided by an internal service.
Examples of internal services that an organization can run to replace a given external service:
- An external source repository host such as GitHub, Bitbucket, or GitLab can be replaced with an internal “enterprise” instance of the same service.
- Docker Hub (or any external container registry) can be replaced with VMware’s Harbor.
- Pivotal Network and package repositories can be replaced with an internal S3-compatible blobstore such as MinIO, Dell ECS, or Ceph Object Storage.
The automation or manual processes used to install the platform in the default, internet-based scenario must then be modified to pull resources from these internal locations instead. (See Figure 2.)
The same goes for upgrading components of the platform: the default automation or manual process relies on an operator’s ability to reach the internet to retrieve the new binaries. These binaries must instead come from an internal service, and the automation or manual processes must be updated to reflect this.
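As a rough illustration of that change, the sketch below rewrites external download URLs so that they point at internal mirrors. It is a minimal sketch under stated assumptions, not how any particular installer works: the INTERNAL_MIRRORS table, the to_internal_url helper, and every hostname in it are hypothetical, and real mirrors rarely preserve external paths one-to-one.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping of external hosts to internal replacements.
# Every hostname on the right is a placeholder, not a real service.
INTERNAL_MIRRORS = {
    "github.com": "git.internal.example",
    "downloads.example.com": "blobstore.internal.example",
}


def to_internal_url(url):
    """Return the URL with its host swapped for the internal mirror.

    Raises ValueError for hosts with no configured mirror, so automation
    fails loudly instead of quietly trying to reach the internet.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host not in INTERNAL_MIRRORS:
        raise ValueError(f"no internal mirror configured for host: {host!r}")
    # Assumes the mirror keeps the same path layout (a simplification).
    return urlunsplit(parts._replace(netloc=INTERNAL_MIRRORS[host]))
```

Failing on an unknown host is a deliberate choice here: in an air gapped environment, a missing mirror entry is a planning gap worth surfacing early rather than a timeout to discover mid-install.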
Our theoretical organization’s install and upgrade processes are now modified to pull their required resources from internal services instead of the internet. Great! But that’s only half of the problem. How do these internal services get populated with the resources that installs and upgrades require in the first place? The answer depends on the organization and on what its security standards allow.
In the ideal situation, the entire environment is still air gapped but the platform operators are allowed to whitelist a VM that can talk to the internet. This VM then acts as the worker VM for automation that downloads the necessary resources from the internet and puts those resources into the appropriate internal service.
This is the ideal scenario for a few reasons:
- Adding a new binary, image, or repo to the air gapped environment is a matter of modifying the automation config file to include the desired resource. No humans stand between operators and what they need in order to do their jobs.
- These automation config files then double as a list of exactly what has been pulled into the environment, as long as:
  - The automation config files are properly version controlled.
  - SSH access to the whitelisted VM is sufficiently locked down.
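To make the idea of a config file that doubles as an inventory more concrete, here is a minimal sketch. The JSON layout, the resource names, the URLs, the bucket name, and both helper functions are assumptions made up for illustration; they do not describe any specific automation tool. The sketch only plans the transfers and does not download anything.

```python
import json

# Hypothetical automation config. Names, versions, URLs, and buckets are
# placeholders chosen for illustration only.
CONFIG_TEXT = """
{
  "resources": [
    {"name": "example-tile", "version": "2.3.1",
     "source": "https://downloads.example.com/example-tile-2.3.1.pivotal",
     "bucket": "platform-artifacts"},
    {"name": "example-image", "version": "1.0.0",
     "source": "https://downloads.example.com/example-image-1.0.0.tgz",
     "bucket": "platform-artifacts"}
  ]
}
"""


def plan_transfers(config_text):
    """Turn the config into (source URL, bucket, object key) transfer steps."""
    plan = []
    for res in json.loads(config_text)["resources"]:
        # Use the last path segment of the source URL as the file name.
        filename = res["source"].rsplit("/", 1)[-1]
        key = f"{res['name']}/{res['version']}/{filename}"
        plan.append((res["source"], res["bucket"], key))
    return plan


def inventory(config_text):
    """List what the config pulls into the environment, as 'name version'."""
    resources = json.loads(config_text)["resources"]
    return sorted(f"{res['name']} {res['version']}" for res in resources)
```

Because both functions read the same file, the version-controlled config stays the single record of what the worker VM is allowed to bring in.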
However, in many organizations, allowing even a small set of VMs to talk to the internet for the above purpose is completely off the table. This is especially common in military or government environments. In these cases, the required process for moving resources into the air gapped environment can vary. In most cases, these processes are (1) slow and (2) dependent on people who are not on your team.
Make a Plan
For both of these reasons, it is extremely important for an operations team to plan ahead with a list of exactly which resources are required to install and upgrade. Every binary, container image, or repo that is not moved into the air gapped environment on the first pass costs your team the time it takes to repeat the process. It also risks burning your team’s political capital with the teams responsible for moving resources into the environment. Those teams likely do not care whether your team is successful, and they could see constant “Hey, we forgot this binary, can you move that to server XYZ for us?” requests as unnecessary favors.
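One lightweight way to check such a list before submitting a transfer request is to compare what is required against what is already staged inside the environment. The sketch below assumes resources are tracked as simple "name version" strings; the missing_resources helper and all of the resource names are hypothetical.

```python
def missing_resources(required, staged):
    """Return required resources that are not yet staged, in sorted order."""
    return sorted(set(required) - set(staged))


# Hypothetical resource names, for illustration only.
required = ["example-tile 2.3.1", "example-image 1.0.0", "example-cli 0.9.0"]
staged = ["example-image 1.0.0"]

# Aim for one complete request instead of several follow-up favors.
transfer_request = missing_resources(required, staged)
```

Running a check like this before each install or upgrade keeps the request to the transfer team complete on the first pass.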
When you have a plan and know the requirements, you can overcome the hurdles of putting an air gap between your environment and external networks, and developers can still use the platform as transparently as they did before.
Because you, as a platform operator, have taken the time to plan replacements for external-facing services such as a source repository, a container image registry, and an object store, you can still run platform automation and upgrades with ease.
Recently we gave a talk at the Cloud Foundry Summit 2019 with a client who has successfully air gapped their environment. Vince White, from Agile Defense, shared the stage with me and we discussed these very things. Watch our talk on YouTube if you’re curious about what we did.
And if you need further help, please feel free to reach out to us.