Sometimes, when helping our clients bring their Cloud Foundry installation in line with what we consider best practices, we determine that the best strategy is to create a brand new deployment and migrate all existing workloads.

In this blog post I will describe the steps we took to migrate from an old CF v245 installation with DEA and Diego cells, using PostgreSQL as its database, to a new v247 installation using MySQL, with only Diego cells as runners.

EDIT:
Though we love PostgreSQL, this migration was necessary due to customer requirements. In particular, the HA properties of the MySQL BOSH release are well tested and widely used in many production Cloud Foundry installations. Adapting a PostgreSQL release to support the required HA properties would have taken more time than we had to complete the project.

This kind of migration has a number of gotchas, and it took us many days of trial and error at each step of the way before we had a plan that consistently gave us the desired result. That being said, this is just one solution that we found to work in our exact situation, given the exact constraints we faced. I'm sure there are more elegant solutions, and some steps will have to be done differently in other situations.

Preparation

This migration took place on AWS. The new installation was deployed into the same account and region as the old one. A vpc_peering_connection was created via Terraform and added to the route tables of both the old and the new VPC to provide network connectivity between them.
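AWS will not peer two VPCs whose IPv4 CIDR blocks match or overlap, so it can pay to check the planned ranges before writing the Terraform. A minimal sketch using only Python's standard library; the CIDR values are hypothetical placeholders, not the ones used in this migration:

```python
import ipaddress


def cidrs_overlap(old_vpc_cidr: str, new_vpc_cidr: str) -> bool:
    """Return True if the two VPC CIDR blocks overlap (AWS would reject the peering)."""
    old_net = ipaddress.ip_network(old_vpc_cidr)
    new_net = ipaddress.ip_network(new_vpc_cidr)
    return old_net.overlaps(new_net)


if __name__ == "__main__":
    # Hypothetical ranges for the old and the new VPC
    old_cidr, new_cidr = "10.4.0.0/16", "10.8.0.0/16"
    if cidrs_overlap(old_cidr, new_cidr):
        raise SystemExit(f"{old_cidr} overlaps {new_cidr}; choose another range for the new VPC")
    print("CIDR ranges are disjoint; VPC peering is possible")
```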

1) CF Properties

When deploying the new CF, we configured the UAA to pre-create a user with the same credentials as the existing admin user.
This meant that the smoke tests (which were part of the Genesis-managed CI pipeline) could be run in the new environment regardless of where the DNS record for the API was pointing.
Also, the value of properties.cc.db_encryption_key had to be identical in both deployments for the cloud_controller to be able to read the encrypted data in the database after migration.
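A mismatched encryption key only shows up after the migration, so a pre-flight comparison of the two manifests is cheap insurance. A minimal sketch, assuming both manifests have already been parsed into dicts (e.g. with PyYAML's `yaml.safe_load`) and that the key sits at the dotted path named above; the helper names `get_path` and `check_same` are hypothetical:

```python
from typing import Any, Dict, List


def get_path(manifest: Dict[str, Any], dotted: str) -> Any:
    """Look up a dotted property path such as 'properties.cc.db_encryption_key'."""
    node: Any = manifest
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"{dotted} not found (missing '{key}')")
        node = node[key]
    return node


def check_same(old_manifest: Dict[str, Any], new_manifest: Dict[str, Any],
               paths: List[str]) -> List[str]:
    """Return the property paths whose values differ between the two manifests."""
    return [p for p in paths if get_path(old_manifest, p) != get_path(new_manifest, p)]
```

An empty result from `check_same(old, new, ["properties.cc.db_encryption_key"])` means the keys match; a missing key raises `KeyError` rather than silently passing.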

2) Migration tools

To do the actual migration from PostgreSQL to MySQL we used a proprietary tool ($50) that can be found here. Out of all the methods we tried, this was the most reliable! It comes as an .exe and needs to be installed on a Windows box that has network access to both the PostgreSQL and the MySQL instances. We deployed the box into the AWS VPC of the old installation and made sure the security groups allowed access to PostgreSQL and to the leading mysql_proxy. We used Remote Desktop to access the box. We also installed MySQL Workbench to truncate the databases, but this could also be done from the command line.
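Before starting the wizard it helps to confirm that the box can actually open TCP connections to both databases, since a missing security group rule otherwise surfaces as a vague timeout deep inside the tool. A minimal sketch, assuming Python is available on the box; the addresses are hypothetical and the ports are the PostgreSQL and MySQL defaults:

```python
import socket
from typing import Dict, Tuple


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


# Hypothetical endpoints: the old PostgreSQL server and the new leading mysql_proxy
TARGETS: Dict[str, Tuple[str, int]] = {
    "postgres": ("10.4.1.10", 5432),
    "mysql_proxy": ("10.8.1.20", 3306),
}


def report(targets: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Map each target name to whether it is reachable."""
    return {name: can_reach(host, port) for name, (host, port) in targets.items()}
```

Calling `report(TARGETS)` should return `True` for both entries before you move on.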

3) Smoke Tests

Before beginning the migration, we pointed the DNS entry at the new environment just to run the cf smoke tests once. This caused ~15 minutes of downtime. Once we had validated the new environment, we pointed the DNS back.

Migration

Here is how we actually migrated the data.

1) Stop jobs

Use BOSH to stop all jobs that use the databases before migrating.
These are the api_z1, api_z2, uaa_z1 and uaa_z2 jobs.
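Stopping the jobs can be scripted. The sketch below builds one command per job and, by default, only prints them. It assumes the newer v2 bosh CLI syntax (`bosh -n -d DEPLOYMENT stop JOB`), so adjust it if you are on the older v1 CLI; the deployment name `cf-new` is hypothetical. Passing `"start"` instead of `"stop"` gives the matching commands for bringing the jobs back up later.

```python
import subprocess
from typing import List, Sequence

DB_JOBS = ("api_z1", "api_z2", "uaa_z1", "uaa_z2")
DRY_RUN = True  # flip to False to actually execute the commands


def bosh_commands(deployment: str, action: str,
                  jobs: Sequence[str] = DB_JOBS) -> List[List[str]]:
    """Build one non-interactive `bosh -d <deployment> <action> <job>` command per job."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return [["bosh", "-n", "-d", deployment, action, job] for job in jobs]


def run_all(commands: List[List[str]], dry_run: bool = DRY_RUN) -> None:
    """Print each command and, unless dry_run is set, run it (stopping on the first failure)."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(bosh_commands("cf-new", "stop"))  # hypothetical deployment name
```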

2) Backup

We took a shield-backup from the new environment (uaadb + ccdb). If things go wrong during the migration, it's always nice to be able to jump back to a known good point.

3) Truncating tables

Before migrating the data, make sure that all the tables (in the new environment :-p) are completely empty, with one exception:
IMPORTANT!!! Do not truncate or alter the schema_migrations table in either database. This table must stay untouched by the migration of the rest of the data.

For the uaadb, running TRUNCATE TABLE on each table was enough.
For the ccdb this didn't work as nicely, because some tables are referenced by foreign key constraints and cannot be truncated. For those tables, the rows have to be removed with DELETE FROM instead.
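Working out which tables need DELETE FROM, and in which order, can be automated. A minimal sketch (Python 3.9+ for `graphlib`), assuming you have already built a map from each table to the tables it references, e.g. from the `REFERENCED_TABLE_NAME` column of MySQL's `information_schema.KEY_COLUMN_USAGE`. It relies on MySQL rejecting TRUNCATE TABLE for a table that another table's foreign key references, and it empties child tables before their parents. The table names in any example input are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

PRESERVE = {"schema_migrations"}  # must never be emptied


def wipe_statements(fk_parents: Dict[str, Set[str]]) -> List[str]:
    """Return MySQL statements that empty every table except those in PRESERVE.

    fk_parents maps each table to the set of tables it references via foreign keys.
    Raises graphlib.CycleError if tables reference each other in a cycle.
    """
    # Self-references don't constrain the order between tables (and don't block TRUNCATE).
    graph = {table: parents - {table} for table, parents in fk_parents.items()}
    # Tables referenced by another table's foreign key cannot be truncated.
    referenced = set().union(*graph.values())
    # static_order() yields parents before children; reverse so children are emptied first.
    order = list(TopologicalSorter(graph).static_order())
    statements = []
    for table in reversed(order):
        if table in PRESERVE:
            continue
        if table in referenced:
            statements.append(f"DELETE FROM `{table}`;")
        else:
            statements.append(f"TRUNCATE TABLE `{table}`;")
    return statements
```

The generated statements can be pasted into MySQL Workbench or piped to the `mysql` client.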

4) Backup

Take another backup when all tables (except schema_migrations) are empty. This is also a useful point to jump back to when things don't work.

5) Data migration

Start the installed migration tool's .exe; it launches a wizard that guides you through the migration process. Enter the connection details, then select the database name (we started with uaadb) and the schema (public). At one point you can select the migration strategy. There are options for whether tables should be dropped and whether the schema should be migrated. Out of all the options, the only one that worked consistently was merge: it leaves the existing tables as they are and merely transfers the rows. The last step before the data is actually transferred is selecting which tables to migrate. Select all of them except schema_migrations.
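A quick sanity check after each database is to compare per-table row counts on both sides. A minimal sketch, assuming you have already collected the counts into dicts (e.g. by running `SELECT COUNT(*)` for every table in the PostgreSQL `public` schema and in the MySQL database); the collection step and the example table names are hypothetical:

```python
from typing import Dict, FrozenSet, Optional, Tuple


def compare_row_counts(
    source: Dict[str, int],
    target: Dict[str, int],
    skip: FrozenSet[str] = frozenset({"schema_migrations"}),
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {table: (source_rows, target_rows)} for every table whose counts differ.

    A target count of None means the table is missing on the MySQL side.
    schema_migrations is skipped because it is deliberately not migrated.
    """
    mismatches: Dict[str, Tuple[int, Optional[int]]] = {}
    for table, rows in source.items():
        if table in skip:
            continue
        migrated = target.get(table)
        if migrated != rows:
            mismatches[table] = (rows, migrated)
    return mismatches
```

An empty dict means every migrated table arrived with the same number of rows; it says nothing about column contents, so it complements rather than replaces the smoke tests.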

6) One more backup

Once the migration is done, it's worth taking another backup.

7) Migrate blobstores

As a last step before starting the jobs back up, the blobs stored in the CF blobstore must be migrated. Since we were using S3 as the blobstore for both the old and the new installation, we were able to perform this step using the aws s3 sync ... command.
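The Cloud Controller keeps buildpacks, droplets, app packages and the resource pool under separate blobstore directory keys, so there is usually more than one bucket to copy. A minimal sketch that builds one `aws s3 sync s3://OLD s3://NEW` command per bucket pair and, by default, only prints them; all bucket names are hypothetical:

```python
import subprocess
from typing import Dict, List

DRY_RUN = True  # flip to False to actually run the syncs

# Hypothetical old -> new bucket names, one pair per blobstore directory
BUCKETS: Dict[str, str] = {
    "old-cf-buildpacks": "new-cf-buildpacks",
    "old-cf-droplets": "new-cf-droplets",
    "old-cf-packages": "new-cf-packages",
    "old-cf-resources": "new-cf-resources",
}


def s3_sync_commands(bucket_map: Dict[str, str]) -> List[List[str]]:
    """Build one `aws s3 sync s3://<old> s3://<new>` command per bucket pair."""
    return [["aws", "s3", "sync", f"s3://{old}", f"s3://{new}"] for old, new in bucket_map.items()]


if __name__ == "__main__":
    for cmd in s3_sync_commands(BUCKETS):
        print("+", " ".join(cmd))
        if not DRY_RUN:
            subprocess.run(cmd, check=True)
```

Because `aws s3 sync` only copies objects that are new or changed, it can be run once ahead of the downtime window and again right before starting the jobs.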

Running the new CF v247

At this point, all the data needed to start the existing apps on the new CF has been migrated. Moving forward from here means committing to running the apps on the new platform, so rehearsing these steps multiple times in a staging environment is highly recommended.

1) Restart the jobs

Running bosh start for all of the previously stopped jobs brings them back up.
This should cause the apps to be scheduled on the Diego runtime.

2) Switch DNS

Once the apps have started, pointing the DNS record at the new API makes them accessible via the new platform.
Rerunning the smoke tests at this point confirms that things are working as expected.
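Because of DNS caching, it is worth confirming which Cloud Controller is actually answering before trusting the smoke test results. The Cloud Controller serves an unauthenticated `/v2/info` document that includes fields such as `api_version` and `description`. A minimal sketch that fetches it; the API URL in the usage note is hypothetical:

```python
import json
import urllib.request
from typing import Any, Dict


def cc_info(api_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch and parse the Cloud Controller's unauthenticated /v2/info document."""
    with urllib.request.urlopen(f"{api_url.rstrip('/')}/v2/info", timeout=timeout) as resp:
        return json.load(resp)
```

For example, `cc_info("https://api.system.example.com")["api_version"]` should match the value reported by the new deployment rather than the old one.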

3) Services

Since the scope of the migration only covered the core platform, all the existing backing services had to be made accessible from the new deployment so that the apps using them could continue to function properly. For this, most of them had to be redeployed with the new NATS endpoint and credentials so that they could advertise their endpoints to CF.

4) Enabling diego

Since the old deployment still had some old-style DEA runners, not all the apps started up automatically. The Diego-Enabler cf CLI plugin made it easy to identify these apps and enable all of them to run on Diego, which caused them to be scheduled automatically and work.
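With many apps, enabling them one by one gets tedious. A minimal sketch that turns a list of (org, space, app) tuples, for example transcribed from the plugin's listing of apps still on DEAs, into `cf target` and `cf enable-diego` commands. It assumes the plugin's `enable-diego` command takes an app name in the currently targeted space, so check the plugin's help for the version you install; any example tuples are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def enable_diego_commands(dea_apps: Iterable[Tuple[str, str, str]]) -> List[List[str]]:
    """Build cf CLI commands that target each org/space once and enable Diego per app."""
    commands: List[List[str]] = []
    current: Optional[Tuple[str, str]] = None
    for org, space, app in sorted(dea_apps):  # sorting groups apps by org and space
        if (org, space) != current:
            commands.append(["cf", "target", "-o", org, "-s", space])
            current = (org, space)
        commands.append(["cf", "enable-diego", app])
    return commands
```

Targeting each org/space only once keeps the command list short and avoids enabling an app with the same name in the wrong space.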

Celebrate

Once all that has gone well a round of congratulations is in order!

If this blog post was useful to you or you have further questions, let me know in the comments.