Migrating Large Heroku Postgres Instances to AWS Aurora without Downtime

By Adam McQuistan in PostgreSQL  11/12/2020

Introduction

In this article I discuss a general process I recently used to migrate a large multi-terabyte Heroku Postgres database from the Heroku platform to Amazon Aurora Postgres on a live Heroku based application architecture, with near zero downtime and built-in failovers during the process. Not only did this migration save the significant costs associated with running a large managed Postgres instance, it also increased scalability and unlocked the parameter tuning and other management abilities afforded by AWS RDS.


Heroku Architecture and Constraints

Heroku Migration Diagram

The Heroku Postgres Migration to AWS Aurora Postgres Architecture and Process Flow diagram above shows, at a high level, the Heroku Postgres data layer architecture for a typical Heroku Premium level service with High Availability plus a Read Replica for load balancing read-heavy apps. On the right is the migration target within the AWS platform boundary, annotated with the sequence of steps used to migrate the data to the Aurora Postgres instance.

Before discussing the constraints I've experienced, I'd like to offer an important disclaimer. The Heroku platform is a phenomenally innovative service that has paved the way for countless extremely useful and profitable apps and services by lowering the barrier to entry for small development teams. Many, if not most, apps will never hit the constraints I point out below, eliminating the need to migrate their data layer off the platform.

  • Heroku can be costly, perhaps for good reasons, because they alleviate much of the sysops / devops investment and dev time costs of maintaining and scaling services like Postgres
  • Heroku Postgres locks down the majority of the Postgres parameters that are often necessary to tune for large enterprise level, high throughput, Postgres usage
  • Heroku Postgres monitoring falls well short of many of the other options, particularly AWS RDS, which becomes very important for large enterprise grade applications
  • Heroku Postgres does not allow Postgres superuser or Replication User roles so migration options become limited
  • Postgres pg_dump / pg_restore is not a viable option for databases of a terabyte and up because of the time it takes to run, which inherently implies downtime or data loss if used for failover or migration; that is unacceptable for most applications
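To make the time constraint concrete, here is a quick back-of-envelope sketch in Python (the database size and throughput figures below are hypothetical illustrations, not measurements):

```python
# Rough estimate of how long a dump/restore cycle takes at a given
# sustained throughput; illustrative numbers only.
def dump_restore_hours(db_size_gb, throughput_mb_per_s):
    seconds = (db_size_gb * 1024) / throughput_mb_per_s
    return seconds / 3600

# A 2 TB database at a sustained 100 MB/s needs roughly 5.8 hours for
# the dump alone; the restore (index rebuilds, etc.) is often slower.
print(round(dump_restore_hours(2048, 100), 1))
```

Every one of those hours is downtime (or divergence) for a live application, which is why a replication based approach is needed instead.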

 

The RandoNumba Demo App

To facilitate this discussion I've provided a toy app, deployable to Heroku and built with the Django web framework, which simply generates random numbers and scrapes quotes from the web.

1) Clone dj-randonumba-heroku app from my GitHub repo

git clone https://github.com/amcquistan/dj-randonumba-heroku.git
cd dj-randonumba-heroku

2) Create a Heroku App

heroku create

which gives the following output; note your output will show a different app name and URL

Creating app... done, ⬢ intense-headland-79519
https://intense-headland-79519.herokuapp.com/ | https://git.heroku.com/intense-headland-79519.git

Be sure to update `randonumba.settings.ALLOWED_HOSTS` to include the host that Heroku provides.
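For example, in randonumba/settings.py that might look like the following (substitute the hostname Heroku generated for your app):

```python
# randonumba/settings.py -- allow requests for the Heroku-assigned hostname
ALLOWED_HOSTS = ['intense-headland-79519.herokuapp.com']
```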

3) Push the App Code to Heroku

git push heroku master

4) Attach a Free Tier Heroku Postgres Add-on for Demo Purposes

heroku addons:create heroku-postgresql:hobby-dev -a intense-headland-79519

Use psql to create the hstore Postgres extension this demo app uses

heroku pg:psql -a intense-headland-79519
create extension hstore;
\q

then run migrations

heroku run python manage.py migrate -a intense-headland-79519

5) Register and Play with the App

Open the app in your default browser with the following (replace -a intense-headland-79519 with your app name).

heroku open -a intense-headland-79519

Then register the app and generate some randomness

Mimicking the Heroku Postgres Service

In the real world, for this process to work you need to ask the Heroku Data Support team to establish continuous WAL shipping to an AWS S3 bucket, along with a base physical backup, using the WAL-E Python based library. Rather than bothering Heroku's Data Support team for this demo, and to allow readers to fully reproduce it, I will simply mimic this step with my own AWS EC2 instance running Postgres 11 and shipping a continuous archive to my own AWS S3 bucket.

Infrastructure Specs and Services

  • Amazon Linux 2 AMI
  • Security Group Allowing port 5432 and SSH access
  • Separate EBS Volume for Installing the Postgres DB Cluster
  • S3 Bucket for Pushing Base Physical Backup and WAL
  • IAM User with Programmatic Access to S3

Procedure for Mimicking Heroku Postgres

1) Update VPS (Virtual Private Server) and Install Dependencies

sudo yum update -y
sudo amazon-linux-extras install epel -y
sudo amazon-linux-extras install postgresql11 -y
sudo yum install postgresql-server postgresql-contrib python3 lzop pv -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo python3 get-pip.py
sudo python3 -m pip install envdir wal-e[aws]

2) Mount Volume and Create File System for Postgres Cluster

Use lsblk to identify the EBS volume to install the Postgres cluster on.

$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
└─xvda1 202:1    0   8G  0 part /
xvdb    202:16   0  16G  0 disk

Format the volume as an XFS filesystem.

sudo mkfs -t xfs /dev/xvdb

Create a mount point for the volume and mount it to the new directory.

sudo mkdir /database
sudo mount /dev/xvdb /database

Find the device's block id and make the mount permanent in /etc/fstab

sudo blkid # will give you the block id 

example entry in /etc/fstab

UUID=19ee2212-7fa0-4c9a-bcbf-cd7019d50fd6     /database   xfs    defaults,nofail   0   2

Test the mount (no errors is the successful outcome).

sudo mount -a

Update permissions of the /database directory for postgres.

sudo chown -R postgres:postgres /database
sudo chmod -R 700 /database

3) Create envdir Directory for AWS Creds used for WAL-E

sudo mkdir -p /etc/wal-e/env
sudo chown -R ec2-user:ec2-user /etc/wal-e
echo "INSERT-VALUE-HERE" > /etc/wal-e/env/AWS_ACCESS_KEY_ID
echo "REGION-HERE" > /etc/wal-e/env/AWS_REGION
echo "INSERT-VALUE-HERE" > /etc/wal-e/env/AWS_SECRET_ACCESS_KEY
echo "S3-BUCKET-FOLDER-URL-HERE" > /etc/wal-e/env/WALE_S3_PREFIX
sudo chown -R postgres:postgres /etc/wal-e
sudo chmod -R 755 /etc/wal-e/env
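The envdir utility maps each file in that directory to an environment variable: the filename becomes the variable name and the file contents become its value, which is how WAL-E receives its AWS credentials. A minimal Python sketch of that behavior (illustrative only; the real tool is the envdir package installed earlier):

```python
import os
import tempfile

def load_envdir(path):
    """Mimic envdir: one file per variable; filename -> name, contents -> value."""
    env = {}
    for name in os.listdir(path):
        with open(os.path.join(path, name)) as f:
            env[name] = f.read().strip()
    return env

# Demonstrate with a throwaway directory laid out like /etc/wal-e/env
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "AWS_REGION"), "w") as f:
        f.write("us-east-1\n")
    print(load_envdir(d)["AWS_REGION"])  # -> us-east-1
```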

4) Initialize Postgres Cluster

Run as postgres user.

pg_ctl init -D /database

5) Modify Postgres Configs

First modify /database/postgresql.conf with the following.

listen_addresses = '*'        # or restrict to your app's specific IP
wal_level = replica
archive_mode = on
archive_command = 'envdir /etc/wal-e/env wal-e wal-push %p'
archive_timeout = 60

Update /database/pg_hba.conf for auth (substitute 0.0.0.0/0 with your app's IP as necessary).

host    pgdb            pguser          0.0.0.0/0               md5

6) Start and Enable Postgres Service

First create a systemd service file for managing the postgresql service in /etc/systemd/system/postgresql.service

.include /lib/systemd/system/postgresql.service

[Service]
Environment=PGDATA=/database

Reload systemd, start and enable postgresql service

sudo systemctl daemon-reload
sudo systemctl start postgresql
sudo systemctl status postgresql
sudo systemctl enable postgresql

7) Create App Postgres User and Database

createuser -e -l --pwprompt pguser
createdb -e --owner=pguser pgdb

Make hstore extension

psql -d pgdb
create extension hstore;
\q

8) Detach Heroku Postgres Addon and Switch Connection String to EC2 Instance

Note that this is run locally, not on the Amazon EC2 VPS.

List addons to get name of Heroku Postgres

heroku addons

Detach the addon for Heroku Postgres

heroku addons:detach name-of-addon -a name-of-heroku-app

Replace the DATABASE_URL config variable to point to the newly spun up EC2 Postgres instance mimicking Heroku Postgres

heroku config:set DATABASE_URL=postgres://pguser:develop3r@ec2-ip-address:5432/pgdb -a name-of-heroku-app
heroku ps:restart -a name-of-heroku-app

Run migrations.

heroku run python manage.py migrate -a name-of-heroku-app

At this point you'll want to register a new user and generate some more test data to migrate. You could also use pg_dump / pg_restore to transfer any existing data from Heroku Postgres to this new EC2 Postgres instance mimicking Heroku Postgres.


9) Push Base Backup to S3 Using WAL-E

Note that the next commands are to be run on the Amazon Linux VPS.

As a personal preference, I like to wrap potentially long running commands in shell scripts so I can explicitly echo when things begin and end. To protect against SSH connections timing out during long running processes, I use nohup with backgrounding.

For this I create the following script

mkdir /var/lib/pgsql/scripts && cd /var/lib/pgsql/scripts
vi wal-e-push-backup.sh

with contents

#!/bin/bash

echo "starting wal-e backup-push"

envdir /etc/wal-e/env wal-e backup-push /database

echo "wal-e backup-push complete"

Run it.

nohup ./wal-e-push-backup.sh &

After the push finishes I should be able to verify that the backup has been pushed with the following command.

envdir /etc/wal-e/env wal-e backup-list

At this point I have an EC2 Instance with Postgres installed mimicking Heroku Postgres and continuously shipping backups and WAL to S3.

Restore and Promote EC2 Postgres Log Shipped Replica

In the real world this is where you will start: creating your log shipped replica loaded from the data in S3, in the form of a physical base backup plus WAL files. The Heroku Data Support team will likely provide you with S3 creds for WAL-E in the form of Heroku config variables, which you can view by running the following.

heroku config -a app-name

For this demo I will be using the following specs for an EC2 instance. You will need to tailor this to your needs and budget. My suggestion is to use an EC2 instance size ideally larger than your current Heroku Postgres instance, because it will not only need to handle the full load of your app but also undergo streaming and logical replication at the same time.

Infrastructure Specs and Services

  • Amazon Linux 2 AMI
  • Security Group Allowing port 5432 and SSH access
  • Separate EBS Volume for Installing the Postgres DB Cluster

Procedure for Establishing the Postgres Log Shipped Replica on an EC2 Instance

1) Update VPS and Install Dependencies

sudo yum update -y
sudo amazon-linux-extras install epel -y
sudo amazon-linux-extras install postgresql11 -y
sudo yum install postgresql-server postgresql-contrib python3 lzop -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo python3 get-pip.py
sudo python3 -m pip install envdir wal-e[aws]

2) Mount Volume and Create File System for Postgres Cluster

Use lsblk to identify the EBS volume to install the Postgres cluster on.

$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
└─xvda1 202:1    0   8G  0 part /
xvdb    202:16   0  16G  0 disk

Format the volume as an XFS filesystem.

sudo mkfs -t xfs /dev/xvdb

Create a mount point for the volume and mount it to the new directory.

sudo mkdir /database
sudo mount /dev/xvdb /database

Find the device's block id and make the mount permanent in /etc/fstab

sudo blkid # will give you the block id 

example entry in /etc/fstab

UUID=19ee2212-7fa0-4c9a-bcbf-cd7019d50fd6     /database   xfs    defaults,nofail   0   2

Test the mount (no errors is the successful outcome).

sudo mount -a

Update permissions of the /database directory for postgres.

sudo chown -R postgres:postgres /database
sudo chmod -R 700 /database

3) Create envdir Directory for AWS Creds used for WAL-E

Use the AWS S3 credentials provided by Heroku to create environment variables in a directory to be used by WAL-E and envdir libraries for pulling down the backup and WAL files.

sudo mkdir -p /etc/wal-e/env
sudo chown -R ec2-user:ec2-user /etc/wal-e
echo "INSERT-VALUE-HERE" > /etc/wal-e/env/AWS_ACCESS_KEY_ID
echo "REGION-HERE" > /etc/wal-e/env/AWS_REGION
echo "INSERT-VALUE-HERE" > /etc/wal-e/env/AWS_SECRET_ACCESS_KEY
echo "S3-BUCKET-FOLDER-URL-HERE" > /etc/wal-e/env/WALE_S3_PREFIX
sudo chown -R postgres:postgres /etc/wal-e
sudo chmod -R 755 /etc/wal-e/env

Once the above environment variable files have been created it is helpful to test WAL-E by listing the available backups with the following command.

envdir /etc/wal-e/env wal-e backup-list

4) Pull the Physical Base Backup from S3 using WAL-E

I use the following bash script named wal-e-fetch-backup.sh to wrap the WAL-E call to pull the base backup from S3 because it enables me to track how long it takes and utilize nohup and backgrounding to guard against an SSH connection timeout.

#!/bin/bash

echo "starting wal-e backup-fetch"

started=`date +%s`

envdir /etc/wal-e/env wal-e backup-fetch --blind-restore /database LATEST

ended=`date +%s`

duration=$((ended - started))

echo "wal-e backup-fetch completed after $duration seconds"

Run it like so.

nohup ./wal-e-fetch-backup.sh &

5) Update postgresql.conf Configs

There are a few config settings that need to be updated in order for this replica to serve as a source for Streaming Replication to a failover EC2 Postgres instance as well as a source for Logical Replication to Aurora Postgres.

listen_addresses = '*'
hot_standby = on
data_directory = '/database'
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10
wal_keep_segments = 1000
wal_sender_timeout = 60s

6) Update pg_hba.conf Configs

In order to allow the app user, pguser, to connect to the EC2 Postgres Log Shipped Replica, as well as the postgres user to connect from the Streamed Replica, I must enable md5 password authentication for pguser and trust auth for the postgres user from the EC2 Streamed Replica's IP address. Trust auth for the postgres user from the failover Streamed Replica is necessary because Heroku will not provide a replication user with a password you can connect with, and ideally you want the Streamed Replica available immediately upon promoting the Log Shipped Replica in the event of failure.

Additionally, I add a password based auth entry for the Aurora rds_replication user that will be used to perform Logical Replication.

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    pgdb            pguser          0.0.0.0/0               md5
host    all             rds_replication 0.0.0.0/0               md5
host    all             postgres        IP-ADDRESS-HERE/32      trust
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
host    replication     postgres        IP-ADDRESS-HERE/32      trust
host    replication     rds_replication 0.0.0.0/0               md5
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

7) Create a recovery.conf File to Direct Restoring the Replica

The recovery.conf file specifies that the WAL-E program should be used to pull WAL files from S3 and continuously restore them to the latest recovery timeline.

restore_command = 'envdir /etc/wal-e/env wal-e wal-fetch %f %p'
standby_mode = on
recovery_target_timeline = 'latest'

8) Generate a Systemd postgresql Service File

To manage the postgresql database service I use a systemd service file at /etc/systemd/system/postgresql.service with a PGDATA environment variable indicating the database cluster to be managed is in the /database directory.

.include /lib/systemd/system/postgresql.service

[Service]
Environment=PGDATA=/database

Lastly, reload the systemd daemon, start and enable the postgresql service.

sudo systemctl daemon-reload
sudo systemctl start postgresql
sudo systemctl status postgresql
sudo systemctl enable postgresql

After starting the postgresql service you will be able to see WAL files being pulled in and restored by watching the Postgresql logs.

Create EC2 Postgres Log Streamed Replica

As mentioned previously, I felt it would be a good idea to have an immediately available WAL Streamed Replica as a Hot Standby to fail over to in the event the promoted Log Shipped Replica fails. This guarantees the application remains viable during the migration. Of course, should that happen, the Logical Replication to Aurora would need to be restarted using the WAL Streamed Replica as a source.

The infrastructure for this EC2 instance should be the same as that used for the EC2 Log Shipped Replica.

Infrastructure Specs and Services

  • Amazon Linux 2 AMI
  • Security Group Allowing port 5432 and SSH access
  • Separate EBS Volume for Installing the Postgres DB Cluster

1) Update VPS and Install Dependencies

sudo yum update -y
sudo amazon-linux-extras install epel -y
sudo amazon-linux-extras install postgresql11 -y
sudo yum install postgresql-server postgresql-contrib python3 lzop pv -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo python3 get-pip.py
sudo python3 -m pip install envdir wal-e[aws]

2) Mount Volume and Create File System for Postgres Cluster

Use lsblk to identify the EBS volume to install the Postgres cluster on.

$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
└─xvda1 202:1    0   8G  0 part /
xvdb    202:16   0  16G  0 disk

Format the volume as an XFS filesystem.

sudo mkfs -t xfs /dev/xvdb

Create a mount point for the volume and mount it to the new directory.

sudo mkdir /database
sudo mount /dev/xvdb /database

Find the device's block id and make the mount permanent in /etc/fstab

sudo blkid # will give you the block id 

example entry in /etc/fstab

UUID=19ee2212-7fa0-4c9a-bcbf-cd7019d50fd6     /database   xfs    defaults,nofail   0   2

Test the mount (no errors is the successful outcome).

sudo mount -a

Update permissions of the /database directory for postgres.

sudo chown -R postgres:postgres /database
sudo chmod -R 700 /database

3) Use pg_basebackup to Pull Physical Base Backup from Log Shipped Replica

To replicate the database cluster for this Streaming Replica I use the pg_basebackup utility, which comes with the Postgres 11 install, by directly connecting to the Log Shipped Replica as the postgres superuser. As with other commands, I wrap this call to pg_basebackup in a shell script named physical_backup.sh, shown below.

#!/bin/bash

echo "starting physical backup"

started=`date +%s`

pg_basebackup -h IP-ADDRESS-HERE -D /database --progress --verbose

ended=`date +%s`

duration=$((ended - started))

echo "physical backup completed after $duration seconds"

Then run it with nohup as a background process to guard against an SSH connection timeout since this may potentially execute for an extended period of time.

nohup ./physical_backup.sh &

4) Update postgresql.conf

You can either reuse the postgresql.conf that was pulled over from the Log Shipped Replica in the last step or pull a default one by initializing a default Postgres data directory. Either way, make sure it contains the following.

listen_addresses = '*'
hot_standby = on
data_directory = '/database'
wal_level = replica

5) Update pg_hba.conf

Again, it's important to at least make sure that the app user, pguser in this demo, can connect via password auth in case you have to fail over to this instance.

host    pgdb            pguser          0.0.0.0/0               md5

6) Generate a recovery.conf file to Dictate Streaming Recovery

This recovery.conf file is different from the one used on the Log Shipped Replica because it tells Postgres to connect directly to the Log Shipped Replica and stream in WAL transactions, as opposed to reading them from a shared storage location such as S3.

standby_mode = on
primary_conninfo = 'host=IP-ADDRESS-HERE port=5432 user=postgres'
recovery_target_timeline = 'latest'

7) Generate a Systemd postgresql Service File

Just like the Log Shipped Replica, I use an identical systemd postgresql service file to manage the service.

.include /lib/systemd/system/postgresql.service

[Service]
Environment=PGDATA=/database

Reload the systemd daemon, start and enable the postgresql service.

sudo systemctl daemon-reload
sudo systemctl start postgresql
sudo systemctl status postgresql
sudo systemctl enable postgresql

Remember, you can check the Postgresql logs to verify WAL is being retrieved from the Log Shipped Replica.

Switch App Dyno to Promoted EC2 Postgres

Alright, at this stage of the journey the EC2 Postgres Log Shipped read only replica is promoted to primary, allowing it to start receiving writes as well as reads, and the Heroku app's database connection is switched to the newly promoted EC2 Postgres. This is the first, or intermediate, database switch necessary to get the Heroku app off Heroku Postgres. The switch is necessary so that the EC2 Postgres instance can serve as the source for Logical Replication to Aurora Postgres, via the Publisher / Subscriber paradigm that comes with it. Unfortunately, a publication cannot be created on a read only (aka Hot Standby) replica, so the replica must be promoted to a primary database, capture the new writes from the Heroku app, and propagate them via Logical Replication to Aurora Postgres.

Procedure for Promoting the EC2 Postgres Log Shipped Replica and Switching Heroku's App DB

1) Put Heroku App into Maintenance Mode

* this is run locally

heroku maintenance:on -a app-name

2) Promote the EC2 Postgres Log Shipped Replica

* this is run on the EC2 VPS

pg_ctl promote -D /database

3) Flip the Database URL on the App via Environment Variable

To repoint the app at the newly promoted database I use a completely separate environment variable named `AURORA_MASTER_DATABASE_URL`. In the code where the database connection is established I simply check for the existence of this variable and selectively connect with it over the standard Heroku DATABASE_URL variable.

For example, in my demo app for this article, which again is written in Django, that looks like this.

import dj_database_url

DATABASES = {}

DATABASES['default'] = dj_database_url.config(env='AURORA_MASTER_DATABASE_URL') or dj_database_url.config()

The specific commands for setting the Heroku environment variable are shown below (again, run locally). You'll also notice that I do an explicit restart of the dynos after setting the environment variable, which ensures that every dyno restarts and picks up the update.

heroku config:set AURORA_MASTER_DATABASE_URL=postgres://pguser:develop3r@IP-ADDRESS-HERE:5432/pgdb -a app-name
heroku ps:restart -a app-name

4) Turn Off Maintenance Mode

heroku maintenance:off -a app-name

Ok, at this point the Heroku app is no longer utilizing the Heroku Postgres database service and is fully on the AWS platform, but the end goal is to be back on a managed database service, which is Aurora Postgres for this example. Of course, if your end goal is simply to migrate to an EC2 instance running Postgres then you could stop here.

Initiate Postgres Logical Replication to Aurora Postgres

Ok, at this point Logical Replication needs to be set up, utilizing a Publication on the newly promoted EC2 Postgres instance along with a Subscription on the Aurora Postgres RDS service.

Procedure for Initiating Logical Replication

1) On EC2 Postgres Create a Replication User

A user named rds_replication, which already exists on the Aurora instance, needs to be created on the EC2 Postgres instance now serving as the primary. Aurora will connect as this user to execute Logical Replication via the pub / sub publication that will be created shortly.

psql -d pgdb
CREATE ROLE rds_replication REPLICATION LOGIN SUPERUSER PASSWORD 'develop3r';

2) Create Replication Slot (aka Publication) on EC2 Postgres

CREATE PUBLICATION pub_name FOR ALL TABLES;

3) Create Replication Subscription on Aurora Postgres

CREATE SUBSCRIPTION sub_name
CONNECTION 'host=EC2-IP-ADDRESS-HERE port=5432 dbname=pgdb user=rds_replication password=develop3r'
PUBLICATION pub_name;

Once the subscription is created Aurora will connect to the EC2 Postgres instance and Logical Replication will commence. It transpires in two phases: (i) an initial table synchronization, a transaction wrapped COPY that pulls over each table's current data, then (ii) change data capture, whereby ongoing writes are propagated to the subscriber.
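The two phases can be pictured with a toy Python model (purely conceptual; real logical replication decodes WAL, it does not copy dicts):

```python
# Phase 1: initial table synchronization -- the subscriber takes a
# snapshot copy of the publisher's existing rows.
publisher = {1: "alpha", 2: "beta"}
subscriber = dict(publisher)

# Phase 2: change data capture -- ongoing writes are propagated to the
# subscriber as they happen.
changes = [("insert", 3, "gamma"), ("update", 1, "ALPHA"), ("delete", 2, None)]
for op, key, value in changes:
    for table in (publisher, subscriber):
        if op == "delete":
            table.pop(key, None)
        else:
            table[key] = value

print(subscriber == publisher)  # the two sides converge -> True
```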

There are a few gotchas worth mentioning here.

  • If you are migrating a large amount of data you will want to increase the max_sync_workers_per_subscription parameter of the Aurora Cluster Parameter group along with the max_worker_processes and max_logical_replication_workers. See the Postgres docs for more details.
  • Logical Replication does not replicate sequences, which typically back primary keys. This can be mitigated by updating the start value of each sequence immediately after turning off Logical Replication and before the Heroku app is finally switched to Aurora.
  • The initial table synchronization step may fail one or more times for very large tables (tens to hundreds of millions of rows and up), but it will automatically restart and in my experience it has always eventually succeeded.
  • Also be sure to thoroughly read the Postgres Logical Replication Docs

Switch App Dyno to Aurora Postgres

Moving on to the home stretch. In this section you wait until replication is fully synchronized and replication lag has shrunk to an acceptable level (ideally zero), then drop the subscription, update the sequences, and switch the Heroku app to Aurora Postgres.

Procedure for Finalizing Logical Replication and Switching Heroku App to Aurora

1) Evaluating Logical Replication Status

Postgres provides a view named pg_stat_replication which provides useful information on the state of ongoing replication, with one row dedicated to each replication connection. In this demo there are two entries: (i) one for the Streaming Replica and (ii) another for the Logical Replication to Aurora. Probably the most important fields of this view are write_lag, flush_lag, and replay_lag, which show how far replication is falling behind. The ideal values for these fields are null, which means there is no lag.

So on the Master EC2 Postgres server you can query this view to evaluate the progress.

SELECT * FROM pg_stat_replication;

You should also check the postgresql log from Aurora in CloudWatch and make sure all the tables finish their initial synchronization phase.
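Before cutting over it also helps to codify what an acceptable lag actually is. Below is a hypothetical helper (the function name and threshold are my own, not part of Postgres) that mirrors the null-means-no-lag semantics of those fields, which client libraries such as psycopg2 surface as timedelta values:

```python
from datetime import timedelta

def safe_to_cutover(write_lag, flush_lag, replay_lag, max_lag=timedelta(seconds=1)):
    """pg_stat_replication reports lag as intervals; NULL (None) means no measurable lag."""
    return all(lag is None or lag <= max_lag
               for lag in (write_lag, flush_lag, replay_lag))

print(safe_to_cutover(None, None, None))                   # fully caught up -> True
print(safe_to_cutover(timedelta(seconds=30), None, None))  # still lagging -> False
```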

2) Once Replication is Sync'd Switch Heroku App to Maintenance Mode

heroku maintenance:on -a app-name

3) Drop the Subscription on Aurora

DROP SUBSCRIPTION sub_name;

4) Alter the Sequences on Aurora

As I mentioned previously, Logical Replication does not migrate sequences, so if a new record were inserted into one of the tables on Aurora a primary key collision would occur, because every sequence would still start at 1 even though table data already exists. I alleviate this by using a SQL function that queries each sequence-backed table for its max primary key value and restarts the sequence at that max value plus some offset.

For this demo app I have the following SQL script named alter_sequences.sql which accomplishes just that.

CREATE OR REPLACE FUNCTION alter_sequence(seq TEXT, select_sql TEXT) RETURNS void
    LANGUAGE plpgsql
    PARALLEL UNSAFE
    AS $$
DECLARE
  max_pk INTEGER;
  new_seq_id INTEGER;
  seq_id_offset INTEGER := 9000;
BEGIN
  EXECUTE select_sql INTO max_pk;
  new_seq_id = max_pk + seq_id_offset;
  RAISE NOTICE 'Max PK=% and new Seq Val=% from SQL=%', max_pk, new_seq_id, select_sql;

  PERFORM setval(seq, new_seq_id);
END; $$;


BEGIN;

SELECT alter_sequence('public.auth_group_id_seq', 'SELECT MAX(id) FROM  public.auth_group');
SELECT alter_sequence('public.auth_group_permissions_id_seq', 'SELECT MAX(id) FROM public.auth_group_permissions');
SELECT alter_sequence('public.auth_permission_id_seq', 'SELECT MAX(id) FROM public.auth_permission');
SELECT alter_sequence('public.auth_user_groups_id_seq', 'SELECT MAX(id) FROM public.auth_user_groups');
SELECT alter_sequence('public.auth_user_id_seq', 'SELECT MAX(id) FROM public.auth_user');
SELECT alter_sequence('public.auth_user_user_permissions_id_seq', 'SELECT MAX(id) FROM public.auth_user_user_permissions');
SELECT alter_sequence('public.core_randonumba_id_seq', 'SELECT MAX(id) FROM public.core_randonumba');
SELECT alter_sequence('public.django_admin_log_id_seq', 'SELECT MAX(id) FROM public.django_admin_log');
SELECT alter_sequence('public.django_content_type_id_seq', 'SELECT MAX(id) FROM public.django_content_type');
SELECT alter_sequence('public.django_migrations_id_seq', 'SELECT MAX(id) FROM public.django_migrations');

COMMIT;

Then run as follows.

psql -U pguser -d pgdb -h AURORA-WRITER-CLUSTER-HERE -f alter_sequences.sql

5) Update Heroku App's AURORA_MASTER_DATABASE_URL Config Variable

heroku config:set AURORA_MASTER_DATABASE_URL=postgres://pguser:develop3r@AURORA-WRITER-CLUSTER-HERE:5432/pgdb -a app-name
heroku ps:restart -a app-name

6) Turn off Maintenance Mode

heroku maintenance:off -a app-name

Conclusion

In this article I have walked through a game plan that can serve as a guide for migrating a Heroku Postgres database from the Heroku platform to AWS Aurora, complete with fault tolerant failover and near zero downtime.

As always, thanks for reading and please do not hesitate to critique or comment below.
