In this article I demonstrate how to set up a log-shipped replica of a PostgreSQL v11 master database, creating a hot standby replica for High Availability and read scaling using the WAL-E program. WAL-E is a Python-based program designed to facilitate continuous archiving of PostgreSQL WAL files and base backups.
For this article I will be using two separate Linux-based cloud Virtual Private Servers (AWS Linux AMI 2 specifically, but Ubuntu and CentOS can be used interchangeably) along with an AWS S3 bucket.
In this section I walk through installing and configuring the AWS Linux AMI 2 based master server: installing the PostgreSQL and WAL-E dependencies, building and populating the master database with source data, and initiating continuous archiving to an AWS S3 bucket. Note that although I am using the AWS Linux AMI distro, the process is essentially identical for Ubuntu and CentOS / Fedora, and I link to other articles on The Coding Interface that detail how to install on those distros.
First I update the newly launched Linux server and install the necessary linux packages for Postgres and the WAL-E dependencies.
sudo yum update -y
Installing Postgres (see link below for instructions on installing on Ubuntu / Centos).
sudo amazon-linux-extras install epel -y
sudo amazon-linux-extras install postgresql11 -y
sudo yum install postgresql-server postgresql-contrib -y
For instructions specific to installing PostgreSQL on Ubuntu and CentOS see my earlier article here on The Coding Interface, Developer's Guide to PostgreSQL on Linux: Installation
Install Python3 and PIP as well as WAL-E and the very useful envdir for working with file based environment variables.
sudo yum install python3 lzop pv -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo python3 get-pip.py
sudo python3 -m pip install envdir
sudo python3 -m pip install 'wal-e[aws]'
Initialize, start and enable Postgres.
sudo su -          # become root user
su - postgres      # become postgres user
pg_ctl init
exit
systemctl start postgresql
systemctl enable postgresql
Create a Postgres user and database for interacting with the master source database.
su - postgres
createuser -e -l --pwprompt pguser
createdb -e --owner=pguser pgdb
Create some functions and stored procedures for generating test data to work with in the replication process.
CREATE OR REPLACE FUNCTION random_between(low INT, high INT)
RETURNS INT AS $$
BEGIN
  RETURN floor(random() * (high - low + 1) + low);
END;
$$ LANGUAGE plpgsql STRICT;

CREATE OR REPLACE FUNCTION random_letter()
RETURNS TEXT AS $$
DECLARE
  letter_idx INTEGER := 0;
  letters TEXT := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
BEGIN
  -- SUBSTRING positions are 1-based, so pick an index from 1 to LENGTH(letters)
  SELECT random_between(1, LENGTH(letters)) INTO letter_idx;
  RETURN SUBSTRING(letters, letter_idx, 1);
END;
$$ LANGUAGE plpgsql STRICT;

CREATE TABLE IF NOT EXISTS random_numbers (
  id serial PRIMARY KEY,
  number INTEGER NOT NULL,
  created_at TIMESTAMP NOT NULL
);

CREATE TABLE IF NOT EXISTS random_letters (
  id serial PRIMARY KEY,
  letter VARCHAR(1) NOT NULL,
  created_at TIMESTAMP NOT NULL
);

CREATE OR REPLACE PROCEDURE generate_randomness()
LANGUAGE plpgsql AS $$
DECLARE
  num INTEGER;
BEGIN
  SELECT random_between(0, 100) INTO num;
  INSERT INTO random_numbers (number, created_at)
    SELECT random_between(0, 100), NOW() FROM generate_series(1, num) AS s;
  INSERT INTO random_letters (letter, created_at)
    SELECT random_letter(), NOW() FROM generate_series(1, num) AS s;
  COMMIT;
END;
$$;
I placed the above functions and stored procedure into a file called initdb.sql and loaded it into the database like so.
psql -U pguser -d pgdb -f initdb.sql
Create a cron task as the postgres user to generate random number and letter data every minute.
*/1 * * * * psql -U pguser -d pgdb -c "CALL generate_randomness();"
In order for WAL-E to interact with S3, an IAM user with programmatic access is required. I've created an IAM user in my AWS account named wal-e with the AmazonS3FullAccess policy attached and gathered up the access key ID and secret. See the AWS IAM docs if you are unsure how to do this.
I also set up an S3 bucket named tci-wal-e-demo in the US West (Oregon) region to work with.
Now, over on the master Linux server, I create a directory to hold my IAM user credentials and S3 bucket info in the form of [Python envdir](https://envdir.readthedocs.io/en/latest/api.html) environment variable files as shown below.
With the IAM user and S3 bucket created I can configure WAL-E along with envdir back on the master server.
sudo mkdir -p /etc/wal-e/env
sudo chown -R postgres:postgres /etc/wal-e
sudo su -          # become root user
su - postgres      # become postgres user
cd /etc/wal-e/env
echo "PUT-YOUR-ACCESS-KEY-ID-HERE" > AWS_ACCESS_KEY_ID
echo "PUT-YOUR-ACCESS-KEY-SECRET-HERE" > AWS_SECRET_ACCESS_KEY
echo "YOUR-BUCKET-REGION" > AWS_REGION
echo "s3://YOUR-BUCKET-NAME/dbname" > WALE_S3_PREFIX
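To make the envdir convention concrete: envdir runs a command with one environment variable per file in the directory, named after the file and valued with the file's contents. The plain-shell sketch below only emulates that behavior for illustration (the region and bucket values are example assumptions); WAL-E itself is always invoked through the real envdir utility.

```shell
# Emulate envdir: every file in the directory becomes an environment
# variable named after the file, valued with the file's first line.
demo_dir=$(mktemp -d)
echo "us-west-2" > "$demo_dir/AWS_REGION"
echo "s3://tci-wal-e-demo/pgdb" > "$demo_dir/WALE_S3_PREFIX"

# export each file as NAME=contents, like envdir would for a child process
for f in "$demo_dir"/*; do
  export "$(basename "$f")=$(head -n 1 "$f")"
done

echo "$AWS_REGION $WALE_S3_PREFIX"
# → us-west-2 s3://tci-wal-e-demo/pgdb
```

This is why the archive_command used later is prefixed with `envdir /etc/wal-e/env` — it hands the credentials to wal-e without storing them in postgresql.conf.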
This is what the /etc/wal-e/env directory should look like now.
ls -l
total 16
-rw-r--r-- 1 postgres postgres 21 Oct 21 15:32 AWS_ACCESS_KEY_ID
-rw-r--r-- 1 postgres postgres 10 Oct 21 15:32 AWS_REGION
-rw-r--r-- 1 postgres postgres 41 Oct 21 15:32 AWS_SECRET_ACCESS_KEY
-rw-r--r-- 1 postgres postgres 20 Oct 21 15:32 WALE_S3_PREFIX
At this point I can configure the master Postgres database instance to perform continuous WAL archiving and generate a base backup with WAL-E (run as the postgres user).
First I update the following settings in /var/lib/pgsql/data/postgresql.conf.
wal_level = replica
archive_mode = on
archive_command = 'envdir /etc/wal-e/env wal-e wal-push %p'
archive_timeout = 60
This requires a restart of the postgresql service to take effect.
systemctl restart postgresql
Now I push a base backup (as the postgres user) using the following wal-e command. In production systems, however, I have found it best to wrap this command in a shell script and run the script with nohup as a background job, because the process can take a fair amount of time and the SSH connection to the server may drop.
envdir /etc/wal-e/env wal-e backup-push /var/lib/pgsql/data
To that end, I create a shell script named wal-e-push.sh with the following contents.
#!/bin/bash
echo "starting wal-e backup-push"
envdir /etc/wal-e/env wal-e backup-push /var/lib/pgsql/data
echo "wal-e backup-push complete"
Then make the file executable and kick it off using nohup like so.
chmod +x wal-e-push.sh
nohup ./wal-e-push.sh &
I can then cat or tail the nohup.out file, watching for the completion message indicating the backup has been successfully pushed.
$ cat nohup.out
starting wal-e backup-push
wal_e.main   INFO  MSG: starting WAL-E
        DETAIL: The subcommand is "backup-push".
        STRUCTURED: time=2020-10-21T16:43:13.441974-00 pid=13164
wal_e.operator.backup  INFO  MSG: start upload postgres version metadata
        DETAIL: Uploading to s3://tci-wal-e-demo/pgdb/basebackups_005/base_000000010000000000000002_00000040/extended_version.txt.
        STRUCTURED: time=2020-10-21T16:43:14.619558-00 pid=13164
wal_e.operator.backup  INFO  MSG: postgres version metadata upload complete
        STRUCTURED: time=2020-10-21T16:43:14.675762-00 pid=13164
wal_e.worker.upload  INFO  MSG: beginning volume compression
        DETAIL: Building volume 0.
        STRUCTURED: time=2020-10-21T16:43:14.764043-00 pid=13164
wal_e.worker.upload  INFO  MSG: begin uploading a base backup volume
        DETAIL: Uploading to "s3://tci-wal-e-demo/pgdb/basebackups_005/base_000000010000000000000002_00000040/tar_partitions/part_00000000.tar.lzo".
        STRUCTURED: time=2020-10-21T16:43:15.193014-00 pid=13164
wal_e.worker.upload  INFO  MSG: finish uploading a base backup volume
        DETAIL: Uploading to "s3://tci-wal-e-demo/pgdb/basebackups_005/base_000000010000000000000002_00000040/tar_partitions/part_00000000.tar.lzo" complete at 18202.3KiB/s.
        STRUCTURED: time=2020-10-21T16:43:15.520506-00 pid=13164
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
wal-e backup-push complete
Afterwards I verify backups are on S3 using the wal-e backup-list command like so.
$ envdir /etc/wal-e/env wal-e backup-list
wal_e.main   INFO  MSG: starting WAL-E
        DETAIL: The subcommand is "backup-list".
        STRUCTURED: time=2020-10-21T16:44:23.915389-00 pid=13218
name last_modified expanded_size_bytes wal_segment_backup_start wal_segment_offset_backup_start wal_segment_backup_stop wal_segment_offset_backup_stop
base_000000010000000000000002_00000040 2020-10-21T16:43:17.000Z  000000010000000000000002 00000040
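When scripting around backup-list, it can be handy to pull out just the latest backup name. The helper below is hypothetical (not part of WAL-E) and assumes the tabular layout shown above: a header row starting with "name", then one whitespace-separated row per base backup.

```shell
# Hypothetical helper: print the last backup name from `wal-e backup-list`
# style output. Rows after the header are assumed to be in oldest-to-newest
# order, with the backup name in the first column.
latest_backup() {
  awk 'NR > 1 { name = $1 } END { print name }'
}

# Example using a captured two-row sample of the output format:
printf 'name last_modified\nbase_000000010000000000000002_00000040 2020-10-21T16:43:17.000Z\n' | latest_backup
# → base_000000010000000000000002_00000040
```

In practice you would pipe the real command through it, e.g. `envdir /etc/wal-e/env wal-e backup-list | latest_backup`.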
Ok, now on a separate Linux server (again I'm using the AWS Linux 2 AMI, but Ubuntu / CentOS can be treated the same except where I mention otherwise) I can set up a log-shipped replica in hot standby mode. A warm standby could also be built; the only difference is that hot standby allows read-only connections while warm standby does not allow client connections.
To start I update the server and install Postgres 11.
sudo yum update -y
sudo amazon-linux-extras install epel postgresql11 -y
sudo yum install postgresql-server postgresql-contrib -y
Again, see Developer's Guide to PostgreSQL on Linux: Installation for specifics on installing on Ubuntu or CentOS.
Install Python3, PIP, envdir and WAL-E. Note that on the replica I don't have to install the pv program, since that's only required for generating the base backup on the master server.
sudo yum install python3 lzop -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo python3 get-pip.py
sudo python3 -m pip install envdir 'wal-e[aws]'
Next I configure WAL-E in the same manner I did for the master.
sudo mkdir -p /etc/wal-e/env
sudo chown -R postgres:postgres /etc/wal-e
sudo su -          # become root user
su - postgres      # become postgres user
cd /etc/wal-e/env
echo "PUT-YOUR-ACCESS-KEY-ID-HERE" > AWS_ACCESS_KEY_ID
echo "PUT-YOUR-ACCESS-KEY-SECRET-HERE" > AWS_SECRET_ACCESS_KEY
echo "YOUR-BUCKET-REGION" > AWS_REGION
echo "s3://YOUR-BUCKET-NAME/dbname" > WALE_S3_PREFIX
Verify WAL-E is set up by listing available backups.
envdir /etc/wal-e/env wal-e backup-list --detail
You should see output like this.
wal_e.main   INFO  MSG: starting WAL-E
        DETAIL: The subcommand is "backup-list".
        STRUCTURED: time=2020-10-21T17:12:48.605932-00 pid=12751
name last_modified expanded_size_bytes wal_segment_backup_start wal_segment_offset_backup_start wal_segment_backup_stop wal_segment_offset_backup_stop
base_000000010000000000000017_00000040 2020-10-20T05:28:04.000Z 33969406 000000010000000000000017 00000040 000000010000000000000017 00029456
For a RHEL-based system the default Postgres data directory lives at /var/lib/pgsql/data and, as you can see, it is empty since Postgres has not yet been initialized or restored from an existing backup.
cd /var/lib/pgsql/data
ls -l
total 0
I temporarily initialize the Postgres Data Directory so I can gather the config files to be used after restoring the base physical backup.
pg_ctl init    # run as postgres user
cd /var/lib/pgsql
cp /var/lib/pgsql/data/postgresql.conf .
cp /var/lib/pgsql/data/pg_hba.conf .
cp /var/lib/pgsql/data/pg_ident.conf .
Now I can nuke the data directory, since it will be replaced by the restored base backup.
rm -r /var/lib/pgsql/data/*
With the default data directory back to pristine, uninitialized condition, I fetch the base backup from S3 using WAL-E with the backup-fetch argument. For most production systems this command may take a while to execute, since it potentially pulls large amounts of data from S3 and extracts it into the specified directory, so I wrap the command in a shell script and run it with the nohup utility as a background job, protecting against an SSH connection failure.
#!/bin/bash
echo "Starting wal-e backup-fetch"
envdir /etc/wal-e/env wal-e backup-fetch --blind-restore /var/lib/pgsql/data/ LATEST
echo "wal-e backup-fetch complete"
I execute the shell script using nohup as a background job like so.
nohup ./wal-e-fetch.sh &
I can then watch for the job to finish by running cat or tail on the nohup.out file, waiting for the "wal-e backup-fetch complete" message that tells me the backup has been successfully fetched and extracted.
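Rather than re-running cat by hand, the wait can be scripted with a small polling loop. This helper is a hypothetical convenience, not part of WAL-E; it simply blocks until the given message appears in the log file.

```shell
# Hypothetical helper: block until a completion message appears in a log
# file (e.g. nohup.out), polling once per second.
wait_for_message() {
  file="$1"
  pattern="$2"
  until grep -q "$pattern" "$file" 2>/dev/null; do
    sleep 1
  done
  echo "found: $pattern"
}

# e.g. wait_for_message nohup.out "wal-e backup-fetch complete"
```

The same pattern works for the backup-push on the master by waiting for "wal-e backup-push complete" instead.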
As you can see the backup is now fully extracted into the specified data directory.
ls -l /var/lib/pgsql/data
-rw------- 1 postgres postgres  253 Oct 20 05:28 backup_label
drwx------ 6 postgres postgres   54 Oct 21 17:14 base
-rw------- 1 postgres postgres   30 Oct 20 05:26 current_logfiles
drwx------ 2 postgres postgres 4096 Oct 21 17:14 global
drwx------ 2 postgres postgres   32 Oct 21 17:14 log
drwx------ 2 postgres postgres    6 Oct 20 02:23 pg_commit_ts
drwx------ 2 postgres postgres    6 Oct 20 02:23 pg_dynshmem
drwx------ 4 postgres postgres   68 Oct 21 17:14 pg_logical
drwx------ 4 postgres postgres   36 Oct 21 17:14 pg_multixact
drwx------ 2 postgres postgres   18 Oct 21 17:14 pg_notify
drwx------ 2 postgres postgres    6 Oct 20 02:23 pg_replslot
drwx------ 2 postgres postgres    6 Oct 20 02:23 pg_serial
drwx------ 2 postgres postgres    6 Oct 20 02:23 pg_snapshots
drwx------ 2 postgres postgres    6 Oct 20 05:26 pg_stat
drwx------ 2 postgres postgres    6 Oct 20 05:27 pg_stat_tmp
drwx------ 2 postgres postgres   18 Oct 21 17:14 pg_subtrans
drwx------ 2 postgres postgres    6 Oct 20 02:23 pg_tblspc
drwx------ 2 postgres postgres    6 Oct 20 02:23 pg_twophase
-rw------- 1 postgres postgres    3 Oct 20 02:23 PG_VERSION
drwx------ 2 postgres postgres    6 Oct 20 05:28 pg_wal
drwx------ 2 postgres postgres   18 Oct 21 17:14 pg_xact
-rw------- 1 postgres postgres   88 Oct 20 02:23 postgresql.auto.conf
I now copy the previously collected default config files into the newly restored data directory.
cp /var/lib/pgsql/postgresql.conf data/
cp /var/lib/pgsql/pg_hba.conf data/
cp /var/lib/pgsql/pg_ident.conf data/
cd /var/lib/pgsql/data
In postgresql.conf I update the hot_standby config parameter to on like so.
hot_standby = on
Next I create a recovery.conf file in the data directory which specifies how Postgres should restore the database once started.
restore_command = 'envdir /etc/wal-e/env wal-e wal-fetch %f %p'
standby_mode = on
Start and enable Postgres via the systemd service (as root).
systemctl start postgresql
systemctl enable postgresql
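Once the service is up, it's worth confirming the standby is actually replaying WAL rather than running as a regular primary. A minimal sketch, assuming psql can connect locally as the postgres user, is to query pg_is_in_recovery(), which returns t on a standby (the -Atc flags give unaligned, tuples-only output for easy scripting).

```shell
# Sketch: report whether the local Postgres instance is in recovery
# (i.e. operating as a standby replaying WAL).
check_standby() {
  state=$(psql -U postgres -Atc "SELECT pg_is_in_recovery();")
  if [ "$state" = "t" ]; then
    echo "standby is in recovery"
  else
    echo "standby is NOT in recovery"
  fi
}
```

Running `check_standby` on the replica should report that it is in recovery; after a promotion it would report the opposite.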
I should now be able to compare queries on the master database to those on the hot standby and see the same results (or nearly the same, allowing for up to 60 seconds of lag given the archive_timeout setting).
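One way to spot-check this is to compare row counts of the cron-populated random_numbers table on both servers. The script below is a hypothetical sketch: MASTER_HOST and STANDBY_HOST are placeholders for your actual hostnames, and authentication (e.g. a ~/.pgpass file) is assumed to already be configured for pguser.

```shell
# Hypothetical spot check: compare random_numbers row counts between the
# master and the standby, reporting the row lag. Assumes MASTER_HOST and
# STANDBY_HOST are set and password auth is handled (e.g. via ~/.pgpass).
compare_counts() {
  master=$(psql -h "$MASTER_HOST" -U pguser -d pgdb -Atc \
    "SELECT count(*) FROM random_numbers;")
  standby=$(psql -h "$STANDBY_HOST" -U pguser -d pgdb -Atc \
    "SELECT count(*) FROM random_numbers;")
  echo "master=$master standby=$standby lag_rows=$((master - standby))"
}
```

Because WAL segments ship on the archive_timeout cadence, a small lag_rows value is expected and should shrink to zero between inserts.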
Additionally, if the master server were to crash, I could promote the hot standby to become the new master with the following command, enabling it to accept writes.
pg_ctl promote -D /var/lib/pgsql/data
Giving the following output.
waiting for server to promote....... done server promoted
In this article I detailed how to establish a log-shipped replica as a hot standby in a master / replica Postgres version 11 environment. I should say that I've tested and verified this same procedure works for versions 9.6, 10, and 11, and I presume it will work for version 12 as well.
As always, thanks for reading and please do not hesitate to critique or comment below.