PostgreSQL High-Availability using Debian

N.B. This is a Works In Progress (WIP) and the instructions are not complete! Bear in mind, I am running by the seat of my pants on this one and it's only something we have recently begun to use. I more than likely have overlooked something and will update this page as and when I am bitten by my own stupidity.

At my workplace I needed to configure a warm-standby for an SQL backend to our FreeRADIUS installation. Digging around online I could not find any definitive sets of articles on how to do this sort of thing so I decided to document up our deployment based on Debian 'lenny'.

At first I wanted to try and get Warm Standby (Point-In-Time-Recovery) working but this proved to be rather tricky so I went for the much simpler drbd (aka RAID1-over-TCP)

This document will cover how to deal with priming:

Important Notes

In an ideal world you would have two seperate VLAN's for each of your PostgreSQL boxes to sit in and a serial cable between the two to act as an out-of-bound (not over the network) heartbeat. The PostgreSQL servers would then provide an active/standby service on a unique IP address.

There are two problems with this:

For our needs, the boxes could not be in the same location 1 but fortunately we have a rather overly resilent network that can give high-availability for our VLAN's so we have compromised.

Our setup is thus a single VLAN with the network routing statically configured to route the PostgreSQL service '/32' IP into that VLAN. We use HSRP for a HA Layer-3 default gateway for the VLAN, you might prefer something else. The example I describe below makes this unique IP lurk in the same VLAN as the PostgreSQL boxes lurk in, for us that is not the case but it does not really matter.

Naming and Addressing

Throughout this document I will be using the following conventions.

The network we will be working with is '10.0.0.0/24' with the default gateway lurking at '10.0.0.1'. The HA PostgreSQL service address is '10.0.0.4' ('sql.example.com') and it is provisioned by the two real PostgreSQL servers:

Both boxes use link aggregation to provide a resilient uplink to the switch they are connected to, this is why throughout the document you will see 'bond0' referred to where you would normally see 'eth0'.

Instructions

Prerequisites

So to start off by installing the packages we need:

# aptitude install postgresql libpam_ldap heartbeat drbd8-utils drbd8-modules

Configuring LDAP Authentication

Make sure you can use your system wide LDAP library ('ldapsearch' for example) to make TLS/SSL requests with certificate verification working. If it does not, you will see when running tcpdump/tshark SSL "Encrypted Alert"'s terminating the LDAP communications appearing in your packet captures which means you have not properly set this up. Also if you do not do this then you will find your passwords shooting across around your network in plaintext.

# grep -v "^#" /etc/pam_ldap.conf
host ldap1.example.com ldap2.example.com
base dc=example,dc=com
ldap_version 3
scope sub
bind_policy soft
pam_filter objectClass=inetOrgPerson
pam_login_attribute cn
ssl start_tls

# cat /etc/pam.d/postgresql
auth    required        pam_ldap.so
account required        pam_ldap.so

# cat /etc/postgresql/8.3/main/pg_hba.conf
[snipped]
host    all         all         127.0.0.1/32          pam postgresql
host    all         all         ::1/128               pam postgresql
host    all         all         <your-cidr>           pam postgresql

# su - postgres
$ createuser <your-username>
Shall the new role be a superuser? (y/n) y

You should now be able to login using your usual LDAP backend'ed credentials.

Disable Auto Starting of PGSQL

Stop and configure PostgreSQL to only run with 'manual' intervention (ie. not via the 'init.d' scripts):

# /etc/init.d/postgresql-8.3 stop
# grep -v "^#" /etc/postgresql/8.3/main/start.conf
manual

Configure Heartbeat

On both boxes identically configure the following files as shown:

# grep -v "^#" /etc/heartbeat/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
deadtime 45
mcast bond0 225.0.0.1 694 1 0
auto_failback off
node sql1
node sql2
ping 10.0.0.1

N.B. we use a 'deadtime' of 45 seconds as the settle down time of spanning-tree is about 30 seconds so we want to avoid the nasty case of where there is a spanning-tree juggle at which point the failover gets kicked about.

N.B. here I suggest 'auto_failback' is turned off so to reduce the downtime of your PostgreSQL instance as it moves back to the 'primary'. If you have more than one HA service you want to split between the two boxes (we tether syslog-ng to the other box and have the two HA services running over the same heartbeat) then you might want to turn this on to act as a very simple load-balancer. This means the load produced by postgresql will not affect any service tethered to the other box.

You now need to download my OCF script PgSQL and place it in '/etc/heartbeat/resource.d/', remembering to make it executable.

# grep -v "^#" /etc/heartbeat/haresources
sql1  drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/postgresql::ext3::relatime,nodev,nosuid,noexec PgSQL::5432 IPaddr::10.0.0.4/24/bond0

N.B. you might be curious why only 'sql1' is referred to in 'haresources'. This is because 'sql1' is the typical primary and so if the heartbeat daemons are running on both 'sql1' and 'sql2' then 'sql1' is the box that actually has the IP '10.0.0.4' and PostgreSQL running. If both are referred to, then bad things happen!

# grep -v "^#" /etc/heartbeat/authkeys
auth 1
1 sha1 <big-old-hunky-random-secret>

DRBD

Before I go on to explain how to set this up, you need to be aware that the 'postgres' UID and GID must be identical on both your SQL servers. If it is not then PostgreSQL on the secondary/passive box will not startup as it will not have permission to write to the filesystem or read it's database files. If they currently mis-match I recommend you change both to new numbers by editing your '/etc/passwd' and '/etc/group' files directly and then run:

# find / -uid <old-uid-number> -exec chown <new-uid-number> '{}' \;
# find / -gid <old-gid-number> -exec chgrp <new-gid-number> '{}' \;

You need to tweak your '/etc/drbd.conf' file appropriately (to be identical on both boxes):

# grep -v "^#" /etc/drbd.conf
global {
  # out SQL servers do not have direct Internet access
  usage-count no;
}

common {
  # disks cap out at 50ish for us
  syncer { rate 40M; }
}

resource r0 {
  protocol C;
  
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }

  startup {
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error   detach;
    fencing resource-only;
  }

  net {
    cram-hmac-alg "sha1";
    shared-secret "<big honking secret here>";
  
    # these are worth changing, I have not decided what to though
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  on sql1 {
    device             /dev/drbd0;
    disk               /dev/lvm-sql1/drbd-pgsql;
    address            10.0.0.2:7788;
    flexible-meta-disk internal;
  }
  on sql2 {
    device             /dev/drbd0;
    disk               /dev/lvm-sql2/drbd-pgsql;
    address            10.0.0.3:7788;
    flexible-meta-disk internal;
  }
} 

So DRDB can use the existing heartbeat infrastructure between your two SQL servers. Type on both:

# chgrp haclient /sbin/drbdsetup
# chmod o-x /sbin/drbdsetup
# chmod u+s /sbin/drbdsetup
# chgrp haclient /sbin/drbdmeta
# chmod o-x /sbin/drbdmeta
# chmod u+s /sbin/drbdmeta

# chgrp haclient /etc/drbd.conf
# chmod g+r /etc/drbd.conf

Now we need to prime the DRDB bits and move the postgresql data spool over, on both type:

# /etc/init.d/heartbeat stop
# /etc/init.d/postgresql-8.3 stop
# drbdadm create-md r0
# /etc/init.d/drbd start

# pg_dropcluster 8.3 main
# echo "
> # drbd so is *not* automounted
> /dev/drbd0      /var/lib/postgresql     ext3    noauto,relatime,nodev,nosuid,noexec      0  0" >> /etc/fstab

You need to tweak postgresql to listen on all interfaces on your boxes as the IP address is added to the system after postgresql is started, and more importantly removed before it's shutdown. So on both systems:

# grep listen_address /etc/postgresql/8.3/main/postgresql.conf
listen_addresses = '10.0.0.4'

You should now have a '/dev/drbd0' device that you can treat like a regular block device. So, create a filesystem on it by typing on 'sql1':

# drbdadm -- --overwrite-data-of-peer primary r0
# mkfs.ext3 -L pgsql /dev/drbd0
# mount /var/lib/postgresql
# pg_createcluster 8.3 main
# ln -s /etc/postgresql/8.3/main/postgresql.conf /var/lib/postgresql/8.3/main/postgresql.conf
# umount /var/lib/postgresql

Now hopefully, if everything is correctly set up (and I have not forgotten to document anything) you should be able to fire up the heartbeat daemon on both 'sql1' and 'sql2' then after about a minute you should have a pgsql instance running on '10.0.0.4'.

If things do not work, the only difference between the two boxes really should be just in the '/etc/heartbeat/haresources' file. You might want to check all configuration files are identical on both boxes before grumbling at me.

  1. although in central London it seems my organisation is unable to provision a reliable 220V feed :-/ (1)

www: pgsql-ha (last edited 2009-08-07 12:09:39 by alex)