Automating PostgreSQL hot standby DB resynch after failover

In a similar fashion to Oracle, the PostgreSQL database offers a DR solution that enables data replication to a standby server. A number of options exist from logical data replication, log shipping and real-time data streaming to a physical hot standby. The later enables read only access to a mirror copy of the master database, known as query offload, which has the added advantage of providing failover to a standby database should the master fail.

One area that is often overlooked is; when the standby becomes the master, it cannot automatically return to being a standby database. In fact, if the former master is still online, we have a split-brain situation where an application can read-write to two master databases. It is therefore essential that the former master is shutdown when performing failover testing.

The Ansible playbook discussed in this post overcomes this problem, by resynchronizing the former master database with the new master, using the pg_rewind utility, which synchronizes a PostgreSQL data directory with another data directory and thus allows the hot standby database to be recreated.

Promoting a PostgreSQL standby database to be a master is easy. Simply touch the “trigger” file specified in the recovery.conf file on the standby server to automate the failover. If the standby is synchronized with the master, it will immediately become the new master, the trigger file will be deleted and the recovery.conf file will be renamed to recovery.done. Now we can make the former master a hot standby…

The steps

The automated steps executed by the postgres-cluster-switchover-to-standby.yml Yaml script are as follows:

  1. Stop the postgres service on the current primary (master) server
  2. Promote the standby database to become a primary by touching the trigger_file on current standby server
  3. Configure the former standby database parameter wal_log_hints in postgresql.conf
  4. Enable postgres user host based access in pg_hba.conf on former standby (new primary) database
  5. Restart the postgres service on the former standby (new primary) server
  6. Execute the pg_rewind utility on the former primary (new standby) server to synchronize with the new primary database
  7. Configure the former primary (new standby) database parameters archive_mode, max_wal_senders and wal_log_hints parameters in postgresql.conf
  8. Configure the former standby (new primary) database parameters archive_command in postgresql.conf
  9. Configure the former standby (new primary) database. Disable postgres user host based access in pg_hba.conf
  10. Configure the former standby (new primary) database. Enable replication user host based access in pg_hba.conf
  11. Restart the postgres service on former standby (new primary) server
  12. Wait for the new standby database to synchronize
  13. Rename recovery.done to recovery.conf on the former primary (new standby) database
  14. Configure the former primary (new standby) database primary_conninfo parameter in recovery.conf
  15. Configure the former primary (new standby) database max_wal_senders and wal_log_hints parameters in postgresql.conf
  16. Restart the postgres service on former primary (new standby) server
  17. Perform database recovery check on new standby database

The playbook

---
- hosts: primary

# YAML script: postgres-cluster-switchover-to-standby.yml
 # Usage : ansible-playbook postgres-cluster-switchover-to-standby.yml --extra-vars "db_name=postgres db_repuser=repuser db_rep_password=repuser123 db_port=5321 db_data_dir=/var/lib/pgsql/9.5/data primary_ip=192.168.0.191 standby_ip=192.168.0.101"

pre_tasks:
 - assert: { that: "{{item}} is defined" }
 with_items:
 - db_name
 - db_repuser
 - db_rep_password
 - db_port
 - primary_ip
 - db_data_dir

 tasks:
 
 - name: Stopping the postgres service on the current master
 service: name=postgresql-9.5 state=stopped sleep=30

- hosts: standby
 tasks:

 # Initialising the global variable
 - shell: chdir="{{ db_data_dir }}" grep trigger_file recovery.conf | awk '{print $3}' | sed s/\'//g 
 register: triggerfile
 ignore_errors: True
 - debug:
 var: triggerfile.stdout
 
 - name: Initiating switchover to standby. Promoting to master
 file: path={{ triggerfile.stdout }} state=touch
 become: true
 become_user: postgres
 
- hosts: primary
 tasks:

 - name: Enabling wal_log_hints in postgresql.conf on former master
 replace: dest={{ db_data_dir }}/postgresql.conf regexp="#?wal_log_hints\s+=\s+[o].*" replace="wal_log_hints = on"
 become: true
 become_user: postgres

 - name: Starting the postgres service on former master
 service: name=postgresql-9.5 state=started

 - name: Stopping the postgres service on former master
 service: name=postgresql-9.5 state=stopped

- hosts: standby
 tasks:

 - name: Configuring former master DB. Enabling postgres user access in pg_hba.conf
 shell: > 
 if [ $( cat $db_data_dir/pg_hba.conf | grep $standby_ip | wc -l ) -eq 0 ]; then
 echo "host postgres postgres {{ primary_ip }}/32 trust" >> {{ db_data_dir }}/pg_hba.conf;
 fi
 become: true
 become_user: postgres 

 - name: Restarting the postgres service on former standby
 service: name=postgresql-9.5 state=restarted

- hosts: primary
 tasks:
 
 - name: Executing the pg_rewind utility on former master ..
 shell: /usr/pgsql-9.5/bin/pg_rewind --target-pgdata=/var/lib/pgsql/9.5/data --source-server="host={{ standby_ip }} port={{ db_port }} user=postgres dbname={{ db_name }}"
 become: true
 become_user: postgres

- hosts: standby
 tasks:

 - name: Configuring former master DB. Configuring archive_mode, max_wal_senders and wal_log_hints parameters in postgresql.conf
 replace: dest={{db_data_dir}}/postgresql.conf regexp={{ item.src }} replace={{ item.tgt }}
 with_items:
 - { src: '^#?archive_mode\s+=\s+[o][n]?[ff]?', tgt: 'archive_mode = on' }
 - { src: '^max_wal_senders\s+=\s+.*', tgt: 'max_wal_senders = 3' }
 - { src: '^wal_log_hints\s+=\s+[o].', tgt: '#wal_log_hints = on' }
 become: true
 become_user: postgres

 - name: Configuring former standby DB. Adding archive_command config to postgresql.conf
 replace: dest={{db_data_dir}}/postgresql.conf regexp="^#?archive_command\s+=\s+\'.*\'" replace="archive_command = 'cp -i %p {{ db_data_dir }}/archive/%f'"
 become: true
 become_user: postgres

 - name: Configuring former standby DB. Disabling postgres user access in pg_hba.conf
 replace: dest={{db_data_dir}}/pg_hba.conf regexp="^host\s+postgres\s+postgres\s+.*/32\s+trust" replace="#host postgres postgres {{ primary_ip }}/32 trust"
 become: true
 become_user: postgres 

 - name: Configuring former standby DB. Enabling replication user access in pg_hba.conf
 replace: dest={{db_data_dir}}/pg_hba.conf regexp="^host\s+replication\s+.*\s+.*/32\s+md5" replace="host replication {{ db_repuser }} {{ primary_ip }}/32 md5"
 become: true
 become_user: postgres 

 - name: Restarting the postgres service on former standby
 service: name=postgresql-9.5 state=restarted

- hosts: primary

 post_tasks:
 - name: Perform database recovery check
 shell: tail -2 $(ls -1rt | tail -1) | grep "database system is ready to accept read only connections" chdir={{db_data_dir}}/pg_log
 register: db_is_in_recovery
 ignore_errors: true
 become: true
 become_user: postgres
 
 - fail: msg="Standby DB creation failed - db_is_in_recovery.stderr"
 when: db_is_in_recovery.stderr != ""

 handlers:
 - name: restart-postgres
 service: name=postgresql-9.5 state=restarted

 tasks:
 - name: Waiting for standby DB to synchronise ..
 wait_for: path={{ db_data_dir }}/recovery.done timeout=180

 - name: Configuring former master DB. Renaming recovery.done to recovery.conf
 shell: mv {{ db_data_dir }}/recovery.done {{ db_data_dir }}/recovery.conf
 become: true
 become_user: postgres

 - name: Configuring former master DB. Enabling replication user access in recovery.conf
 replace: dest={{db_data_dir}}/recovery.conf regexp="^primary_conninfo\s+=\s+'host=.*\s+port=.*\s+user=.*\s+password=.*" replace="primary_conninfo = 'host={{ standby_ip }} port={{ db_port }} user={{ db_repuser }} password={{ db_rep_password }}'"
 become: true
 become_user: postgres 

 - name: Configuring former master DB. Configuring max_wal_senders and wal_log_hints parameters in postgresql.conf
 replace: dest={{db_data_dir}}/postgresql.conf regexp={{ item.src }} replace={{ item.tgt }}
 with_items: 
 - { src: '^max_wal_senders\s+=\s+.*', tgt: 'max_wal_senders = 3' }
 - { src: '^wal_log_hints\s+=\s+[o].', tgt: '#wal_log_hints = on' }
 become: true
 become_user: postgres

 - name: finished
 shell: date
 notify: 
 - restart-postgres

[contact-form][contact-field label=’Name’ type=’name’ required=’1’/][contact-field label=’Email’ type=’email’ required=’1’/][contact-field label=’Website’ type=’url’/][contact-field label=’Comment’ type=’textarea’ required=’1’/][/contact-form]

Leave a Reply

Your email address will not be published. Required fields are marked *