Ansible: yum with_items gotcha

There are a few issues raised on https://github.com/ansible/ansible/issues concerning a bug in which Ansible 2.1 reports a task as OK when it has in fact failed, although Ansible still stops and the summary output records a failed task. This is reproducible when using the yum module with a with_items loop that lists RPMs for installation, as shown in the example below.

vars:
  packages: ['binutils','compat-libstdc++-33','elfutils-libelf','elfutils-libelf-devel','expat','gcc','gcc-c++','glibc','glibc-common','glibc-devel','glibc-headers','libaio','libaio-devel','libgcc','libstdc++','libstdc++-devel','make','sysstat','unixODBC','unixODBC-devel','kmod-oracleasm','openssh-clients','compat-libcap1','compat-libstdc++-33']

tasks:
- name: Installing oracle required RPMs..
  yum:
    name: "{{ item }}"
    state: present
  with_items:
    - "{{ packages }}"

I stumbled across this gotcha when installing Oracle 12c Grid Infrastructure and couldn’t understand why my playbook stopped when the task execution seemingly succeeded.

I discovered the root cause by substituting a shell command to call yum.

- name: Installing oracle required RPMs..
  shell: yum install "{{ item }}"
  with_items:
    - "{{ packages }}"

Despite the warning “[WARNING]: Consider using yum module rather than running yum”, I found that a few packages in the long list of Oracle required RPMs did not exist in the local yum repository!

The workaround

Removing the “bad” RPMs from the list enabled the task to succeed when using the yum module with with_items.
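
A quick way to spot such missing packages up front is to loop over the list with a shell task that never fails the play and review the results afterwards. The following is only an untested sketch; the pkg_check variable and the NOT FOUND marker are invented here for illustration:

- name: Check that each required RPM is resolvable by yum
  shell: yum -q info {{ item }} || echo "NOT FOUND {{ item }}"
  with_items:
    - "{{ packages }}"
  register: pkg_check
  changed_when: false

- debug: var=pkg_check.results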

The solution

Details of the permanent fix can be found here:
https://github.com/ansible/ansible/commit/20726b94be64b17829f5afb712b4b5f542515229


Ansible: Populating variables with results from a database query

In a recent post I explained the use of command substitution in Ansible, focusing on the stdout and stderr attributes when registering variables and using their contents for server configuration. In this post I share an example of populating Ansible variables from a database query that provides status information that could be used to change a setting or remediate a specific condition.

Querying the database

Here we query a PostgreSQL master database that has an associated hot standby to determine its replication synchronisation status. The SQL is as follows:

select sync_state from pg_stat_replication;

The playbook

If the status is “async”, the playbook's debug task will fail and echo the current status in the debug message “The standby database is in async replication mode”.

---
- hosts: primary1

  pre_tasks:
  - assert: { that: "{{ item }} is defined" }
    with_items:
    - db_name
    - db_port

  tasks:

  - name: Perform database replication sync status check
    shell: >
      psql -p {{ db_port }} -d {{ db_name }} -U postgres -c "select sync_state from pg_stat_replication;"
    become: true
    become_user: postgres
    register: rep_sync_state

  # - debug: var=rep_sync_state

  - debug: msg="The standby database is in {{ item[2] }} replication mode"
    with_nested: "{{ rep_sync_state.stdout_lines }}"
    failed_when: "'async' in rep_sync_state.stdout"

Note the use of the “with_nested” loop, which builds an array from the rows returned by the query. In this case we require the third element ( [2] ) to populate our dynamic variable {{ item }}.
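
For context, when the standby is replicating asynchronously the raw psql output captured in stdout_lines typically looks like the following (column header, separator, data row, row count), which is why the data row sits at index [2]:

 sync_state
------------
 async
(1 row)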

The runtime output

The playbook runtime output is displayed below with the query results highlighted in red.

[Image: hot-standby-sync_state2 – playbook runtime output]

The debug output

The debug output of the rep_sync_state variable (- debug: var=rep_sync_state ) from the above example displays a wealth of information with all available attributes, as shown below.

[Image: hot-standby-sync_state – debug output of the rep_sync_state variable]

One step further

Taking the concept one step further, we can query more than one table, column or attribute to initialise a two-dimensional array, and then use the data to populate variables and display additional status information, as shown in the example below.

[Image: hot-standby-sync_state3 – extended replication status output]

The playbook

In order to read each element of the array, we must use the following syntax, which includes the split function where we specify the delimiter (in this case a pipe, | ) plus the required element ( [0] ).

{{ item.split('|')[0] }}

---
- hosts: primary1

  pre_tasks:
  - assert: { that: "{{ item }} is defined" }
    with_items:
    - db_name
    - db_port

  tasks:

  - name: Perform database replication sync status check
    shell: >
      psql -p {{ db_port }} -d {{ db_name }} -U postgres -c "select sync_state, state from pg_stat_replication;"
    become: true
    become_user: postgres
    register: rep_sync_state

  - debug: var=rep_sync_state.stdout_lines

  - debug: msg="The standby database is in {{ item.split('|')[0] }} mode with {{ item.split('|')[1] }} replication"
    with_items: "{{ rep_sync_state.stdout_lines[2] }}"
    failed_when: "'async' in rep_sync_state.stdout"

PostgreSQL

Fortunately the PostgreSQL psql utility has an “expanded output” ( \x ) feature that formats each row of a multi-column query as one field per line. This provides excellent readability in the stdout debug messages and removes the need to select individual elements from the array as previously described. The playbook and runtime output are shown below.

The playbook

---
- hosts: primary1

  pre_tasks:
  - assert: { that: "{{ item }} is defined" }
    with_items:
    - db_name
    - db_port

  tasks:

  - name: Perform database replication sync status check
    shell: "{{ item }}"
    with_items:
    - echo "\x" > /tmp/tmp.sql
    - echo "select * from pg_stat_replication;" >> /tmp/tmp.sql
    become: true
    become_user: postgres

  - shell: cat /tmp/tmp.sql | psql -p {{ db_port }} -d {{ db_name }} -U postgres
    become: true
    become_user: postgres
    register: rep_sync_state

  - file: dest=/tmp/tmp.sql state=absent

  - debug: var=rep_sync_state.stdout_lines

The runtime output

[Image: hot-standby-sync_state4 – runtime output using psql expanded mode]
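
A lighter-weight variation, shown here purely as an untested sketch, is to feed the \x meta-command and the query to psql on stdin from a single shell task, which avoids the temporary file altogether:

- name: Perform database replication sync status check
  shell: |
    psql -p {{ db_port }} -d {{ db_name }} -U postgres <<'EOF'
    \x
    select * from pg_stat_replication;
    EOF
  become: true
  become_user: postgres
  register: rep_sync_state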

If anyone has a more elegant way of populating variables or sending multiple commands to a PostgreSQL database without writing a new Ansible module, then please leave a comment.


Automating PostgreSQL hot standby DB resynch after failover

In a similar fashion to Oracle, the PostgreSQL database offers a DR solution that enables data replication to a standby server. A number of options exist, from logical data replication and log shipping to real-time data streaming to a physical hot standby. The latter enables read-only access to a mirror copy of the master database, known as query offload, and has the added advantage of providing failover to a standby database should the master fail.

One area that is often overlooked is that when the standby becomes the master, it cannot automatically return to being a standby database. In fact, if the former master is still online, we have a split-brain situation in which an application can read and write to two master databases. It is therefore essential that the former master is shut down when performing failover testing.

The Ansible playbook discussed in this post overcomes this problem by resynchronizing the former master database with the new master using the pg_rewind utility, which synchronizes a PostgreSQL data directory with another data directory and thus allows the hot standby database to be recreated.

Promoting a PostgreSQL standby database to be a master is easy: simply touch the “trigger” file specified in the recovery.conf file on the standby server to initiate the failover. If the standby is synchronized with the master, it will immediately become the new master, the trigger file will be deleted and the recovery.conf file will be renamed to recovery.done. Now we can make the former master a hot standby…
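
For reference, the relevant recovery.conf entries on the standby look something like the following; the values simply echo the example variables used later in this post, and the trigger file path is purely illustrative:

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.191 port=5321 user=repuser password=repuser123'
trigger_file = '/tmp/postgresql.trigger.5321'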

The steps

The automated steps executed by the postgres-cluster-switchover-to-standby.yml playbook are as follows:

  1. Stop the postgres service on the current primary (master) server
  2. Promote the standby database to become a primary by touching the trigger_file on the current standby server
  3. Configure the wal_log_hints parameter in postgresql.conf on the former standby database
  4. Enable postgres user host-based access in pg_hba.conf on the former standby (new primary) database
  5. Restart the postgres service on the former standby (new primary) server
  6. Execute the pg_rewind utility on the former primary (new standby) server to synchronize with the new primary database
  7. Configure the archive_mode, max_wal_senders and wal_log_hints parameters in postgresql.conf on the former primary (new standby) database
  8. Configure the archive_command parameter in postgresql.conf on the former standby (new primary) database
  9. Disable postgres user host-based access in pg_hba.conf on the former standby (new primary) database
  10. Enable replication user host-based access in pg_hba.conf on the former standby (new primary) database
  11. Restart the postgres service on the former standby (new primary) server
  12. Wait for the new standby database to synchronize
  13. Rename recovery.done to recovery.conf on the former primary (new standby) database
  14. Configure the primary_conninfo parameter in recovery.conf on the former primary (new standby) database
  15. Configure the max_wal_senders and wal_log_hints parameters in postgresql.conf on the former primary (new standby) database
  16. Restart the postgres service on the former primary (new standby) server
  17. Perform a database recovery check on the new standby database

The playbook

---
# YAML script: postgres-cluster-switchover-to-standby.yml
# Usage : ansible-playbook postgres-cluster-switchover-to-standby.yml --extra-vars "db_name=postgres db_repuser=repuser db_rep_password=repuser123 db_port=5321 db_data_dir=/var/lib/pgsql/9.5/data primary_ip=192.168.0.191 standby_ip=192.168.0.101"

- hosts: primary

  pre_tasks:
  - assert: { that: "{{ item }} is defined" }
    with_items:
    - db_name
    - db_repuser
    - db_rep_password
    - db_port
    - primary_ip
    - db_data_dir

  tasks:

  - name: Stopping the postgres service on the current master
    service: name=postgresql-9.5 state=stopped sleep=30

- hosts: standby
  tasks:

  # Initialising the global variable
  - shell: chdir="{{ db_data_dir }}" grep trigger_file recovery.conf | awk '{print $3}' | sed s/\'//g
    register: triggerfile
    ignore_errors: True

  - debug:
      var: triggerfile.stdout

  - name: Initiating switchover to standby. Promoting to master
    file: path={{ triggerfile.stdout }} state=touch
    become: true
    become_user: postgres

- hosts: primary
  tasks:

  - name: Enabling wal_log_hints in postgresql.conf on former master
    replace: dest={{ db_data_dir }}/postgresql.conf regexp="#?wal_log_hints\s+=\s+[o].*" replace="wal_log_hints = on"
    become: true
    become_user: postgres

  - name: Starting the postgres service on former master
    service: name=postgresql-9.5 state=started

  - name: Stopping the postgres service on former master
    service: name=postgresql-9.5 state=stopped

- hosts: standby
  tasks:

  - name: Configuring former master DB. Enabling postgres user access in pg_hba.conf
    shell: >
      if [ $( cat {{ db_data_dir }}/pg_hba.conf | grep {{ standby_ip }} | wc -l ) -eq 0 ]; then
      echo "host postgres postgres {{ primary_ip }}/32 trust" >> {{ db_data_dir }}/pg_hba.conf;
      fi
    become: true
    become_user: postgres

  - name: Restarting the postgres service on former standby
    service: name=postgresql-9.5 state=restarted

- hosts: primary
  tasks:

  - name: Executing the pg_rewind utility on former master ..
    shell: /usr/pgsql-9.5/bin/pg_rewind --target-pgdata=/var/lib/pgsql/9.5/data --source-server="host={{ standby_ip }} port={{ db_port }} user=postgres dbname={{ db_name }}"
    become: true
    become_user: postgres

- hosts: standby
  tasks:

  - name: Configuring former master DB. Configuring archive_mode, max_wal_senders and wal_log_hints parameters in postgresql.conf
    replace: dest={{ db_data_dir }}/postgresql.conf regexp={{ item.src }} replace={{ item.tgt }}
    with_items:
    - { src: '^#?archive_mode\s+=\s+[o][n]?[ff]?', tgt: 'archive_mode = on' }
    - { src: '^max_wal_senders\s+=\s+.*', tgt: 'max_wal_senders = 3' }
    - { src: '^wal_log_hints\s+=\s+[o].', tgt: '#wal_log_hints = on' }
    become: true
    become_user: postgres

  - name: Configuring former standby DB. Adding archive_command config to postgresql.conf
    replace: dest={{ db_data_dir }}/postgresql.conf regexp="^#?archive_command\s+=\s+\'.*\'" replace="archive_command = 'cp -i %p {{ db_data_dir }}/archive/%f'"
    become: true
    become_user: postgres

  - name: Configuring former standby DB. Disabling postgres user access in pg_hba.conf
    replace: dest={{ db_data_dir }}/pg_hba.conf regexp="^host\s+postgres\s+postgres\s+.*/32\s+trust" replace="#host postgres postgres {{ primary_ip }}/32 trust"
    become: true
    become_user: postgres

  - name: Configuring former standby DB. Enabling replication user access in pg_hba.conf
    replace: dest={{ db_data_dir }}/pg_hba.conf regexp="^host\s+replication\s+.*\s+.*/32\s+md5" replace="host replication {{ db_repuser }} {{ primary_ip }}/32 md5"
    become: true
    become_user: postgres

  - name: Restarting the postgres service on former standby
    service: name=postgresql-9.5 state=restarted

- hosts: primary

  post_tasks:
  - name: Perform database recovery check
    shell: tail -2 $(ls -1rt | tail -1) | grep "database system is ready to accept read only connections" chdir={{ db_data_dir }}/pg_log
    register: db_is_in_recovery
    ignore_errors: true
    become: true
    become_user: postgres

  - fail: msg="Standby DB creation failed - {{ db_is_in_recovery.stderr }}"
    when: db_is_in_recovery.stderr != ""

  handlers:
  - name: restart-postgres
    service: name=postgresql-9.5 state=restarted

  tasks:
  - name: Waiting for standby DB to synchronise ..
    wait_for: path={{ db_data_dir }}/recovery.done timeout=180

  - name: Configuring former master DB. Renaming recovery.done to recovery.conf
    shell: mv {{ db_data_dir }}/recovery.done {{ db_data_dir }}/recovery.conf
    become: true
    become_user: postgres

  - name: Configuring former master DB. Enabling replication user access in recovery.conf
    replace: dest={{ db_data_dir }}/recovery.conf regexp="^primary_conninfo\s+=\s+'host=.*\s+port=.*\s+user=.*\s+password=.*" replace="primary_conninfo = 'host={{ standby_ip }} port={{ db_port }} user={{ db_repuser }} password={{ db_rep_password }}'"
    become: true
    become_user: postgres

  - name: Configuring former master DB. Configuring max_wal_senders and wal_log_hints parameters in postgresql.conf
    replace: dest={{ db_data_dir }}/postgresql.conf regexp={{ item.src }} replace={{ item.tgt }}
    with_items:
    - { src: '^max_wal_senders\s+=\s+.*', tgt: 'max_wal_senders = 3' }
    - { src: '^wal_log_hints\s+=\s+[o].', tgt: '#wal_log_hints = on' }
    become: true
    become_user: postgres

  - name: finished
    shell: date
    notify:
    - restart-postgres


Command substitution gotcha in Ansible Playbook

Command substitution reassigns the output (stdout) of a command or even multiple commands to a variable. The mechanism is useful for dynamically populating variables based on the environment in which a program is executed.
In the case of cloud orchestration, we may wish to reconfigure memory parameters for an application or service. The example I have chosen is for a PostgreSQL database deployment, where I need to adjust the memory parameters to their optimum value based on the required percentage of total available memory.

For example, among others, I wish to change the work_mem parameter from its default value.

On the Ansible server, I execute a playbook containing the following code snippet, which assigns a value to the “workmem” variable using Ansible's “register” keyword and then uses the “replace” module to substitute the parameter value in the postgresql.conf file.

# Initialising the global variable
- shell: echo $(cat /proc/meminfo | grep MemTotal | awk '{print $2}') / 100 / 1024 | bc
  register: workmem
  ignore_errors: True

# Using variable in replace statement
- name: Configure memory parameters ( work_mem = {{ workmem }}MB )
  replace: dest={{ db_data_dir }}/postgresql.conf regexp="^#?work_mem\s+=\s+[1-9]*[kMGT]B" replace="work_mem = {{ workmem }}MB"
  become: true
  become_user: postgres

- name: Restarting the postgres service
  service: name=postgresql-9.5 state=restarted


This results in the following error when Ansible attempts to restart the PostgreSQL instance at the end of the playbook.

TASK [Restarting the postgres service] *****************************************
 fatal: [10.127.3.18]: FAILED! => {"changed": false, "failed": true, "msg": "Stopping postgresql-9.5 service: [ OK ]\r\nStarting postgresql-9.5 service: [FAILED]\r\n"}

NO MORE HOSTS LEFT *************************************************************
 to retry, use: --limit @/home/ansible/yaml/postgres-main2.retry

PLAY RECAP *********************************************************************
 10.127.3.18 : ok=23 changed=7 unreachable=0 failed=1

The gotcha

The register statement is used to store the output of a single task into a variable. However, for a shell task the registered variable holds the complete result dictionary, including stdout, stderr, the return code and the command itself, not just the string returned by the command. This is visible in the postgresql.conf file we are trying to modify on the target server.

postgres@db_host[~] $ cd /var/lib/pgsql/9.5/data
postgres@db_host[data] $ view postgresql.conf

work_mem = {u'changed': True, u'end': u'2016-10-12 03:15:56.780380', u'stdout': u'9', u'cmd': u"echo $(cat /proc/meminfo | grep MemTotal | awk '{print $2}') / 100 / 1024 |bc", u'start': u'2016-10-12 03:15:56.775608', u'delta': u'0:00:00.004772', u'stderr': u'', u'rc': 0, 'stdout_lines': [u'9'], u'warnings': []}MB

The solution

The registered variable stores a dictionary of values rather than a simple string. The solution to this problem is to force Ansible to substitute only the required element, in this case stdout. This is achieved using the following syntax:

- debug:
  var: <variable_name>.stdout

or

{{ <variable_name>.stdout }}

An example of Ansible command substitution from a shell task is shown below:

# Initialising the global variable
- shell: echo $(cat /proc/meminfo | grep MemTotal | awk '{print $2}') / 100 / 1024 | bc
  register: workmem
  ignore_errors: True

- debug:
    var: workmem.stdout

# Using variable in replace statement
- name: Configure memory parameters ( work_mem = {{ workmem.stdout }}MB )
  replace: dest={{ db_data_dir }}/postgresql.conf regexp="^#?work_mem\s+=\s+[1-9]*[kMGT]B" replace="work_mem = {{ workmem.stdout }}MB"
  become: true
  become_user: postgres
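
As an aside, the shell pipeline above simply takes MemTotal (in kB) from /proc/meminfo and divides it by 100 and then by 1024, i.e. roughly 1% of physical memory expressed in MB. On the test host used in the run below, the integer arithmetic works out as follows:

MemTotal = 1018628 kB
1018628 / 100 / 1024 = 9   (integer division by bc)   =>   work_mem = 9MB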

The correct playbook execution output for provisioning a single instance PostgreSQL database is shown below:

root@ansible_host[yaml] # ansible-playbook /home/ansible/yaml/postgres-main2.yml --extra-vars "target=postgres node_ip=10.127.3.18 db_name=dbdemo db_user=pgadmin db_password=password db_port=5432 db_data_dir=/var/lib/pgsql/9.5/data"

PLAY [postgres] ****************************************************************

TASK [setup] *******************************************************************
 ok: [10.127.3.18]

TASK [assert] ******************************************************************
 ok: [10.127.3.18] => (item=node_ip)
 ok: [10.127.3.18] => (item=db_name)
 ok: [10.127.3.18] => (item=db_user)
 ok: [10.127.3.18] => (item=db_password)
 ok: [10.127.3.18] => (item=db_port)
 ok: [10.127.3.18] => (item=db_data_dir)

TASK [command] *****************************************************************
 changed: [10.127.3.18]

TASK [debug] *******************************************************************
 ok: [10.127.3.18] => {
 "totalmem.stdout": "1018628"
 }

TASK [command] *****************************************************************
 changed: [10.127.3.18]

TASK [debug] *******************************************************************
 ok: [10.127.3.18] => {
 "sharedbuf.stdout": "248"
 }

TASK [command] *****************************************************************
 changed: [10.127.3.18]

TASK [debug] *******************************************************************
 ok: [10.127.3.18] => {
 "workmem.stdout": "9"
 }

TASK [command] *****************************************************************
 changed: [10.127.3.18]

TASK [debug] *******************************************************************
 ok: [10.127.3.18] => {
 "maintworkmem.stdout": "124"
 }

TASK [command] *****************************************************************
 changed: [10.127.3.18]

TASK [debug] *******************************************************************
 ok: [10.127.3.18] => {
 "effectcachesize.stdout": "746"
 }

TASK [Add the group 'postgres'] ************************************************
 ok: [10.127.3.18]

TASK [Add the user 'postgres' and a primary group of 'postgres'] ***************
 ok: [10.127.3.18]

TASK [Intialise the DB as postgres user] ***************************************
 changed: [10.127.3.18]

TASK [Start the DB server as postgres user and enable at boot] *****************
 ok: [10.127.3.18]

TASK [Set Port binding] ********************************************************
 changed: [10.127.3.18]

TASK [Set Interface binding] ***************************************************
 ok: [10.127.3.18]

TASK [Configure memory parameters ( shared_buffers = 248MB )] ******************
 ok: [10.127.3.18]

TASK [Configure memory parameters ( work_mem = 9MB )] **************************
 ok: [10.127.3.18]

TASK [Configure memory parameters ( maintenance_work_mem = 124MB )] ************
 ok: [10.127.3.18]

TASK [Configure memory parameters ( wal_buffers = 64MB )] **********************
 ok: [10.127.3.18]

TASK [Configure memory parameters ( effective_cache_size = 746MB )] ************
 ok: [10.127.3.18]

TASK [Restarting the postgres service] *****************************************
 changed: [10.127.3.18]

TASK [Create database named dbdemo] ********************************************
 ok: [10.127.3.18]

TASK [Setup database user] *****************************************************
 ok: [10.127.3.18]

TASK [Ensure user does not have unnecessary privileges] ************************
 ok: [10.127.3.18]

TASK [Configuring DB remote access in pg_hba.conf] *****************************
 changed: [10.127.3.18]

TASK [Restarting the postgres service] *****************************************
 changed: [10.127.3.18]

TASK [Perform database connection test] ****************************************
 changed: [10.127.3.18]

TASK [debug] *******************************************************************
 skipping: [10.127.3.18]

PLAY RECAP *********************************************************************
 10.127.3.18 : ok=30 changed=11 unreachable=0 failed=0

Another top tip when populating variables in Ansible is the ability to search in the variable data (stdout & stderr) and act on a keyword.

The following post_tasks example shows the db_is_in_recovery variable being populated with the output of a SQL query that checks whether a PostgreSQL hot standby database is in synchronization with its master. The check fails when “(0 rows)” is returned.

post_tasks:
- shell: sleep 10

- name: Perform database recovery check
  command: psql -p {{ db_port }} -d {{ db_name }} -U postgres -c "select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location" where pg_last_xlog_receive_location() = pg_last_xlog_replay_location();"
  become: true
  become_user: postgres
  register: db_is_in_recovery
  ignore_errors: True
  failed_when: "'(0 rows)' in db_is_in_recovery.stdout"

- debug:
    var: db_is_in_recovery.stdout

The runtime output:

TASK [Perform database recovery check] *****************************************
changed: [10.127.3.187]

TASK [debug] *******************************************************************
ok: [10.127.3.187] => {
 "db_is_in_recovery.stdout": " receive_location | replay_location \n------------------+-----------------\n 0/3000060 | 0/3000060\n(1 row)"
}


Using Event Actions to improve batch performance

All replication technologies typically suffer from one flaw, which is the way that the data is replicated. Consider a table that is populated with a million rows as part of a batch process. This may be a bulk insert operation that Oracle completes on the source database as one transaction. However, Oracle will write each change to its redo logs as Logical Change Records (LCRs). GoldenGate will subsequently mine the logs, write the LCRs to a remote trail, convert each one back to DML, and apply them to the target database, one row at a time. The single source transaction becomes one million transactions, which causes a huge performance overhead. To overcome this issue, we can use GoldenGate’s Event Marker Interface to employ Event Actions that:

• Detect the DML statement (INSERT INTO TABLE SELECT ..)
• Ignore the data resulting from the SELECT part of the statement
• Replicate just the DML statement as an Event Record
• Execute just the DML statement on the target database

The solution requires a statement table on both source and target databases to trigger the event. Also, both databases must be perfectly synchronized to avoid data integrity issues.
In the Extract configuration, the DML statement is passed in the replicated data stream by using the EVENTACTIONS parameter combined with IGNORE, TRANS, and INCLUDEEVENT. The source transaction is ignored, except for the DML statement that is written to the trail file:

TABLE PDB1.SRC.STATEMENT, &
  EVENTACTIONS (IGNORE TRANS INCLUDEEVENT);
TABLE PDB1.SRC.BATCH;

On the target, the Replicat configuration uses a TABLE statement to pass the Event Record to a SQLEXEC statement that executes the DML statement by calling a stored procedure, as shown in the following code:

SOURCEDEFS ./dirdef/statement.def
TABLE PDB1.SRC.STATEMENT, SQLEXEC(SPNAME execdml, ID sqlid, &
  PARAMS(P1 = sql_statement)), &
  EVENTACTIONS (IGNORE, LOG INFO);
DISCARDFILE ./dirrpt/rolap01.dsc, PURGE, MEGABYTES 100
ASSUMETARGETDEFS
MAP PDB1.SRC.* TARGET PDB2.TGT.*;

Note the reference to a SOURCEDEFS file. This is necessary to explicitly declare the STATEMENT table definition to the Replicat. The procedure call is very simple because its only task is to execute the DML statement that is being passed by the Event Marker Interface.
This is shown in the following code:

CREATE OR REPLACE PROCEDURE ggadmin.execdml (P1 IN VARCHAR2)
AS
BEGIN
  execute immediate(P1);
END;
/
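
The definition of the statement table is not shown in the original configuration, so the following is only an assumed minimal DDL: a single VARCHAR2 column to hold the statement text, created in both the SRC and TGT schemas.

CREATE TABLE STATEMENT (
  SQL_STATEMENT  VARCHAR2(4000)
);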

Now perform a test transaction by inserting the DML statement into the SQL_STATEMENT column of the SRC.STATEMENT table first. Then, execute the same DML statement on the source database, as shown in the following code:

SQL> conn SRC@PDB1
 Enter password:
 Connected.
 
SQL> INSERT INTO STATEMENT (SQL_STATEMENT) VALUES ('INSERT INTO TGT.BATCH SELECT * FROM ALL_OBJECTS');

1 row created.

SQL> INSERT INTO BATCH SELECT * FROM ALL_OBJECTS;

24372 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from batch;

COUNT(*)
 ----------
 24372

The Extract report shows the runtime statistics. The report file can be generated using the GGSCI SEND and VIEW REPORT commands as follows:

GGSCI (db12server01) 2> send extract EOLTP01, report

Sending REPORT request to EXTRACT EOLTP01 ...
 Request processed.

GGSCI (db12server01) 3> view report EOLTP01
 ..
 *****************************************************************
 * ** Run Time Statistics ** *
 *****************************************************************

Report at 2015-06-07 13:20:44 (activity since 2015-06-07 11:56:04)

Output to ./dirdat/lt:

From Table PDB1.SRC.STATEMENT:
 # inserts: 1
 # updates: 0
 # deletes: 0
 # discards: 0
 From Table PDB1.SRC.BATCH:
 # inserts: 0
 # updates: 0
 # deletes: 0
 # discards: 0
 # ignores: 24372

As expected, the 24372 records inserted into the BATCH table have been ignored by GoldenGate; only the DML statement is replicated to the STATEMENT table in the TGT schema on the target database. This is shown in the runtime information from the Replicat report as a successful Event Action and DML execution.

Note that the statistics show only one transaction and the call to the stored procedure:

 *****************************************************************
 * ** Run Time Statistics ** *
 *****************************************************************
 The last record for the last committed transaction is as follows:
 ___________________________________________________________________
 Trail name : ./dirdat/rt000083
 Hdr-Ind : E (x45) Partition : . (x04)
 UndoFlag : . (x00) BeforeAfter: A (x41)
 RecLength : 90 (x005a) IO Time : 2015-06-07 13:10:29.000465
 IOType : 5 (x05) OrigNode : 255 (xff)
 TransInd : . (x03) FormatType : R (x52)
 SyskeyLen : 0 (x00) Incomplete : . (x00)
 AuditRBA : 180 AuditPos : 6877200
 Continued : N (x00) RecCount : 1 (x01)

2015-06-07 13:10:29.000465 Insert Len 90 RBA 2009
 Name: PDB1.SRC.STATEMENT
 ___________________________________________________________________

Reading ./dirdat/rt000083, current RBA 2226, 1 records

Report at 2015-06-07 13:17:01 (activity since 2015-06-07 13:10:42)

From Table PDB1.SRC.STATEMENT:
 # inserts: 0
 # updates: 0
 # deletes: 0
 # discards: 0
 # ignores: 1

Stored procedure sqlid:
 attempts: 1
 successful: 1

From Table PDB1.SRC.STATEMENT to PDB2.TGT.STATEMENT:
 # inserts: 1
 # updates: 0
 # deletes: 0
 # discards: 0

The described method represents a massive performance boost when replicating bulk operations.

Previously, using Event Actions to improve batch performance was supported in classic delivery mode only. I have discovered that adding an explicit COMMIT statement to the stored procedure on the target database enables the mechanism to function correctly with the Integrated Replicat.

CREATE OR REPLACE PROCEDURE ggadmin.execdml (P1 IN VARCHAR2)
AS
BEGIN
  execute immediate(P1);
  commit; -- Explicit commit
END;
/


OGG and ODI – the perfect match

Oracle GoldenGate can be used for trickle-feeding data warehouses, ideally when coupled with Oracle Data Integrator (ODI). GoldenGate complements ODI by providing log-based, real-time data capture from source systems without any performance impact. Unlike traditional data warehouse Extract, Transform and Load (ETL) products, ODI enables an Extract, Load and Transform (E-LT) architecture that further improves performance and reduces cost by eliminating the middle-tier transformation layer. ODI uses JDBC connectivity to access different databases.

Oracle GoldenGate 12c is even more tightly integrated with Oracle Data Integrator 12c. For example, GoldenGate source and target systems are configured as data servers in ODI’s Topology. The Oracle GoldenGate Monitor Agent (formerly JAgent) includes a Weblogic application server that enables GoldenGate instances to be stopped and started from ODI, plus the ability to deploy GoldenGate parameter files to configuration targets, thus providing a complete end-to-end data integration solution.

ETL versus E-LT

The diagram below helps to illustrate the key difference between ETL and E-LT architectures. Although seen here as an extra stage in the data delivery, E-LT can offer supreme benefits over traditional ETL.

[Image: ETL vs E-LT architecture comparison]

 

Now consider the pros and cons..

E-LT

Oracle Data Integrator’s Extract, Load and Transform (E-LT) architecture leverages disparate Relational Database Management System (RDBMS) engines to process and transform the data. This approach optimizes performance and scalability and lowers overall solution costs.

Instead of relying on a separate, conventional ETL transformation server, ODI’s E-LT architecture generates native code for each RDBMS engine, for example, SQL and bulk loader scripts. E-LT architecture extracts data from sources, loads it into a target, and transforms it using the optimizations of the database (Set based SQL).

Pipelined functions bridge the gap between complex PL/SQL transformations and set-based SQL, but they also have some unique performance features of their own, making them a superb performance optimization tool.

For example (a short sketch follows this list):

  • replacing row-based inserts with pipelined function-based loads
  • utilizing array fetches with BULK COLLECT
  • enabling parallel pipelined function execution
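
The following PL/SQL is an illustrative sketch only (the stg_sales and dw_sales tables and their columns are invented, not part of any ODI-generated code); it shows all three techniques: a BULK COLLECT array fetch, a parallel-enabled pipelined function, and a set-based insert replacing row-by-row loads.

-- Illustrative sketch: stg_sales / dw_sales and their columns are hypothetical
CREATE OR REPLACE TYPE t_sales_row AS OBJECT (order_id NUMBER, amount NUMBER);
/
CREATE OR REPLACE TYPE t_sales_tab AS TABLE OF t_sales_row;
/
CREATE OR REPLACE FUNCTION transform_sales (p_src SYS_REFCURSOR)
  RETURN t_sales_tab
  PIPELINED
  PARALLEL_ENABLE (PARTITION p_src BY ANY)   -- allows parallel execution
AS
  TYPE t_src_tab IS TABLE OF stg_sales%ROWTYPE;
  l_rows t_src_tab;
BEGIN
  LOOP
    FETCH p_src BULK COLLECT INTO l_rows LIMIT 1000;   -- array fetch
    EXIT WHEN l_rows.COUNT = 0;
    FOR i IN 1 .. l_rows.COUNT LOOP
      -- complex transformation logic would go here
      PIPE ROW (t_sales_row(l_rows(i).order_id, l_rows(i).amount * 1.1));
    END LOOP;
  END LOOP;
  RETURN;
END;
/

-- Set-based load replacing a row-by-row insert loop
INSERT /*+ APPEND */ INTO dw_sales
SELECT * FROM TABLE(transform_sales(CURSOR(SELECT * FROM stg_sales)));

In an ODI mapping, logic like this would typically be wrapped or generated by a customised Knowledge Module rather than hand-coded for every interface.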

ETL

ETL processes are often complex and inflexible. For example, adding a source table, or even a column to an existing source table, can spawn a number of configuration and code changes, forcing downtime. These include:

  • DDL
  • ETL Metadata
  • Transformations

Other issues are often resource related, causing over-running batch jobs and contention with other online and batch processes. In addition, extra storage is required to stage the replicated data before it can be processed by the ETL tool. However, once processed, the staging tables can be truncated, which is also true for E-LT.

Knowledge Modules

The modular architecture of Oracle GoldenGate and Data Integrator enables hot-pluggable Knowledge Modules (KMs) that allow new data warehouse feeds to be added “on the fly”, avoiding any planned downtime.

KMs are code templates that implement the actual data flow. Each KM is dedicated to an individual task in the overall data integration process. In Oracle Warehouse Builder style, the code that is generated and executed is derived from the declarative rules and metadata defined in the Oracle Data Integrator Designer module, and is fully supported by Oracle.

Oracle Data Integrator Knowledge Modules exist in different flavors, and you need to choose the relevant KM for your source database. Out of the box, ODI ships with more than 100 KMs to support different vendors' system interfaces. The main categories are:

  • The Reverse Engineer module takes care of the source data metadata. Here ODI retrieves the source database schema(s) table structures, transforming the extracted metadata and populates the ODI Work Repository.
  • The Journalize module is where Oracle GoldenGate Change Data Capture (CDC) takes place; it sets up the journalizing infrastructure and enables CDC by reading the source database redo logs and pumping the data to the staging database server via GoldenGate trail files. This is handled by the GoldenGate Extract process(es). The trail files hold the committed transactions that are read by the GoldenGate Replicat process(es), converted to DML/DDL and executed against the staging database.
  • The Load module uses GoldenGate to deliver the data to the staging tables from the remote database server. It can perform data transformation either at row-level on the source (capture) or target (delivery), or set based using the RDBMS engine on the target.
  • The Check module verifies the data in relation to the target table constraints. Any invalid data is written to the error tables.
  • The Integrate module loads the final, transformed data into the target tables.
  • The Service module creates and deploys data manipulation Web Services to your Service Oriented Architecture (SOA) infrastructure. It generates Java code that is compiled and deployed to the application server for execution, thus publishing data content to Web based services.

The following diagram illustrates the KM architecture and the process flow.

[Image: ODI Knowledge Module architecture and process flow]

One key benefit of the Knowledge Modules is that they are dynamically reusable: you make one change and it is instantly propagated to hundreds of transformations, saving hours of complex manual configuration. How cool is that?


Using User Tokens to populate a heartbeat table

A vast array of user tokens exists in GoldenGate. Let's start by looking at a common method of replicating system information to populate a heartbeat table that can be used to monitor performance. We can use the TOKENS option of the Extract TABLE parameter to define a user token and associate it with the GoldenGate environment data.

The following Extract configuration code shows the token declarations for the heartbeat table:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS (
EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
EXTLAG = @GETENV ("LAG","SEC"), &
EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML") &
), FILTER (@STREQ (EXTGROUP, @GETENV ("GGENVIRONMENT","GROUPNAME")));

 

The EXTSTAT_TOTAL user token is particularly useful to determine the amount of DML load that is taking place against the source database with reference to the EXTLAG user token within the same replicated data stream.

For the data pump, the example Extract configuration is as follows:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS (
PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &
PMPLAG = @GETENV ("LAG","SEC"));

Also, for the Replicat, the following configuration populates the heartbeat table on the target database with the token data derived from Extract, data pump, and Replicat, containing system details and replication lag:

MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &
KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &
INSERTMISSINGUPDATES, &
COLMAP (USEDEFAULTS, &
ID = 0, &
SOURCE_COMMIT = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &
EXTGROUP = @TOKEN ("EXTGROUP"), &
EXTTIME = @TOKEN ("EXTTIME"), &
PMPGROUP = @TOKEN ("PMPGROUP"), &
PMPTIME = @TOKEN ("PMPTIME"), &
REPGROUP = @TOKEN ("REPGROUP"), &
REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
EXTLAG = @TOKEN ("EXTLAG"), &
PMPLAG = @TOKEN ("PMPLAG"), &
REPLAG = @GETENV ("LAG","SEC"), &
EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL"));

 

As in the heartbeat table example, the defined user tokens can be called in a MAP statement using the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explanatory.

As mentioned previously, EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful to measure the load on the source system when you evaluate latency peaks. If the user token data is copied to an audit table, normally via a trigger on the target database’s heartbeat table, the statistics can be graphed over time as illustrated in the example below.

[Image: OGG ReplicationKey – heartbeat statistics graphed over time]
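
Such a trigger might look something like the following sketch; the GG_HB_HISTORY audit table and its columns are assumed to mirror the heartbeat columns mapped above and are not part of the original configuration:

-- Illustrative only: GG_HB_HISTORY is an assumed audit table
CREATE OR REPLACE TRIGGER ggadmin.trg_gg_hb_audit
  AFTER INSERT OR UPDATE ON ggadmin.gg_hb_in_tgt
  FOR EACH ROW
BEGIN
  INSERT INTO ggadmin.gg_hb_history
    (db_name, extgroup, pmpgroup, repgroup, source_commit,
     extlag, pmplag, replag, extstat_total, snapshot_time)
  VALUES
    (:NEW.db_name, :NEW.extgroup, :NEW.pmpgroup, :NEW.repgroup, :NEW.source_commit,
     :NEW.extlag, :NEW.pmplag, :NEW.replag, :NEW.extstat_total, SYSTIMESTAMP);
END;
/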


Oracle GoldenGate Activity Logging

Since the publication of my latest book, Oracle GoldenGate 12c Implementer's Guide, I have discovered a great feature that has existed in OGG since version 11.1.1.0.

Activity Logging provides a verbose tracing facility that enables dynamic activity logging of any GoldenGate process. The feature is very useful for obtaining additional debug information and is enabled by placing an XML initialisation file in the OGG home directory, using the following file naming convention and content.

gglog-<process>.xml

<?xml version="1.0" ?>
<configuration reset="true">
  <appender name="traceini" class="RollingFileAppender">
    <param name="BufferedIO" value="false" />
    <param name="Append" value="true" />
    <param name="File" value="traceLog_%I_%A" />
    <param name="MaxBackupIndex" value="99" />
    <param name="MaxFileSize" value="10MB" />
    <layout class="PatternLayout">
      <param name="Pattern" value="%d{%m/%d %H:%M:%S} [%C{1}:%L] %m%n" />
    </layout>
  </appender>
  <logger name="gglog.std">
    <level value="all" />
    <appender-ref name="traceini" />
  </logger>
  <logger name="gglog.std.utility">
    <level value="off" />
    <appender-ref name="traceini" />
  </logger>
</configuration>

 

<process> can be replaced with any of the following options, depending on the extent of your tracing requirements:

gglog-EOLTP.xml            — to trace the extract process named EOLTP
gglog-extract.xml           — to trace all extract processes
gglog-ROLAP.xml           — to trace the replicat process named ROLAP
gglog-replicat.xml          — to trace all replicat processes
gglog-mgr.xml                — to trace the MGR process
gglog-server.xml            — to trace the server collector process

As soon as any of the above files are copied to the OGG Home directory and the OGG process is running, tracing will immediately start and produce a traceLog file containing logdump style output.
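
For example, to trace just the EOLTP Extract you would drop the corresponding file into the GoldenGate home while the process is running; the paths below are purely illustrative:

$ cp /stage/templates/gglog-EOLTP.xml /u01/app/ogg/
$ ls /u01/app/ogg/traceLog_*     # trace files appear as soon as the process logs activity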

Below is an example traceLog for the Replicat ROLAP process:

 

09/16 02:18:45.392 [processloop.cpp:204] process_replicat_loop:
 main loop ----------------------------------------------*
 09/16 02:18:45.392 [etparse.c:1007] token header : hdr_ind : 000000: 47 |G |
 09/16 02:18:45.392 [etparse.c:1008] token header : info : 000000: 01 |. |
 09/16 02:18:45.392 [etparse.c:1009] token header : length : 235
 09/16 02:18:45.392 [etparse.c:1015] token trailer : hdr_ind : 000000: 5a |Z |
 09/16 02:18:45.392 [etparse.c:1016] token trailer : info : 000000: 01 |. |
 09/16 02:18:45.392 [etparse.c:1017] token trailer : length : 60160
 09/16 02:18:45.392 [api.c:4373] et_result: 0
 09/16 02:18:45.392 [api.c:4374] ET rec ggs tkn:
 000000: 52 00 14 00 41 41 44 7a 72 45 41 41 45 41 41 41 |R...AADzrEAAEAAA|
