Category Archives: OGG

Using Event Actions to improve batch performance

Most replication technologies suffer from one fundamental flaw: the way the data is replicated. Consider a table that is populated with a million rows as part of a batch process. This may be a bulk insert operation that Oracle completes on the source database as one transaction. However, Oracle writes each row change to its redo logs, and GoldenGate subsequently mines the logs as Logical Change Records (LCRs), writes the LCRs to a remote trail, converts each one back to DML, and applies it to the target database one row at a time. The single source transaction becomes a million individual row operations on the target, which causes a huge performance overhead. To overcome this issue, we can use GoldenGate’s Event Marker Interface to employ Event Actions that:

• Detect the DML statement (INSERT INTO TABLE SELECT ..)
• Ignore the data resulting from the SELECT part of the statement
• Replicate just the DML statement as an Event Record
• Execute just the DML statement on the target database

The solution requires a statement table on both source and target databases to trigger the event. Also, both databases must be perfectly synchronized to avoid data integrity issues.
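For illustration, a minimal statement table might look like the following sketch, created identically in the SRC schema on the source and the TGT schema on the target; the SQL_STATEMENT column name matches the test case later in this post, while the data type and length are assumptions:

CREATE TABLE statement
(
  sql_statement VARCHAR2(4000) NOT NULL  -- the DML statement to be executed on the target
);
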
In the Extract configuration, the DML statement is passed in the replicated data stream by using the EVENTACTIONS parameter with the IGNORE TRANS and INCLUDEEVENT options. The source transaction is ignored, except for the Event Record containing the DML statement, which is written to the trail file:

TABLE PDB1.SRC.STATEMENT, &
  EVENTACTIONS (IGNORE TRANS INCLUDEEVENT);
TABLE PDB1.SRC.BATCH;

On the target, the Replicat configuration uses a TABLE statement to pass the Event Record to a SQLEXEC statement that executes the DML statement by calling a stored procedure, as shown in the following code:

SOURCEDEFS ./dirdef/statement.def
TABLE PDB1.SRC.STATEMENT, SQLEXEC(SPNAME execdml, ID sqlid, &
  PARAMS(P1 = sql_statement)), &
  EVENTACTIONS (IGNORE, LOG INFO);
DISCARDFILE ./dirrpt/rolap01.dsc, PURGE, MEGABYTES 100
ASSUMETARGETDEFS
MAP PDB1.SRC.* TARGET PDB2.TGT.*;

Note the reference to a SOURCEDEFS file. This is necessary to explicitly declare the STATEMENT table definition to the Replicat. The procedure call is very simple because its only task is to execute the DML statement that is being passed by the Event Marker Interface.
This is shown in the following code:

CREATE OR REPLACE PROCEDURE ggadmin.execdml (P1 IN VARCHAR2)
AS
BEGIN
  EXECUTE IMMEDIATE P1;
END;
/
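
The SOURCEDEFS file referenced above is typically produced with the GoldenGate DEFGEN utility on the source system. A minimal DEFGEN parameter file might look like the following sketch, where the file name, location, and credentials are assumptions:

-- dirprm/defgen.prm (name and location assumed)
DEFSFILE ./dirdef/statement.def
USERID ggadmin@PDB1, PASSWORD <password>
TABLE PDB1.SRC.STATEMENT;

Running DEFGEN from the OGG home then generates the definitions file, which is copied to the Replicat's dirdef directory:

./defgen paramfile dirprm/defgen.prm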

Now perform a test transaction by inserting the DML statement into the SQL_STATEMENT column of the SRC.STATEMENT table first. Then, execute the same DML statement on the source database, as shown in the following code:

SQL> conn SRC@PDB1
 Enter password:
 Connected.
 
SQL> INSERT INTO STATEMENT (SQL_STATEMENT) VALUES ('INSERT INTO TGT.BATCH SELECT * FROM ALL_OBJECTS');

1 row created.

SQL> INSERT INTO BATCH SELECT * FROM ALL_OBJECTS;

24372 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from batch;

COUNT(*)
 ----------
 24372

The Extract report shows the runtime statistics. The report file can be generated using the GGSCI SEND and VIEW REPORT commands as follows:

GGSCI (db12server01) 2> send extract EOLTP01, report

Sending REPORT request to EXTRACT EOLTP01 ...
 Request processed.

GGSCI (db12server01) 3> view report EOLTP01
 ..
 *****************************************************************
 * ** Run Time Statistics ** *
 *****************************************************************

Report at 2015-06-07 13:20:44 (activity since 2015-06-07 11:56:04)

Output to ./dirdat/lt:

From Table PDB1.SRC.STATEMENT:
 # inserts: 1
 # updates: 0
 # deletes: 0
 # discards: 0
 From Table PDB1.SRC.BATCH:
 # inserts: 0
 # updates: 0
 # deletes: 0
 # discards: 0
 # ignores: 24372

As expected, the 24372 records inserted into the BATCH table have been ignored by GoldenGate; only the DML statement is replicated to the STATEMENT table in the TGT schema on the target database. This is shown in the runtime information from the Replicat report as a successful Event Action and DML execution.

Note that the statistics show only one transaction and the call to the stored procedure:

 *****************************************************************
 * ** Run Time Statistics ** *
 *****************************************************************
 The last record for the last committed transaction is as follows:
 ___________________________________________________________________
 Trail name : ./dirdat/rt000083
 Hdr-Ind : E (x45) Partition : . (x04)
 UndoFlag : . (x00) BeforeAfter: A (x41)
 RecLength : 90 (x005a) IO Time : 2015-06-07 13:10:29.000465
 IOType : 5 (x05) OrigNode : 255 (xff)
 TransInd : . (x03) FormatType : R (x52)
 SyskeyLen : 0 (x00) Incomplete : . (x00)
 AuditRBA : 180 AuditPos : 6877200
 Continued : N (x00) RecCount : 1 (x01)

2015-06-07 13:10:29.000465 Insert Len 90 RBA 2009
 Name: PDB1.SRC.STATEMENT
 ___________________________________________________________________

Reading ./dirdat/rt000083, current RBA 2226, 1 records

Report at 2015-06-07 13:17:01 (activity since 2015-06-07 13:10:42)

From Table PDB1.SRC.STATEMENT:
 # inserts: 0
 # updates: 0
 # deletes: 0
 # discards: 0
 # ignores: 1

Stored procedure sqlid:
 attempts: 1
 successful: 1

From Table PDB1.SRC.STATEMENT to PDB2.TGT.STATEMENT:
 # inserts: 1
 # updates: 0
 # deletes: 0
 # discards: 0

The described method provides a massive performance boost when replicating bulk operations.

Previously, using Event Actions to improve batch performance was supported in classic delivery mode only. I have discovered that adding an explicit COMMIT statement to the stored procedure on the target database enables the mechanism to function correctly with the Integrated Replicat, as shown below:

CREATE OR REPLACE PROCEDURE ggadmin.execdml (P1 IN VARCHAR2)
AS
BEGIN
  EXECUTE IMMEDIATE P1;
  COMMIT;  -- explicit commit for the Integrated Replicat
END;
/


OGG and ODI – the perfect match

Oracle GoldenGate can be used for trickle-feeding data warehouses, ideally when coupled with Oracle Data Integrator (ODI). GoldenGate complements ODI by providing log-based, real-time data capture from source systems with minimal performance impact. Unlike traditional data warehouse Extract, Transform and Load (ETL) products, ODI uses an Extract, Load and Transform (E-LT) architecture that further improves performance and reduces cost by eliminating the middle-tier transformation layer. ODI uses JDBC connectivity to access different databases.

Oracle GoldenGate 12c is even more tightly integrated with Oracle Data Integrator 12c. For example, GoldenGate source and target systems are configured as data servers in ODI’s Topology. The Oracle GoldenGate Monitor Agent (formerly JAgent) includes a WebLogic application server that enables GoldenGate instances to be stopped and started from ODI, as well as GoldenGate parameter files to be deployed to configuration targets, providing a complete end-to-end data integration solution.

ETL versus E-LT

The diagram below illustrates the key difference between the ETL and E-LT architectures. Although it appears here as an extra stage in the data delivery, E-LT can offer significant benefits over traditional ETL.

ETL vs ELT

 

Now consider the pros and cons.

E-LT

Oracle Data Integrator’s Extract, Load and Transform (E-LT) architecture leverages disparate Relational Database Management System (RDBMS) engines to process and transform the data. This approach optimizes performance and scalability and lowers overall solution costs.

Instead of relying on a separate, conventional ETL transformation server, ODI’s E-LT architecture generates native code for each RDBMS engine, for example SQL and bulk loader scripts. The E-LT approach extracts data from the sources, loads it into the target, and transforms it there using the optimizations of the database (set-based SQL).

Pipelined functions bridge the gap between complex PL/SQL transformations and set-based SQL, but they also have some unique performance features of their own, making them a superb performance optimization tool.

For example, you can (see the sketch after this list):

  • replace row-based inserts with pipelined function-based loads
  • use array fetches with BULK COLLECT
  • enable parallel pipelined function execution
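
The following PL/SQL sketch ties these three techniques together. It is purely illustrative: the STG_ORDERS staging table, the DW_ORDERS target table, the object types, and the transformation logic are all hypothetical.

-- Hypothetical schema-level types describing the transformed rows
CREATE TYPE dw_order_t AS OBJECT (order_id NUMBER, order_total NUMBER);
/
CREATE TYPE dw_order_tab AS TABLE OF dw_order_t;
/

-- Pipelined, parallel-enabled transformation: STG_ORDERS (order_id, amount, quantity) -> DW_ORDERS
CREATE OR REPLACE FUNCTION transform_orders (p_src SYS_REFCURSOR)
  RETURN dw_order_tab PIPELINED
  PARALLEL_ENABLE (PARTITION p_src BY ANY)
AS
  TYPE src_tab IS TABLE OF stg_orders%ROWTYPE;
  l_rows src_tab;
BEGIN
  LOOP
    FETCH p_src BULK COLLECT INTO l_rows LIMIT 1000;  -- array fetch with BULK COLLECT
    EXIT WHEN l_rows.COUNT = 0;
    FOR i IN 1 .. l_rows.COUNT LOOP
      -- complex row transformation would go here
      PIPE ROW (dw_order_t(l_rows(i).order_id, l_rows(i).amount * l_rows(i).quantity));
    END LOOP;
  END LOOP;
  CLOSE p_src;
  RETURN;
END;
/

-- Set-based, direct-path load instead of row-by-row inserts
-- (parallel execution additionally requires parallel query/DML to be enabled)
INSERT /*+ APPEND */ INTO dw_orders
SELECT * FROM TABLE (transform_orders (CURSOR (SELECT * FROM stg_orders)));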

ETL

ETL processes are often complex and inflexible. For example, adding a source table, or even a column to an existing source table, can spawn a number of configuration and code changes, forcing downtime. These include:

  • DDL
  • ETL Metadata
  • Transformations

Other issues are often resource-related, causing overrunning batch jobs and contention with other online and batch processes. In addition, extra storage is required to stage the replicated data before it can be processed by the ETL tool. Once processed, however, the staging tables can be truncated, which is also true for E-LT.

Knowledge Modules

The modular architecture of Oracle GoldenGate and Oracle Data Integrator enables hot-pluggable Knowledge Modules (KMs) that allow new data warehouse feeds to be added “on the fly”, avoiding planned downtime.

KMs are code templates that implement the actual data flow; each KM is dedicated to an individual task in the overall data integration process. As with Oracle Warehouse Builder, the code that is generated and executed is derived from the declarative rules and metadata defined in the Oracle Data Integrator Designer module, and is fully supported by Oracle.

Oracle Data Integrator Knowledge Modules come in different flavors, and you need to choose the relevant KM for your source database. Out of the box, ODI ships with more than 100 KMs to support different vendors’ system interfaces. The main categories are:

  • The Reverse Engineer module takes care of the source metadata. Here ODI retrieves the table structures of the source database schema(s), transforms the extracted metadata, and populates the ODI Work Repository.
  • The Journalize module is where Oracle GoldenGate Change Data Capture (CDC) takes place: it sets up the journalizing infrastructure and enables CDC by reading the source database redo logs and pumping the data to the staging database server via GoldenGate trail files. This is handled by the GoldenGate Extract process(es). The trail files hold the committed transactions that are read by the GoldenGate Replicat process(es), converted to DML/DDL, and executed against the staging database.
  • The Load module uses GoldenGate to deliver the data to the staging tables from the remote database server. It can perform data transformation either at row level on the source (capture) or target (delivery), or set-based using the RDBMS engine on the target.
  • The Check module verifies the data in relation to the target table constraints. Any invalid data is written to the error tables.
  • The Integrate module loads the final, transformed data into the target tables.
  • The Service module creates and deploys data manipulation Web Services to your Service Oriented Architecture (SOA) infrastructure. It generates Java code that is compiled and deployed to the application server for execution, thus publishing data content to Web based services.

The following diagram illustrates the KM architecture and the process flow.

ODI process flow

One key benefit of Knowledge Modules is that they are dynamically reusable: make one change and it is instantly propagated to hundreds of transformations, saving hours of complex manual configuration. How cool is that?


Using User Tokens to populate a heartbeat table

GoldenGate exposes a vast array of environment values that can be captured in user tokens. Let’s start by looking at a common method of replicating system information to populate a heartbeat table that can be used to monitor performance. We can use the TOKENS option of the Extract TABLE parameter to define a user token and associate it with GoldenGate environment data.

The following Extract configuration code shows the token declarations for the heartbeat table:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS ( &
  EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
  EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
  EXTLAG = @GETENV ("LAG","SEC"), &
  EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML") &
), FILTER (@STREQ (EXTGROUP, @GETENV ("GGENVIRONMENT","GROUPNAME")));

 

The EXTSTAT_TOTAL user token is particularly useful to determine the amount of DML load that is taking place against the source database with reference to the EXTLAG user token within the same replicated data stream.

For the data pump, the example Extract configuration is as follows:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS ( &
  PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
  PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
  PMPLAG = @GETENV ("LAG","SEC"));

Also, for the Replicat, the following configuration populates the heartbeat table on the target database with the token data derived from Extract, data pump, and Replicat, containing system details and replication lag:

MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &
KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &
INSERTMISSINGUPDATES, &
COLMAP (USEDEFAULTS, &
  ID = 0, &
  SOURCE_COMMIT = @GETENV ("GGHEADER","COMMITTIMESTAMP"), &
  EXTGROUP = @TOKEN ("EXTGROUP"), &
  EXTTIME = @TOKEN ("EXTTIME"), &
  PMPGROUP = @TOKEN ("PMPGROUP"), &
  PMPTIME = @TOKEN ("PMPTIME"), &
  REPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
  REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
  EXTLAG = @TOKEN ("EXTLAG"), &
  PMPLAG = @TOKEN ("PMPLAG"), &
  REPLAG = @GETENV ("LAG","SEC"), &
  EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL"));

 

As the heartbeat table example shows, the defined user tokens are retrieved in the MAP statement using the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explanatory.

As mentioned previously, EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful for measuring the load on the source system when you evaluate latency peaks. If the user token data is copied to an audit table, normally via a trigger on the target database’s heartbeat table, the statistics can be graphed over time, as illustrated in the example below; a sketch of such a trigger follows the illustration.

OGG ReplicationKey
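
As a minimal sketch of that trigger-based audit approach, assuming a GG_HB_HISTORY audit table (the name is hypothetical) with the same columns as the heartbeat table, the following could be created on the target database:

-- Hypothetical audit table with the same structure as the heartbeat table
CREATE TABLE ggadmin.gg_hb_history AS
  SELECT * FROM ggadmin.gg_hb_in_tgt WHERE 1 = 0;

CREATE OR REPLACE TRIGGER ggadmin.trg_gg_hb_history
  AFTER INSERT OR UPDATE ON ggadmin.gg_hb_in_tgt
  FOR EACH ROW
BEGIN
  -- Copy each heartbeat row so the metrics can be graphed over time
  INSERT INTO ggadmin.gg_hb_history
    (id, db_name, extgroup, exttime, pmpgroup, pmptime, repgroup, reptime,
     source_commit, extlag, pmplag, replag, extstat_total)
  VALUES
    (:NEW.id, :NEW.db_name, :NEW.extgroup, :NEW.exttime, :NEW.pmpgroup, :NEW.pmptime,
     :NEW.repgroup, :NEW.reptime, :NEW.source_commit, :NEW.extlag, :NEW.pmplag,
     :NEW.replag, :NEW.extstat_total);
END;
/

Note that if the Replicat is configured to suppress triggers on the target (for example with DBOPTIONS SUPPRESSTRIGGERS), this trigger would not fire and the copy would need to be performed by a scheduled job instead.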


Oracle GoldenGate Activity Logging

Since the publication of my latest book Oracle GoldenGate 12c Implementer’s Guide, I have discovered a great feature that has existed in OGG since version 11.1.1.0.

Activity Logging provides a verbose tracing facility that can be enabled dynamically for any GoldenGate process. The feature is very useful for obtaining additional debug information and is enabled by placing an XML initialisation file in the OGG home directory, using the following file naming convention and content.

gglog-<process>.xml

<?xml version="1.0" ?>
<configuration reset="true">
  <appender name="traceini" class="RollingFileAppender">
    <param name="BufferedIO" value="false" />
    <param name="Append" value="true" />
    <param name="File" value="traceLog_%I_%A" />
    <param name="MaxBackupIndex" value="99" />
    <param name="MaxFileSize" value="10MB" />
    <layout class="PatternLayout">
      <param name="Pattern" value="%d{%m/%d %H:%M:%S} [%C{1}:%L] %m%n" />
    </layout>
  </appender>
  <logger name="gglog.std">
    <level value="all" />
    <appender-ref name="traceini" />
  </logger>
  <logger name="gglog.std.utility">
    <level value="off" />
    <appender-ref name="traceini" />
  </logger>
</configuration>

 

<process> can be replaced with any of the following, depending on the extent of your tracing requirements:

gglog-EOLTP.xml     — to trace the Extract process named EOLTP
gglog-extract.xml   — to trace all Extract processes
gglog-ROLAP.xml     — to trace the Replicat process named ROLAP
gglog-replicat.xml  — to trace all Replicat processes
gglog-mgr.xml       — to trace the MGR process
gglog-server.xml    — to trace the Server Collector process

As soon as any of the above files is copied to the OGG home directory and the corresponding OGG process is running, tracing starts immediately and produces a traceLog file containing logdump-style output.

Below is an example traceLog for the Replicat ROLAP process:

 

09/16 02:18:45.392 [processloop.cpp:204] process_replicat_loop:
 main loop ----------------------------------------------*
 09/16 02:18:45.392 [etparse.c:1007] token header : hdr_ind : 000000: 47 |G |
 09/16 02:18:45.392 [etparse.c:1008] token header : info : 000000: 01 |. |
 09/16 02:18:45.392 [etparse.c:1009] token header : length : 235
 09/16 02:18:45.392 [etparse.c:1015] token trailer : hdr_ind : 000000: 5a |Z |
 09/16 02:18:45.392 [etparse.c:1016] token trailer : info : 000000: 01 |. |
 09/16 02:18:45.392 [etparse.c:1017] token trailer : length : 60160
 09/16 02:18:45.392 [api.c:4373] et_result: 0
 09/16 02:18:45.392 [api.c:4374] ET rec ggs tkn:
 000000: 52 00 14 00 41 41 44 7a 72 45 41 41 45 41 41 41 |R...AADzrEAAEAAA|

Continue reading Oracle GoldenGate Activity Logging