OGG and ODI – the perfect match

Oracle GoldenGate can be used for trickle-feeding data warehouses, ideally when coupled with Oracle Data Integrator (ODI). GoldenGate complements ODI by providing log-based, real-time change data capture from the source systems without impacting their performance. Unlike traditional data warehouse Extract, Transform and Load (ETL) products, ODI offers an Extract, Load and Transform (E-LT) architecture that further improves performance and reduces cost by eliminating the middle-tier transformation layer. ODI uses JDBC connectivity to access the different databases.

Oracle GoldenGate 12c is even more tightly integrated with Oracle Data Integrator 12c. For example, GoldenGate source and target systems are configured as data servers in ODI’s Topology. The Oracle GoldenGate Monitor Agent (formerly JAgent) includes a WebLogic application server that enables GoldenGate instances to be stopped and started from ODI, and GoldenGate parameter files to be deployed to configuration targets, providing a complete end-to-end data integration solution.

ETL versus E-LT

The diagram below illustrates the key difference between the ETL and E-LT architectures. Although it appears here as an extra stage in the data delivery, E-LT offers significant benefits over traditional ETL.

ETL vs E-LT

 

Now consider the pros and cons of each.

E-LT

Oracle Data Integrator’s Extract, Load and Transform (E-LT) architecture leverages disparate Relational Database Management System (RDBMS) engines to process and transform the data. This approach optimizes performance and scalability and lowers overall solution costs.

Instead of relying on a separate, conventional ETL transformation server, ODI’s E-LT architecture generates native code for each RDBMS engine, for example SQL and bulk-loader scripts. E-LT extracts data from the sources, loads it into the target, and transforms it using the optimizations of the database (set-based SQL).
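As a simple illustration, the kind of statement an E-LT tool generates and runs inside the target database looks like the following sketch; all table and column names are hypothetical.

-- Illustrative set-based transformation pushed down to the target database:
-- one INSERT ... SELECT replaces row-by-row processing in a middle tier.
INSERT /*+ APPEND */ INTO dw.sales_fact (sale_id, product_key, sale_date, net_amount)
SELECT s.sale_id,
       p.product_key,
       TRUNC(s.sale_ts)                    AS sale_date,
       s.gross_amount - NVL(s.discount, 0) AS net_amount
FROM   stg.sales_load s
JOIN   dw.product_dim p ON p.product_code = s.product_code;

COMMIT;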

Pipelined functions bridge the gap between complex PL/SQL transformations and set-based SQL, and they have some unique performance features of their own, making them a superb performance optimization tool.

For example, you can (see the sketch after this list):

  • replace row-based inserts with pipelined function-based loads
  • use array fetches with BULK COLLECT
  • enable parallel pipelined function execution
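The sketch below combines these three techniques: a parallel-enabled pipelined function that array-fetches from a staging cursor with BULK COLLECT and is consumed by a set-based insert. All object names and types are hypothetical.

-- Hypothetical object types returned by the pipelined function
CREATE OR REPLACE TYPE sale_row_t AS OBJECT (sale_id NUMBER, net_amount NUMBER);
/
CREATE OR REPLACE TYPE sale_tab_t AS TABLE OF sale_row_t;
/
-- Parallel-enabled pipelined function: BULK COLLECT array fetches,
-- procedural transformation, rows piped back to the calling query
CREATE OR REPLACE FUNCTION transform_sales (p_cur SYS_REFCURSOR)
  RETURN sale_tab_t PIPELINED
  PARALLEL_ENABLE (PARTITION p_cur BY ANY)
IS
  TYPE src_tab_t IS TABLE OF stg.sales_load%ROWTYPE;
  l_rows src_tab_t;
BEGIN
  LOOP
    FETCH p_cur BULK COLLECT INTO l_rows LIMIT 1000;   -- array fetch
    EXIT WHEN l_rows.COUNT = 0;
    FOR i IN 1 .. l_rows.COUNT LOOP
      -- complex, procedural transformation logic would go here
      PIPE ROW (sale_row_t(l_rows(i).sale_id,
                           l_rows(i).gross_amount - NVL(l_rows(i).discount, 0)));
    END LOOP;
  END LOOP;
  RETURN;
END;
/
-- Set-based load replacing row-based inserts
INSERT /*+ APPEND */ INTO dw.sales_fact (sale_id, net_amount)
SELECT sale_id, net_amount
FROM   TABLE(transform_sales(CURSOR(SELECT * FROM stg.sales_load)));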

ETL

ETL processes are often complex and inflexible. For example, adding a source table, or even a column to an existing source table, can spawn a number of configuration and code changes, forcing downtime. These include:

  • DDL
  • ETL Metadata
  • Transformations

Other issues are often resource related, causing over-running batch jobs and contention with other online and batch processes. In addition, extra storage is required to stage the replicated data before it can be processed by the ETL tool. However, once processed, the staging tables can be truncated, which is also true for E-LT.

Knowledge Modules

The modular architecture of Oracle GoldenGate and Oracle Data Integrator enables hot-pluggable Knowledge Modules (KMs) that allow new data warehouse feeds to be added “on the fly”, avoiding planned downtime.

KMs are code templates that implement the actual data flow. Each KM is dedicated to an individual task in the overall data integration process. In Oracle Warehouse Builder style, the code that is generated and executed is derived from the declarative rules and metadata defined in the Oracle Data Integrator Designer module, and is fully supported by Oracle.

Oracle Data Integrator Knowledge Modules come in different flavors, and you need to choose the relevant KM for your source database. Out of the box, ODI ships with more than 100 KMs that support different vendors’ system interfaces. The main categories are:

  • The Reverse Engineer module takes care of the source metadata. Here ODI retrieves the table structures from the source database schema(s), transforms the extracted metadata and populates the ODI Work Repository.
  • The Journalize module is where Oracle GoldenGate Change Data Capture (CDC) takes place. It sets up the journalizing infrastructure and enables CDC: the GoldenGate Extract process(es) read the source database redo logs and pump the data to the staging database server via GoldenGate trail files. The trail files hold the committed transactions, which are read by the GoldenGate Replicat process(es), converted to DML/DDL and executed against the staging database.
  • The Load module uses GoldenGate to deliver the data from the remote database server to the staging tables. It can perform data transformation either at row level on the source (capture) or target (delivery), or set-based using the RDBMS engine on the target.
  • The Check module verifies the data in relation to the target table constraints. Any invalid data is written to the error tables.
  • The Integrate module loads the final, transformed data into the target tables.
  • The Service module creates and deploys data manipulation Web Services to your Service Oriented Architecture (SOA) infrastructure. It generates Java code that is compiled and deployed to the application server for execution, thus publishing data content as Web services.

The following diagram illustrates the KM architecture and the process flow.

ODI process flow

One key benefit of Knowledge Modules is that they are dynamically reusable: make one change and it is instantly propagated to hundreds of transformations, saving hours of complex manual configuration. How cool is that?


Using User Tokens to populate a heartbeat table

A vast array of user tokens exists in GoldenGate. Let’s start by looking at a common method of replicating system information to populate a heartbeat table that can be used to monitor performance. We can use the TOKENS option of the Extract TABLE parameter to define user tokens and associate them with GoldenGate environment data.

The following Extract configuration code shows the token declarations for the heartbeat table:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS ( &
EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
EXTLAG = @GETENV ("LAG","SEC"), &
EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML") &
), FILTER (@STREQ (EXTGROUP, @GETENV ("GGENVIRONMENT","GROUPNAME")));

 

The EXTSTAT_TOTAL user token is particularly useful for determining the amount of DML load taking place against the source database, viewed alongside the EXTLAG user token within the same replicated data stream.
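Note that for heartbeat rows to flow continuously, something on the source database must update GGADMIN.GG_HB_OUT at a regular interval so the Extract has a change to capture. Below is a minimal sketch using DBMS_SCHEDULER; the UPDATE_TIME column and the one-minute interval are assumptions, not part of the configuration above.

-- Hypothetical source-side job touching the heartbeat row once a minute,
-- giving the Extract a change to capture. Column name and interval are assumed.
BEGIN
  DBMS_SCHEDULER.create_job(
    job_name        => 'GGADMIN.GG_HB_UPDATE',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'UPDATE ggadmin.gg_hb_out SET update_time = SYSTIMESTAMP; COMMIT;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=1',
    enabled         => TRUE);
END;
/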

For the data pump, the example Extract configuration is as follows:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS ( &
PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
PMPLAG = @GETENV ("LAG","SEC"));

Also, for the Replicat, the following configuration populates the heartbeat table on the target database with the token data derived from Extract, data pump, and Replicat, containing system details and replication lag:

MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &
KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &
INSERTMISSINGUPDATES, &
COLMAP (USEDEFAULTS, &
ID = 0, &
SOURCE_COMMIT = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &
EXTGROUP = @TOKEN ("EXTGROUP"), &
EXTTIME = @TOKEN ("EXTTIME"), &
PMPGROUP = @TOKEN ("PMPGROUP"), &
PMPTIME = @TOKEN ("PMPTIME"), &
REPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
EXTLAG = @TOKEN ("EXTLAG"), &
PMPLAG = @TOKEN ("PMPLAG"), &
REPLAG = @GETENV ("LAG","SEC"), &
EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL"));

 

As shown in the heartbeat example above, the defined user tokens are retrieved in the MAP statement using the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explanatory.
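For reference, the target heartbeat table needs a column for each value mapped above. One possible layout is sketched below; the data types are assumptions derived from the MAP statement, not taken from an actual installation.

-- Possible layout for the target heartbeat table; data types are assumptions.
CREATE TABLE ggadmin.gg_hb_in_tgt (
  id             NUMBER,
  db_name        VARCHAR2(30),
  extgroup       VARCHAR2(30),
  exttime        TIMESTAMP(6),
  extlag         NUMBER,
  pmpgroup       VARCHAR2(30),
  pmptime        TIMESTAMP(6),
  pmplag         NUMBER,
  repgroup       VARCHAR2(30),
  reptime        TIMESTAMP(6),
  replag         NUMBER,
  source_commit  TIMESTAMP(6),
  extstat_total  NUMBER,
  CONSTRAINT gg_hb_in_tgt_pk PRIMARY KEY (db_name, extgroup, pmpgroup, repgroup)
);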

As mentioned previously, EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful for measuring the load on the source system when evaluating latency peaks. If the user token data is copied to an audit table, normally via a trigger on the target database’s heartbeat table, the statistics can be graphed over time, as illustrated in the example below.

OGG replication lag statistics graphed over time
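A minimal sketch of such an audit trigger follows; the history table GGADMIN.GG_HB_HISTORY and its AUDIT_TIME column are assumptions.

-- Hypothetical trigger copying every heartbeat update into a history table
-- so the lag statistics can be graphed over time. Object names are assumed.
CREATE OR REPLACE TRIGGER ggadmin.trg_gg_hb_audit
AFTER INSERT OR UPDATE ON ggadmin.gg_hb_in_tgt
FOR EACH ROW
BEGIN
  INSERT INTO ggadmin.gg_hb_history
    (db_name, extgroup, exttime, extlag,
     pmpgroup, pmptime, pmplag,
     repgroup, reptime, replag,
     source_commit, extstat_total, audit_time)
  VALUES
    (:NEW.db_name, :NEW.extgroup, :NEW.exttime, :NEW.extlag,
     :NEW.pmpgroup, :NEW.pmptime, :NEW.pmplag,
     :NEW.repgroup, :NEW.reptime, :NEW.replag,
     :NEW.source_commit, :NEW.extstat_total, SYSTIMESTAMP);
END;
/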


Oracle GoldenGate Activity Logging

Since the publication of my latest book, Oracle GoldenGate 12c Implementer’s Guide, I have discovered a great feature that has existed in OGG since version 11.1.1.0.

Activity Logging provides a verbose tracing facility that can be enabled dynamically for any GoldenGate process. The feature is very useful for obtaining additional debug information and is enabled by placing an XML initialisation file in the OGG Home with the following file naming convention and content.

gglog-<process>.xml

<?xml version="1.0" ?>
<configuration reset="true">
  <appender name="traceini" class="RollingFileAppender">
    <param name="BufferedIO" value="false" />
    <param name="Append" value="true" />
    <param name="File" value="traceLog_%I_%A" />
    <param name="MaxBackupIndex" value="99" />
    <param name="MaxFileSize" value="10MB" />
    <layout class="PatternLayout">
      <param name="Pattern" value="%d{%m/%d %H:%M:%S} [%C{1}:%L] %m%n" />
    </layout>
  </appender>
  <logger name="gglog.std">
    <level value="all" />
    <appender-ref name="traceini" />
  </logger>
  <logger name="gglog.std.utility">
    <level value="off" />
    <appender-ref name="traceini" />
  </logger>
</configuration>

 

<process> can be replaced with any of the following, depending on the extent of your tracing requirements:

gglog-EOLTP.xml     — to trace the Extract process named EOLTP
gglog-extract.xml   — to trace all Extract processes
gglog-ROLAP.xml     — to trace the Replicat process named ROLAP
gglog-replicat.xml  — to trace all Replicat processes
gglog-mgr.xml       — to trace the Manager (MGR) process
gglog-server.xml    — to trace the Server Collector process

As soon as any of the above files is copied to the OGG Home directory and the corresponding OGG process is running, tracing starts immediately and produces a traceLog file containing logdump-style output.

Below is an example traceLog for the Replicat ROLAP process:

 

09/16 02:18:45.392 [processloop.cpp:204] process_replicat_loop:
 main loop ----------------------------------------------*
09/16 02:18:45.392 [etparse.c:1007] token header : hdr_ind : 000000: 47 |G |
09/16 02:18:45.392 [etparse.c:1008] token header : info : 000000: 01 |. |
09/16 02:18:45.392 [etparse.c:1009] token header : length : 235
09/16 02:18:45.392 [etparse.c:1015] token trailer : hdr_ind : 000000: 5a |Z |
09/16 02:18:45.392 [etparse.c:1016] token trailer : info : 000000: 01 |. |
09/16 02:18:45.392 [etparse.c:1017] token trailer : length : 60160
09/16 02:18:45.392 [api.c:4373] et_result: 0
09/16 02:18:45.392 [api.c:4374] ET rec ggs tkn:
000000: 52 00 14 00 41 41 44 7a 72 45 41 41 45 41 41 41 |R...AADzrEAAEAAA|
