Oracle GoldenGate can be used for trickle feeding data warehouses, ideally when coupled with Oracle Data Integrator (ODI). GoldenGate complements ODI by providing log-based, real-time change data capture from source systems with minimal performance impact. Unlike traditional data warehouse Extract, Transform and Load (ETL) products, ODI enables an Extract, Load and Transform (E-LT) architecture that further improves performance and reduces cost by eliminating the middle-tier transformation layer. ODI uses JDBC connectivity to access different databases.
Oracle GoldenGate 12c is even more tightly integrated with Oracle Data Integrator 12c. For example, GoldenGate source and target systems are configured as data servers in ODI’s Topology. The Oracle GoldenGate Monitor Agent (formerly JAgent) includes a WebLogic application server that enables GoldenGate instances to be stopped and started from ODI, and allows GoldenGate parameter files to be deployed to configuration targets, thus providing a complete end-to-end data integration solution.
ETL versus E-LT
The diagram below helps to illustrate the key difference between ETL and E-LT architectures. Although it appears here as an extra stage in the data delivery, E-LT can offer significant benefits over traditional ETL.
Now consider the pros and cons.
Oracle Data Integrator’s Extract, Load and Transform (E-LT) architecture leverages disparate Relational Database Management System (RDBMS) engines to process and transform the data. This approach optimizes performance and scalability and lowers overall solution costs.
Instead of relying on a separate, conventional ETL transformation server, ODI’s E-LT architecture generates native code for each RDBMS engine, for example, SQL and bulk loader scripts. The E-LT architecture extracts data from the sources, loads it into the target, and transforms it there using the optimizations of the database (set-based SQL).
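To make the E-LT idea concrete, here is a minimal sketch in Python. It assumes a hypothetical staging table and warehouse table, and uses the standard library's sqlite3 module purely as a stand-in for the target RDBMS. It is not code that ODI generates; it only shows the shape of the approach: load the rows untransformed, then let the database engine do the transformation in one set-based statement.

```python
import sqlite3

# Hypothetical tables for illustration; sqlite3 stands in for the target RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER);
    CREATE TABLE dw_customer_sales (
        customer TEXT PRIMARY KEY, total_amount REAL, order_count INTEGER
    );
""")

# Extract + Load: rows land in the staging table exactly as extracted.
extracted_rows = [
    (1, "acme", 1250),
    (2, "acme", 750),
    (3, "globex", 4000),
]
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", extracted_rows)

# Transform: a single set-based statement run by the database engine itself,
# instead of row-by-row logic on a separate middle-tier server.
conn.execute("""
    INSERT INTO dw_customer_sales (customer, total_amount, order_count)
    SELECT UPPER(customer), SUM(amount_cents) / 100.0, COUNT(*)
    FROM stg_orders
    GROUP BY customer
""")
conn.commit()

result = conn.execute(
    "SELECT customer, total_amount, order_count "
    "FROM dw_customer_sales ORDER BY customer"
).fetchall()
print(result)
```

The only work done outside the database is moving the rows; the aggregation and clean-up happen where the data already lives.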
Pipelined functions bridge the gap between complex PL/SQL transformations and set-based SQL, but they also have some unique performance features of their own, making them a superb performance optimization tool. Typical uses include:
- replacing row-based inserts with pipelined function-based loads
- utilizing array fetches with BULK COLLECT
- enabling parallel pipelined function execution
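Pipelined functions are a PL/SQL feature, so the following Python sketch is only an analogy, not Oracle code. It assumes two hypothetical tables and uses sqlite3 as a stand-in database. A generator plays the role of the pipelined function (rows are handed on as soon as they are produced), `fetchmany` plays the role of an array fetch, and `executemany` replaces row-by-row inserts. Parallel execution is not shown.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]

def fetch_in_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[Row]]:
    """Array-style fetch: pull up to batch_size rows per call."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch

def pipelined_transform(batches: Iterable[List[Row]]) -> Iterator[Row]:
    """Yield each transformed row as soon as it is ready."""
    for batch in batches:
        for cust_id, name in batch:
            yield (cust_id, name.strip().title())

# Hypothetical source and target tables for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customers (id INTEGER, name TEXT);
    CREATE TABLE tgt_customers (id INTEGER, name TEXT);
""")
conn.executemany(
    "INSERT INTO src_customers VALUES (?, ?)",
    [(i, f"  customer {i} ") for i in range(1, 8)],
)
conn.commit()

source_cursor = conn.execute("SELECT id, name FROM src_customers ORDER BY id")

# The load consumes the generator directly, so the full transformed
# result set is never built up in memory first.
conn.executemany(
    "INSERT INTO tgt_customers VALUES (?, ?)",
    pipelined_transform(fetch_in_batches(source_cursor, batch_size=3)),
)
conn.commit()

loaded = conn.execute("SELECT id, name FROM tgt_customers ORDER BY id").fetchall()
print(loaded)
```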
ETL processes are often complex and inflexible. For example, adding a source table or even a column to an existing source table can spawn a number of configuration and code changes, forcing downtime. This includes:
- ETL Metadata
Other issues are often resource-related, causing over-running batch jobs and contention with other online and batch processes. In addition, extra storage is needed to stage the replicated data before it can be processed by the ETL tool. However, once processed, the staging tables can be truncated, which is also true for E-LT.
The modular architecture of Oracle GoldenGate and Data Integrator enables hot-pluggable Knowledge Modules (KMs) that allow new data warehouse feeds to be added “on the fly”, avoiding planned downtime.
KMs are code templates that implement the actual data flow. Each KM is dedicated to an individual task in the overall data integration process. In Oracle Warehouse Builder style, the code that is generated and executed is derived from the declarative rules and metadata defined in the Oracle Data Integrator Designer module, and is fully supported by Oracle.
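The "code template plus metadata" idea can be sketched in a few lines of Python. Everything here is hypothetical: the template text, the table names and the mapping structure are made up for illustration, and this is not ODI's own template syntax or substitution API. The point is only that one template, combined with declarative metadata, yields many generated statements.

```python
from string import Template

# Hypothetical template standing in for one step of a Knowledge Module.
INTEGRATE_TEMPLATE = Template(
    "INSERT INTO $target ($columns)\n"
    "SELECT $expressions\n"
    "FROM $staging"
)

# Declarative metadata: target column -> source expression (hypothetical names).
mappings = [
    {
        "target": "dw_customers",
        "staging": "stg_customers",
        "columns": {"cust_id": "id", "cust_name": "UPPER(name)"},
    },
    {
        "target": "dw_orders",
        "staging": "stg_orders",
        "columns": {"order_id": "id", "amount": "amount_cents / 100.0"},
    },
]

def generate_sql(template: Template, mapping: dict) -> str:
    """Fill the shared template from one mapping's metadata."""
    columns = mapping["columns"]
    return template.substitute(
        target=mapping["target"],
        staging=mapping["staging"],
        columns=", ".join(columns),
        expressions=", ".join(columns.values()),
    )

statements = [generate_sql(INTEGRATE_TEMPLATE, m) for m in mappings]
for statement in statements:
    print(statement, end="\n\n")
```

Because every mapping goes through the same template, editing the template once changes the code generated for all of them.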
Oracle Data Integrator Knowledge Modules exist in different flavors. You need to choose the relevant KM for your source database. Out of the box, ODI ships with more than 100 KMs to support different vendors’ system interfaces. The main categories are:
- The Reverse Engineer module takes care of the source data metadata. Here ODI retrieves the table structures of the source database schema(s), transforms the extracted metadata and populates the ODI Work Repository.
- The Journalize module is where Oracle GoldenGate Change Data Capture (CDC) takes place. It sets up the journalizing infrastructure and enables CDC by reading the source database redo logs and pumping the data to the staging database server via GoldenGate trail files. This is handled by the GoldenGate Extract process(es). The trail files hold the committed transactions that are read by the GoldenGate Replicat process(es), then converted to DML/DDL and executed against the staging database.
- The Load module uses GoldenGate to deliver the data to the staging tables from the remote database server. It can perform data transformation either at row level on the source (capture) or target (delivery), or set-based using the RDBMS engine on the target.
- The Check module verifies the data in relation to the target table constraints. Any invalid data is written to the error tables.
- The Integrate module loads the final, transformed data into the target tables.
- The Service module creates and deploys data manipulation Web Services to your Service Oriented Architecture (SOA) infrastructure. It generates Java code that is compiled and deployed to the application server for execution, thus publishing data content to Web based services.
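The Journalize, Load, Check and Integrate steps listed above can be sketched end to end in Python. This is a toy model under stated assumptions: the change records are a hypothetical stand-in for committed transactions in a trail (it is not the GoldenGate trail format), the table names and the "no negative balance" rule are invented, and sqlite3 stands in for the staging and target databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL);
    CREATE TABLE err_accounts (id INTEGER, owner TEXT, balance REAL, reason TEXT);
    CREATE TABLE dw_accounts (
        id INTEGER PRIMARY KEY, owner TEXT, balance REAL CHECK (balance >= 0)
    );
""")

# Hypothetical change records, in commit order (I = insert, U = update, D = delete).
trail = [
    {"op": "I", "id": 1, "owner": "ann", "balance": 100.0},
    {"op": "I", "id": 2, "owner": "bob", "balance": 50.0},
    {"op": "U", "id": 2, "owner": "bob", "balance": -20.0},
    {"op": "I", "id": 3, "owner": "cy", "balance": 10.0},
    {"op": "D", "id": 3},
]

def apply_change(db: sqlite3.Connection, change: dict) -> None:
    """Convert one change record to DML and run it against staging."""
    if change["op"] == "I":
        db.execute("INSERT INTO stg_accounts VALUES (:id, :owner, :balance)", change)
    elif change["op"] == "U":
        db.execute(
            "UPDATE stg_accounts SET owner = :owner, balance = :balance WHERE id = :id",
            change,
        )
    elif change["op"] == "D":
        db.execute("DELETE FROM stg_accounts WHERE id = :id", change)
    else:
        raise ValueError(f"unknown operation: {change['op']}")

# Load: replay the captured changes into the staging table.
for change in trail:
    apply_change(conn, change)

# Check: rows that would violate the target constraint go to the error table.
conn.execute("""
    INSERT INTO err_accounts
    SELECT id, owner, balance, 'negative balance' FROM stg_accounts WHERE balance < 0
""")

# Integrate: only valid rows reach the target table, in one set-based statement.
conn.execute("""
    INSERT INTO dw_accounts
    SELECT id, owner, balance FROM stg_accounts WHERE balance >= 0
""")
conn.commit()

errors = conn.execute("SELECT * FROM err_accounts").fetchall()
warehouse = conn.execute("SELECT * FROM dw_accounts ORDER BY id").fetchall()
print("errors:", errors)
print("warehouse:", warehouse)
```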
The following diagram illustrates the KM architecture and the process flow.
One key benefit of the Knowledge Modules is that they are dynamically reusable: you make one change and it is instantly propagated to hundreds of transformations, saving hours of complex manual configuration. How cool is that?