转载:So what is better, ETL or ELT?

So what is better, ETL or ELT?

Source: http://it.toolbox.com/blogs/infosphere/so-what-is-better-etl-or-elt-13572

Every now and then I come across a blog entry that reminds me there are people out there who know a lot more about my niche than I do! This is fortunate as this week it has helped me understand ELT tools.

ETL versus ELT and ETLT

The world of data integration has it's own Coke versus Pepsi challenge - it's called ETL versus ELT. Not as exciting as Aliens versus Predator, not as compelling as Ali versus Frasier and not as sexy as Erin Brockovich versus whatever company that was ... but an important battle if you are in charge of a multi million dollar IT budget.

ETL (Extract, Transform and Load) is the coca cola in the challenge with Informatica and DataStage the champions in terms of license fees and market share. It is made up of software that transforms and migrates data on most platforms with or without source and target databases. Business Objects, SAS, Microsoft SSIS, Ab Initio and Cognos are also in the ETL corner.

ELT (Extract, Load and Transform) is the challenger and is now largely driven by RDBMS vendor Oracle with Oracle Warehouse Builder and Sunopsis. It consists of software that transforms and migrates data in a database engine, often by generating SQL statements and procedures and moving data between tables.

ELT technology was constrained by what the database was capable and since many came from RDBMS vendors they tended to be suitable for just one database platform. Eg. Oracle Warehouse Builder and Microsoft DTS. They were also lacking functionality as the vendor was more concerned with building a database rather than an ELT tool. Sunopsis was an exception as an ELT tool not owned by an RDBMS vendor (until the Oracle acquired them ).

Informatica has recently moved into the the ETLT (Extract, Transform, Load and Transform) area with database pushdown optimization. This is standard ETL delivering to a target database and some extra sexy moves done moving it into more tables. Microsoft SSIS also has good ETLT capabilities within the SQL Server database.

Pros of each

I haven't had a lot of experience with ELT products but fortunately Dan Lindstedt from the B-Eye-Network blogs has been talking about this topic for years now and his recent entry ETL, ELT - Challenges and Metadata has a great comparison. Here are his pros of each list below, visit his blog for further discussion and the Cons of each tool:

Pros:
* ETL can balance the workload / share the workload with the RDBMS
* ETL can perform more complex operations in single data flow diagrams (data maps)
* ETL can scale with separate hardware.
* ETL can handle Partitioning and parallelism independent of the data model, database layout, and source data model architecture.
* ETL can process data in-stream, as it transfers from source to target
* ETL does not require co-location of data sets in order to do it's work.
* ETL captures huge amounts of metadata lineage today.
* ETL can run on SMP or MPP hardware

I would add to this data quality. The ETL tools have a head start over ELT in terms of data quality integration with Informatica and DataStage integrating closely. The row-by-row processing method of ETL works well with third party products such as data quality or business rule engines.

And the Pros of ELT.

Pros:
* ELT leverages RDBMS engine hardware for scalability
* ELT keeps all data in the RDBMS all the time
* ELT is parallelized according to the data set, and disk I/O is usually optimized at the engine level for faster throughput.
* ELT Scales as long as the hardware and RDBMS engine can continue to scale.
* ETL can achieve 3x to 4x the throughput rates on the appropriately tuned MPP RDBMS platform.

I'm not sure whether that final point refers to data that is being processed from one RDBMS table to another on the same server or whether it also applies to cross database migrations. If you are moving between database platforms ELT may need to transfer the data and save it before it commences transformation whereas ETL can transfer and transform in the one stream.

Another Pro of ELT is that once the database is on the target platform it no longer places stress on the network, all further transformation is done on the RDBMS server. These Pros of ELT is why ETL tools have an ELT capability, either the simple interface of DataStage in running user-defined SQL and procedures or the advanced capabilities of Informatica to generate SQL transformation code.

Informatica Pushdown ELT

In 2006 Informatica made an each way bet in the challenge by putting ELT style push down optimization into the PowerCenter ETL tool. You can see a very good list of 8.1.1 and Pushdown optimization best practices at the ETL/ELT forum.

Sunopsis ELT

You can see Sunopsis ELT in action at Mark Rittmans Sunopsis Data Conductor : Creating an Oracle Project and in Moving Global Electronics Data using Sunopsis he evaluates the Oracle SQL generated.

DataStage ELT

In DataStage ELT is available through running a SQL statement from a database stage or by calling a stored procedure. It can generate SQL statements but not SQL procedural code so it is not as flexible or advanced as PowerCenter or Sunopsis. IBM may add this functionality within the next 18 months. They already have a DB2 ELT tool in the DB2 DataWarehouse Edition in the form of a SQL language generator for populating a Data Warehouse.

The Wrap

I am not convinced advanced ELT is good an ETL tool as it increases upgraditis.

Upgraditis afflicts ETL tools when they struggle to keep up with all the upgrades in the products the integrate with. Good ETL tools have native access to most common database engines and ERP systems. They constantly need to upgrade and certify these items against new versions of these products. When Informatica added pushdown optimization they bought into a whole new area of upgraditis and will have to keep the pushdown compatible with new versions of all the databases it supports.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

猜你喜欢

转载自blueaeri.iteye.com/blog/1539867