ETL Tool: An Introduction to Kettle

     I first encountered Kettle in 2011, while still at Beijing Huijin Technology Co., Ltd. (since merged and reorganized into Beijing Lisichen Technology Co., Ltd.), when I copied it from a colleague. At the time it was mainly used to quickly migrate data between heterogeneous databases, though incompatible jar packages were prone to producing garbled Chinese characters.

 

Typical ETL tools include Informatica, DataStage, OWB, Microsoft DTS, Beeload, and Kettle. Among open-source options there is also CloverETL, an ETL plug-in for Eclipse aimed at data integration and rapid ETL implementation.
ETL quality issues manifest in correctness, integrity, consistency, completeness, validity, timeliness, and accessibility. Many factors affect data quality; those caused by system integration and historical data mainly include: inconsistent data models across different periods of a business system; changes in business processes over time; inconsistencies in personnel, finance, and office-system information among old system modules still in operation; and inconsistencies caused by incomplete data integration between legacy systems and new business and management systems.
Implementing ETL first means implementing the ETL transformation process, which covers the following aspects:
1. Null-value handling: capture null fields and either load them as-is or replace them with data of a defined meaning; records can also be routed to different target tables based on whether a field is null.
2. Normalizing data formats: define field-format constraints so that time, numeric, and character data from the source can be loaded in a custom format.
3. Splitting data: decompose fields according to business requirements. For example, a calling number such as 861082585313-8148 can be split into an area code and a phone number.
4. Verifying data correctness: use lookup and split operations for validation. For example, after the calling number 861082585313-8148 is split into area code and phone number, a lookup against the calling area recorded by the gateway or switch can verify the data.
5. Data replacement: replace invalid or missing data as business rules require.
6. Lookup: find missing data; a lookup implements a sub-query, returning missing fields obtained by other means to ensure field completeness.
7. Establishing primary- and foreign-key constraints for the ETL process: illegal records with no matching parent can be replaced or exported to an error file, ensuring that only records with unique primary keys are loaded.
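Several of the steps above can be sketched in code. The following is a minimal, hypothetical Python sketch (plain dictionaries stand in for source rows, and an in-memory dict stands in for the lookup table; the field names and the 4-digit area-code prefix are assumptions, not anything Kettle prescribes) showing null handling, splitting the calling number, and a lookup-based check:

```python
# Hypothetical sketch of a few ETL transformation steps from the list above.
# Source rows are plain dicts; a real tool such as Kettle would stream these.

# Stand-in lookup table: area-code prefix -> calling area (assumed data).
AREA_LOOKUP = {"8610": "Beijing"}

def transform(row):
    # Null-value handling: replace a missing name with a default marker.
    if row.get("name") is None:
        row["name"] = "UNKNOWN"

    # Splitting data: decompose "861082585313-8148" into number and extension.
    number, _, extension = row["calling_number"].partition("-")
    row["area_code"] = number[:4]   # assumed country + area prefix, e.g. "8610"
    row["phone"] = number[4:]
    row["extension"] = extension

    # Lookup: validate the area code against the reference data.
    row["calling_area"] = AREA_LOOKUP.get(row["area_code"], "INVALID")
    return row

record = {"name": None, "calling_number": "861082585313-8148"}
print(transform(record))
```

Running this on the article's example number yields an area code of `8610`, a phone number of `82585313`, and an extension of `8148`, with the calling area resolved through the lookup table.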


ETL (an abbreviation of Extract-Transform-Load) refers to the process of data extraction, transformation, and loading. In enterprise and industry applications we constantly encounter data processing, transformation, and migration tasks, so understanding and mastering at least one ETL tool is essential. This article introduces the ETL tool Kettle, which provides a graphical GUI designer; processing is organized as a workflow, and the tool performs stably across simple and complex data extraction, quality inspection, cleansing, conversion, and filtering tasks. Proficient use of it eliminates a great deal of development work and improves efficiency. The tool itself is written in Java.

 

1. Kettle concept

 

Kettle is an open-source ETL tool written in pure Java. It runs on Windows, Linux, and Unix, requires no installation ("green" software), and performs data extraction efficiently and stably.

 

The name Kettle refers to a water kettle: MATT, the project's lead programmer, wanted to put all kinds of data into a kettle and then pour it out in a specified format.

 

Kettle is a set of ETL tools that lets you manage data from different databases through a graphical user environment in which you describe what you want to do, not how to do it.

 

Kettle has two kinds of script files: transformations (.ktr) and jobs (.kjb). A transformation performs the basic data transformation, while a job controls the entire workflow.
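As an analogy for this division of labor (not Kettle's actual API), the two roles can be sketched in Python: transformation-like functions do the row-level data work, while a job-like function sequences them and handles the control flow:

```python
# Hypothetical analogy for Kettle's two script types:
# "transformations" do the data work; a "job" controls the workflow.

def extract():
    # Transformation step: produce raw rows (hard-coded stand-in data).
    return [" alice ", "BOB", None]

def clean(rows):
    # Transformation step: basic transformation (trim, lowercase, drop nulls).
    return [r.strip().lower() for r in rows if r is not None]

def load(rows):
    # Transformation step: "load" into an in-memory target list.
    target = []
    target.extend(rows)
    return target

def job():
    # Job: controls the overall workflow, chaining the transformations in order.
    rows = extract()
    rows = clean(rows)
    return load(rows)

print(job())  # -> ['alice', 'bob']
```

In Kettle itself the same split appears in the GUI: transformations are designed as networks of steps that rows flow through, while jobs wire transformations (and other tasks) together with success/failure hops.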
