Databases and MPP data warehouses (6): using the open source ETL tool Kettle

Kettle is an open source ETL tool written in pure Java. It runs on Windows, Linux, and Unix, and its data extraction is efficient and stable.

The name Kettle is meant literally: Matt, the project's lead developer, hopes to put all kinds of data into one kettle and then pour it out in a specified format. Kettle is an ETL tool set that lets you manage data from different databases through a graphical environment in which you describe what you want to do, not how to do it. Kettle has two kinds of script file, transformations and jobs. A transformation performs the basic transformation of the data, and a job controls the overall workflow.

Using Kettle

First unzip the downloaded package, then run spoon.bat to start the graphical interface, as shown in the figure.

Go to File -> New -> Transformation to create a new transformation.

Establishing a database connection works much as it does in other database management tools. Note: while connecting, Kettle may report that the driver for a given database cannot be found. This means the corresponding JDBC driver is missing. Download the driver, put it into Kettle's lib folder, and restart Spoon so that it is picked up.
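
Before opening Spoon it can help to confirm that the driver jar really is in the lib folder. The following is a minimal sketch of such a check. The function name `find_driver_jars`, the install path, and the name fragment in the usage note are illustrative assumptions, not part of Kettle.

```python
from pathlib import Path


def find_driver_jars(kettle_home, name_fragment):
    """Return the jar files in <kettle_home>/lib whose name contains name_fragment.

    kettle_home is the folder the Kettle package was unzipped into
    (a hypothetical path supplied by the caller).
    """
    lib_dir = Path(kettle_home) / "lib"
    if not lib_dir.is_dir():
        # No lib folder at all: nothing can be found.
        return []
    fragment = name_fragment.lower()
    # Case-insensitive match on the file name, jar files only.
    return sorted(p for p in lib_dir.glob("*.jar") if fragment in p.name.lower())
```

For example, `find_driver_jars("/opt/data-integration", "mysql")` returns an empty list when no MySQL driver jar has been copied in yet, which is the situation that produces the missing-driver exception described above.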

Simple data table insert/update

(1) Create a table input: select "Core Objects" in the left panel, choose "Input -> Table input", and drag it onto the right panel, as shown in the figure.

Double-click the Table input step you just dragged in to edit it. Select the database connection and write the SQL statement. At this point you can click Preview to check that the connection works.

(2) Write to a table through a table output: select "Core Objects" in the left panel and choose "Output -> Table output", as shown in the figure.

Edit the table output. First, connect the table input to the table output: select the Table input step, hold down the Shift key, and drag to the Table output step. Then double-click the Table output step, edit it, and click Run to check for errors. The transformation must be saved before it can run, and it can be saved to any location.
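
Conceptually, the two connected steps read rows with a SQL query and insert them into a target table. The sketch below shows that idea in plain Python with the standard sqlite3 module. It is only an analogy for what the transformation does, not how Kettle implements it, and the function, table, and column names are made up for illustration.

```python
import sqlite3


def copy_rows(source, target, select_sql, target_table, columns):
    """Run select_sql on source and insert every resulting row into target_table.

    source and target are open sqlite3 connections. target_table and columns
    are trusted identifiers supplied by the caller (hypothetical names).
    """
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = (
        f"INSERT INTO {target_table} ({', '.join(columns)}) "
        f"VALUES ({placeholders})"
    )
    # The role played by the Table input step: read rows with a query.
    rows = source.execute(select_sql).fetchall()
    # The role played by the Table output step: write the rows.
    # "with target" commits on success and rolls back on error.
    with target:
        target.executemany(insert_sql, rows)
    return len(rows)
```

The hop drawn between the two steps corresponds to passing `rows` from the query to the insert. The column order of the SELECT must match the order of `columns`.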

Use a job to control execution of the transformation above

Jobs can run a transformation on a schedule or at regular intervals. Create a new job, then drag a Start entry and a Transformation entry in from the left panel.

Open the Transformation entry to set the transformation that should be executed, for example the one we built above, XXX.ktr.
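
Saved transformations and jobs can also be started without the graphical interface, using the Pan (transformations) and Kitchen (jobs) launchers that ship with Kettle. This is useful when an external scheduler triggers the run. The sketch below only builds the command line. The helper name `kettle_command` and the paths in the usage note are assumptions, and it assumes the Linux launchers pan.sh and kitchen.sh with their `-file` and `-level` options.

```python
def kettle_command(kettle_home, script_file, log_level="Basic"):
    """Build the command line that runs a .ktr or .kjb file.

    kettle_home is the unzipped Kettle folder (hypothetical path).
    Transformations (.ktr) are run by Pan, jobs (.kjb) by Kitchen.
    """
    lowered = script_file.lower()
    if lowered.endswith(".ktr"):
        launcher = "pan.sh"
    elif lowered.endswith(".kjb"):
        launcher = "kitchen.sh"
    else:
        raise ValueError(f"not a Kettle transformation or job: {script_file}")
    return [
        f"{kettle_home}/{launcher}",
        f"-file={script_file}",
        f"-level={log_level}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. For example, `kettle_command("/opt/data-integration", "/etl/XXX.ktr")` selects pan.sh because the file is a transformation.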

Excel input -> table output

Select the Excel file to process, set the spreadsheet type to Excel 2007, select the name of the corresponding sheet, then select the header fields and the matching fields in the target output table.
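
The field selection amounts to a mapping from sheet header names to target table columns. A minimal sketch of that mapping follows. It assumes the header row and data rows have already been read from the sheet, and the function name and field names are invented for illustration.

```python
def map_rows(header, rows, field_map):
    """Rename sheet columns to target-table fields, dropping unmapped columns.

    header    : list of column names from the sheet's header row
    rows      : list of data rows, each a list aligned with header
    field_map : {sheet header name: target table field} (hypothetical names)
    """
    # header.index raises ValueError if a mapped header is missing from the sheet.
    indexes = [(header.index(src), dst) for src, dst in field_map.items()]
    return [{dst: row[i] for i, dst in indexes} for row in rows]
```

With header `["Name", "Age", "Comment"]` and the mapping `{"Name": "user_name", "Age": "user_age"}`, each output row keeps only `user_name` and `user_age`, and the Comment column is not written to the target table.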

Send email

1: Email authorization. Log in to the mailbox and enable POP3/SMTP/IMAP under Settings. The sending address can be a QQ, 126, 163, or similar mailbox.

2: Set the authorization password. The authorization password cannot be the same as the login password, and setting it requires SMS verification.

3: Server configuration. For the SMTP server, fill in smtp.qq.com, smtp.126.com, or the equivalent for your provider.
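
The same three settings (sender mailbox, authorization password, SMTP server) can be checked outside Kettle before configuring the mail step. Below is a minimal sketch using Python's standard smtplib. The function names are illustrative, and port 465 with SSL is an assumption that should be checked against the provider's documentation.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Create a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, smtp_host, auth_code, port=465):
    """Send msg through smtp_host, logging in as the sender.

    auth_code is the authorization password from step 2, not the
    mailbox login password. Port 465 (SMTP over SSL) is an assumption.
    """
    with smtplib.SMTP_SSL(smtp_host, port) as server:
        server.login(msg["From"], auth_code)
        server.send_message(msg)
```

If `send_message(msg, "smtp.qq.com", auth_code)` succeeds here, the same server name, account, and authorization password should work in the Kettle mail configuration.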

Choosing a repository

The Kettle repository stores transformations: those a user creates through the graphical interface can be saved in it. A repository lets multiple users share transformations. Within the repository, transformations are grouped and managed in folders, and users can choose the folder names.

There are two kinds of repository:

1. Kettle database repository: a repository stored in any of the common databases. Users access it with a user name and password. The default accounts are admin/admin and guest/guest.

2. Kettle file repository: a repository stored in a folder on the server's hard disk. This kind requires no login and can be used directly.

Origin blog.csdn.net/yezonggang/article/details/109470183