Using the Kettle ETL tool and integrating it with Java for data cleansing

This post describes how to use Kettle and how to integrate it with Java. For download and installation instructions, please search Baidu yourself!

Kettle has two kinds of scripts: transformations and jobs. A job can contain transformations. The example below uses a transformation.

1. Create a transformation.

2. The steps used most often are Table input and Table output (data is extracted from one database and inserted into another).

Click Core Objects -> Input -> Table input

This step queries data from a database.

Click "Table input" -> New to create a database connection first, so that Kettle knows where to query the data.

Kettle supports many kinds of database connections. MySQL is what I use at work, so that is what is shown here; for the others, search Baidu.

 

Because the databases involved may use different encodings, the encoding needs to be set explicitly. When creating the database connection, click the Advanced tab and enter the statement shown in the figure (set names utf8;)

 

 Click the Options tab and add the parameter shown in the figure: characterEncoding: utf8
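
For MySQL, an entry on the Options tab ends up as a parameter on the JDBC connection URL. As a rough sketch of what that looks like from the Java side (the class name, host, port and database name below are made up for illustration and are not from the original post):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a MySQL JDBC URL with connection options,
// similar in spirit to what the Options tab configures.
final class MysqlJdbcUrl {

    static String build(String host, int port, String database, Map<String, String> options) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(host).append(':').append(port).append('/').append(database);
        if (!options.isEmpty()) {
            // Options are appended as ?key=value&key=value
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((key, value) -> query.add(key + "=" + value));
            url.append(query.toString());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("characterEncoding", "utf8"); // same setting as on the Options tab
        // Host, port and database name are placeholders.
        System.out.println(build("localhost", 3306, "source_db", options));
    }
}
```

The same URL could be passed to `DriverManager.getConnection` when checking the encoding outside Kettle.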

 

 

 Once the connection is created, click Save. Next, write the SQL. Kettle supports custom parameters in SQL, written in the form ${name}. Note that the "Replace variables in script?" checkbox below the SQL must be ticked, otherwise the parameters are not substituted.
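
To make the ${} substitution concrete, here is a small stand-alone sketch of the idea in Java. It is not Kettle's own code; the class name, the table and the parameter name START_DATE are invented for the example, and leaving unknown names untouched is an assumption of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: replaces ${NAME} placeholders in a SQL string.
final class SqlVariableSketch {

    // Matches ${NAME} and captures NAME.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String sql, Map<String, String> values) {
        Matcher matcher = VARIABLE.matcher(sql);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Unknown names are kept as written (assumption of this sketch).
            String replacement = (value != null) ? value : matcher.group(0);
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM orders WHERE created_at >= '${START_DATE}'";
        System.out.println(substitute(sql, Map.of("START_DATE", "2019-01-01")));
    }
}
```

Because the value is pasted into the SQL text as-is, parameter values should come from a trusted source.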

 

 Click OK when you are done.

 

Next, create a step to receive the data in the same way: click Table output and create a database connection, following the same steps as above.

 

 Click Enter field mapping, and the corresponding field mappings can be generated automatically.

3. Once both steps are created, hold Shift and drag with the left mouse button to connect the two steps.

 

4. Run:

  If the query uses custom parameters, their values must be assigned at startup:
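
Outside the graphical tool, the same parameter assignment can be done when launching a saved transformation with Kettle's Pan command-line runner, which accepts `-file=` and `-param:NAME=value`. The sketch below only assembles such a command in Java; the script path, the .ktr path and the parameter name are placeholders, and it assumes the parameter has been declared as a named parameter in the transformation settings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: builds a Pan command line that assigns parameters at startup.
final class PanCommandSketch {

    static List<String> build(String panScript, String ktrFile, Map<String, String> params) {
        List<String> command = new ArrayList<>();
        command.add(panScript);
        command.add("-file=" + ktrFile);
        // One -param:NAME=value argument per parameter.
        params.forEach((name, value) -> command.add("-param:" + name + "=" + value));
        return command;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("START_DATE", "2019-01-01"); // hypothetical parameter
        // Both paths are placeholders for a local installation.
        List<String> command = build("/opt/kettle/pan.sh", "/data/etl/copy_orders.ktr", params);
        System.out.println(String.join(" ", command));
        // To actually launch it: new ProcessBuilder(command).inheritIO().start();
    }
}
```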

  

 

5. Also note: if no repository has been created, a transformation or job is saved as a script file on the local disk. If a repository has been created, the script is saved into the database, not as a file!

6. Create a repository

  

 

First, create the repository. If one already exists, select it and enter the username and password (both default to admin).

When creating a repository, click the + sign. Of the repository types shown in the figure, we choose the second one, which saves the generated data to a database.

We need to specify which database connection the repository uses. It is created the same way as the database connection for the step above. Once it is created, select that connection and give the repository a name and description of your choice (try not to reuse existing ones, because these two values are needed later when integrating with Java).

Once the repository is created, enter the username and password to log in to it.

 


Origin www.cnblogs.com/guanjunhui/p/10860404.html