Kettle in detail: download, installation, data migration, and scheduled tasks

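This post is a bit long because it covers all five topics in one article. The most important parts are data migration and scheduled tasks.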

Table of contents
1. Brief introduction
2. Download
3. Installation
4. Data migration: migrating a whole single table, migrating a single table whose fields differ from the target, and simple batch migration
5. Scheduled tasks

1. Brief introduction

Kettle is an ETL (Extract-Transform-Load) tool: it extracts data, transforms it, and loads it into a target.
Kettle has two kinds of script files, transformations and jobs. A transformation does the actual data conversion; a job is the abstraction and control of the whole workflow built from multiple transformations.
Data extraction is efficient and stable.

2. Download (the package is fairly large, so be patient)

Chinese website
download address

3. Installation

1. Unzip the downloaded archive and open the extracted folder. I put mine on the D drive; choose whatever location suits you.

2. Configure Kettle's environment variables

1) Open System Properties (or search for "environment variables" directly) and click Environment Variables.

2) Under System variables, click "New". In the "Edit System Variable" dialog that pops up, enter the variable name and variable value (use your own path; mine was only an example), then click OK.

3) Select Path, click "New", add an entry that references the variable you just created, and click OK. The configuration is now complete.

3. Go into the "data-integration" folder, find "Spoon.bat", and double-click it to run (Kettle needs no separate installation). Startup can be slow, so be patient.

4. Kettle has now started.

4. Data migration (single table and batch data migration)

Now that Kettle is running, how do we use it? The following shows how to import table data from a source database into a target database, either one table at a time or in batches.

There are three ways to transfer table data from one database to another, described below.

1. Overall data migration

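Note: this part demonstrates moving a single table from database A in its entirety into another database B, where B does not yet have the table. A later part demonstrates moving table data from database A into a table that already exists in target database B, where some of the fields do not necessarily correspond.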

1. Create a new transformation from the File menu, or by double-clicking Transformation.

2. Double-click DB connection underneath it. In the database connection dialog that pops up, select your database type, fill in the connection details, test whether the connection works, and click OK.
(My connection is to postgres.)

Note: for one table, create one connection; for two, repeat the same steps. Keep the connection names distinct so the later steps are easy to follow.

Attached is a mysql connection diagram

3. Switch to the core objects tab (labelled Design in the English interface) and search for table input and table output. Double-click a step to place it in the workspace; to remove a step, select it and press Delete, or right-click it and delete it. Steps added by double-clicking are connected by an arrow automatically. If no arrow appears, hold Shift and drag with the left mouse button from one step to the other, or use the connect icon that appears on the step. Either way, a connection is established between them.

4. Once the steps are connected, set up the export. Double-click table input, select the connection to the database you are exporting from, generate the SQL query statement and its fields, and preview the table data. Then click OK.

5. Double-click table output, select the connection to the target database, select the schema, and either pick an existing target table or type the name of the table you want to create. Click SQL to generate the statement you need, then click Execute; the table is created automatically once the statement runs (skip this if you selected an existing target table). Click OK.

6. Click the small triangle (Run), then click Start.
7. Save the transformation file. The transformation has now run successfully.

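Note: the source table has now been imported into the target table. If an error is reported and the migration fails, check whether you clicked Execute on the SQL statement and whether you saved the transformation file. The saved file has the .ktr extension.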
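
To make the table input and table output pairing concrete, here is a minimal sketch of the same idea in Python. It is not what Kettle runs internally: it assumes two in-memory SQLite databases instead of postgres or MySQL, and `copy_table` and the `users` table are hypothetical names made up for the illustration.

```python
import sqlite3


def copy_table(source, target, table):
    """Copy every row of `table` from the source connection to the target."""
    # "Table input": read the table definition and all rows from the source.
    create_sql = source.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()[0]
    cursor = source.execute(f'SELECT * FROM "{table}"')
    rows = cursor.fetchall()

    # "Table output": create the missing table first (the role of the SQL
    # button in step 5), then insert the rows.
    target.execute(create_sql)
    if rows:
        placeholders = ", ".join("?" for _ in cursor.description)
        target.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', rows
        )
    target.commit()
    return len(rows)


# Hypothetical source data for the demonstration.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

target = sqlite3.connect(":memory:")
copied = copy_table(source, target, "users")
```

If the `CREATE TABLE` statement is skipped, the insert fails because the target table does not exist, which is the same failure the note above warns about.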

2. Migrate data into an existing table

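This part demonstrates moving table data from database A into a table that already exists in target database B, where some of the fields do not necessarily correspond.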

1. The steps used in this part are table input, value mapping, field selection, add constants, Java code, and table output. You can search for each one and either drag it onto the canvas or double-click it.

2. Start from the first step. Click table input and select the fourth icon (the one with an arrow), or hold Shift and press the left mouse button. A gray arrow now follows your mouse. Move the mouse over the second step, value mapping, and the arrow turns blue. Click value mapping and a blue arrow appears between table input and value mapping. Sometimes you are asked to choose the main output of the step. Connect all the steps in series this way.

3. The chain of steps above represents the complete data flow for one table. Next, click the main object tree (the View tab in the English interface) in the left column, right-click DB connection, select New, fill in your database connection details, and test. Once the test succeeds, click OK to create the connection.

4. The first step to configure is table input, which, as the name suggests, is the source of the data. Double-click table input, select the database connection, write the SQL that retrieves the data you want, and click Preview to inspect the result. Check the data format at this point: some fields, such as disabled, are stored in the database as 0 or 1 but show up as Y or N in the data Kettle reads.

5. If the preview shows data, the SQL statement is correct. Next comes value mapping (Value mapper in the English interface), which replaces certain values with others, for example mapping Y and N to 1 and 0. Double-click value mapping, select the field name to use, fill in each source value and its target value, and click OK.
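
The effect of value mapping on the rows can be sketched in a few lines of Python. This is only an illustration of the idea, not Kettle's implementation; `map_values` and the sample rows are hypothetical.

```python
def map_values(rows, field, mapping):
    """Return new rows with row[field] replaced according to `mapping`."""
    mapped = []
    for row in rows:
        new_row = dict(row)
        # In this sketch, values with no entry in the mapping pass through
        # unchanged.
        new_row[field] = mapping.get(row[field], row[field])
        mapped.append(new_row)
    return mapped


# Rows as they might look after table input: disabled arrives as Y or N.
rows = [{"id": 1, "disabled": "Y"}, {"id": 2, "disabled": "N"}]
mapped = map_values(rows, "disabled", {"Y": 1, "N": 0})
```

After the call, `disabled` holds 1 for the first row and 0 for the second, which is the form the target table expects.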

6. Next comes field selection (Select values in the English interface). The field names in the two tables almost always differ during a migration, so how do you match them one-to-one? Field selection lets you rename the fields to what the target expects. Double-click field selection and switch to the third tab, Metadata. Click the button on the right that gets the fields to change, and Kettle lists the fields returned by your SQL one by one. In the rename column, fill in the corresponding field names of your current (target) table, and in the type column fill in the corresponding type. Note that Integer corresponds to Long and Number corresponds to Double. The type can be left blank, but that is not recommended. When you are done, click OK.
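
As a rough picture of what the Metadata tab does to each row, the sketch below renames fields and casts their values. It is an assumption-laden illustration: `select_fields` and the field names are hypothetical, and Python's `int` and `float` stand in for Kettle's Integer (Long) and Number (Double).

```python
def select_fields(row, renames, types):
    """Rename source fields to target names and cast each value.

    renames maps source field name -> target field name.
    types maps target field name -> a callable used to cast the value.
    """
    out = {}
    for old_name, new_name in renames.items():
        value = row[old_name]
        caster = types.get(new_name)
        # Leave the value untouched when no type is given or it is missing.
        if caster is not None and value is not None:
            value = caster(value)
        out[new_name] = value
    return out


source_row = {"user_id": "7", "score": "1.5"}
renamed = select_fields(
    source_row,
    renames={"user_id": "id", "score": "rating"},
    types={"id": int, "rating": float},
)
```

Leaving a field out of `types` corresponds to leaving the type column blank: the value is passed on as it came from the source.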

7. At this point you may find that your current table has many fields that do not exist in the original table, yet they are required and cannot be empty. Use the add constants step for these: fill in the name, type, length, and value (the default value) for each field, and click OK.
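
In row terms, add constants simply attaches fixed values to every row. A minimal sketch, with hypothetical field names and a hypothetical helper `add_constants`:

```python
def add_constants(row, constants):
    """Return a copy of `row` with the constant fields added."""
    out = dict(row)
    for name, value in constants.items():
        # A choice made for this sketch: fields already present in the row
        # keep their value.
        out.setdefault(name, value)
    return out


# The target table requires status and version, which the source lacks.
with_defaults = add_constants({"id": 1}, {"status": "active", "version": 0})
```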

8. After the steps above, the basic fields line up. Some fields, however, need extra logic before their values can be stored in the current table. For example, if part of a project has already been migrated and the current table already contains data, the IDs may collide. One fix is to change each migrated ID to a negative value and change every associated ID to a negative value as well. This avoids the duplicate IDs that would otherwise block the migration, and the relationships between rows are not lost.
· This calls for the Java code step. One catch: the Java code step that ships with Kettle does not import jar packages automatically, so it is best to write the code in IDEA first and then copy it over, together with the imports it needs.
· Any jar you import must exist in the lib folder under the Kettle folder. While writing code, double-click getValue under the input fields in the left column to read a value, and double-click setValue under the output fields in the left column to assign one. You can click Test class at the lower right to test the Java code. The Java code step can therefore handle this kind of logic.
If you do not need it, delete this step.
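
The negative-ID trick is easy to get subtly wrong, so here is the logic on its own. The step itself takes Java; this sketch is in Python only to show the rule, and `negate_ids`, `id`, and `parent_id` are hypothetical names.

```python
def negate_ids(row, id_fields):
    """Return a copy of `row` with the listed ID fields made negative."""
    out = dict(row)
    for field in id_fields:
        value = out.get(field)
        if value is not None:
            # -abs() keeps the result negative even if the step is applied
            # twice to the same row.
            out[field] = -abs(value)
    return out


migrated = negate_ids({"id": 5, "parent_id": 3, "name": "x"}, ["id", "parent_id"])
root = negate_ids({"id": 9, "parent_id": None, "name": "y"}, ["id", "parent_id"])
```

Because the ID and every field that refers to it are negated together, a child row still points at its parent after the migration.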

9. Double-click table output, select the connection to the target database, select the schema, and pick the existing target table (or type the name of a table you want to create). If the table does not exist yet, click SQL to generate the statement you need and click Execute so that the table is created automatically; for an existing target table this is not needed. Click OK.

10. Click Run and save the transformation file.

3. Import directly, one table at a time or in batches

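Note: with postgres, only data in the source database's public schema can be migrated this way. Tables in other schemas produce a "not found" error.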

1. Click Tools - Wizards - Copy Single Table Wizard or Copy Multiple Table Wizard (multiple tables are used here), select the source database and the target database, and click Finish.

2. Find the tables you want in the source database and click the middle button to move them across. The tables you selected are displayed on the right. Click Next.

3. Fill in the file name, select the folder to save it in, and click Finish.

4. Click Execute to carry out the migration.

5. Scheduled tasks

1. Click File - New - Job.

2. Search for the Start and Transformation entries and drag them onto the canvas, or double-click them. Connect the two by holding Shift and dragging with the left mouse button (entries added by double-clicking are connected directly).

3. Double-click Start. In the dialog that pops up, select the schedule type and fill in the time interval you need; the job then updates the database data at that interval. Click OK (if you cannot see the OK button, enlarge the window). If you select "Repeat", each run adds its data on top of what is already in the table.
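
Why repeated runs pile data up can be seen in a small simulation. This is a sketch under assumptions, not Kettle's scheduler: `run_repeatedly` is a hypothetical helper, the target table is a plain list, and the sleep function is replaced so the example finishes immediately.

```python
import time


def run_repeatedly(task, interval_seconds, runs, sleep=time.sleep):
    """Call task() `runs` times, pausing interval_seconds between calls."""
    for i in range(runs):
        task()
        if i < runs - 1:
            sleep(interval_seconds)


source_rows = [(1, "ann"), (2, "bob")]
target_table = []


def append_all():
    # Mirrors a plain insert: the same rows are added again on every run.
    target_table.extend(source_rows)


# Three runs at a nominal 60-second interval; sleeping is skipped here.
run_repeatedly(append_all, interval_seconds=60, runs=3, sleep=lambda s: None)
```

After three runs the simulated table holds six rows, two copies too many. If that is not what you want, the target has to be cleared or deduplicated between runs.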

4. Double-click Transformation and fill in the transformation file name: click Browse and select the .ktr file to use for the scheduled task (if you cannot browse, paste the file path in directly). Click OK. The .ktr file here is the one you saved after the data migration above.

5. Click Execute and save the file when prompted. Once it runs successfully, the scheduled task is set up, and the table data in the database is refreshed at the time interval you chose.
(This is a scheduled task for just one table!)
*Note: if an error is reported, check that the .ktr file you selected is the right one and that the transformation file has been saved.*

6. Once the scheduled task is running, do not close Kettle. If you close it, the scheduled task stops executing. Remember this!

Reference: Data migration

I hope this helps.

`~Thank you for visiting~`