Kettle tutorial: a Spoon data transformation example

Kettle: four core components

Chef, Kitchen, Spoon, and Pan (the names all follow a kitchen theme):

  • Chef—job design tool (GUI).

  • Kitchen—job executor (command line).

  • Spoon—transformation design tool (GUI).

  • Pan—transformation executor (command line).

The difference between a Job and a Transformation: a Transformation focuses on the ETL of data, while a Job has a wider scope; it can orchestrate Transformations, Mail, SQL, Shell, and FTP steps, or even another Job.

Spoon introduction

Spoon is the graphical user interface (GUI) tool of Pentaho Data Integration (PDI) for designing, developing, and managing ETL (extract, transform, load) processes. It is part of the Pentaho suite and is designed to help users easily create and manage data integration jobs.

Here are some key features and functions of Spoon:

  1. Graphical interface: Spoon provides an intuitive graphical user interface that lets users design ETL processes by dragging, dropping, and connecting components. This visualization simplifies job design and makes data flows easier to understand and manage.

  2. Component library: Spoon has a rich built-in component library, including input components (such as database query and file reading), transformation components (such as data cleaning, conversion, and filtering), and output components (such as database writing and file writing). These components can be flexibly combined to build complex data integration processes.

  3. Data transformation and processing: Spoon provides a variety of transformation and processing functions, such as field mapping, data filtering, row merging, sorting, and aggregation. These features let users perform the operations an ETL process needs on the data.

  4. Scheduling and monitoring: Spoon allows users to set job scheduling plans so that jobs are automatically executed at specific points in time. In addition, it provides monitoring and logging capabilities to track job execution status and output results.

  5. Plug-in extension: Spoon supports plug-in extensions, so users can add custom components or functions as needed. This allows Spoon to be integrated with other systems or tools for specific data integration needs.

All in all, Spoon is a powerful and easy-to-use ETL tool that provides a graphical interface and a rich component library, enabling users to easily design, develop, and manage complex data integration jobs. Whether you are a novice or an experienced developer, Spoon enables efficient data integration and transformation.

Spoon transformation example

1. Open the spoon.bat script in the Kettle folder (spoon.sh on Linux). After the tool starts, choose File -> New -> Transformation from the menu; this creates a transformation, named "Transformation 1" by default. You can rename it via File -> Save (saving the file under a new name), or right-click the transformation in the Main Object Tree panel and click Settings to change its name. As shown in the picture:
[Figure: renaming the transformation in the settings dialog]
After changing the name, click OK.

2. Create a "Table input" step and a "Table output" step, select both of them at the same time, right-click and choose "New Node Connection" from the shortcut menu, then select the start step and the target step and click OK. The result is shown below:
[Figure: Table input and Table output steps connected by a hop]

3. Use Kettle to connect to the database

First, prepare the data by creating two tables in the database: personal_a and personal_b, as shown in the figure below:
[Figure: the personal_a and personal_b tables in the database]
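For reference, the two tables can be created with SQL along these lines. The column names follow the fields used later in this example; the exact types are assumptions:

```sql
-- Assumed schema for the source table; surname and name will
-- later be concatenated into a username by the JavaScript step.
CREATE TABLE personal_a (
    id      INT PRIMARY KEY,
    surname VARCHAR(50),
    name    VARCHAR(50)
);

-- Assumed schema for the target table; the Insert/Update step
-- writes the concatenated username here, keyed on id.
CREATE TABLE personal_b (
    id       INT PRIMARY KEY,
    username VARCHAR(100)
);
```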
Then use Kettle to create a transformation: add a "Table input" step, a "JavaScript code" step, and an "Insert/Update" step, and connect them with hops, as shown in the figure below:
[Figure: Table input, JavaScript code, and Insert/Update steps connected by hops]
Double-click the "Table input" step to configure it. Click the "New" button to configure the database connection; when that is complete, click the "Confirm" button to finish setting up the MySQL connection. Then click the "Get SQL query statement" button to open the "Database Browser" window, expand the field_stitching database, and select the table personal_a under the "Tables" node. Finally, click the "OK" button; when the "Question?" dialog pops up, click "Yes". The finished "Table input" configuration is shown in the figure below:
[Figure: the configured Table input step]
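Answering "Yes" tells Spoon to include the field names, so the generated query in the SQL editor looks roughly like this (column names assumed from the schema sketched above):

```sql
-- Query generated for the Table input step.
SELECT
  id,
  surname,
  name
FROM personal_a;
```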
Next, configure the JavaScript step: click the "Get Variable" button and add a new field named username in the "Rename" column of the fields grid, as shown in the figure below:
[Figure: the username field added in the JavaScript step's field grid]
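The tutorial does not show the script inside the JavaScript step, but its effect is to concatenate surname and name into the new username field. Expressed in SQL, the step computes something equivalent to:

```sql
-- SQL sketch of what the JavaScript step computes:
-- username = surname concatenated with name.
SELECT id, CONCAT(surname, name) AS username
FROM personal_a;
```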
Then configure the Insert/Update step: double-click the "Insert/Update" step to open it, click the "New" button to configure the database connection, and click the "Confirm" button when the configuration is complete, as shown in the figure below:
[Figure: the Insert/Update step's database connection dialog]
For the database name, use the database you created yourself, along with the root superuser and its password.

Click the Browse button, select the data table personal_b, and click the "OK" button to complete the selection of the target table.
[Figure: selecting personal_b as the target table]
Click the "Get Field" button to specify the keywords needed to query data. Here, the id field in the data table personal_b and the id field in the input stream are selected.
[Figure: the id key fields configured in the Insert/Update step]
Click the "Edit Mapping" button, and the "Mapping Matching" window will pop up, select the fields in the "Source Field" option box and the fields in the "Target Field" option box, and then click the Add button to map one pair to the other in turn. The field is added to the "Mapping" option box. If the field in the "Source Field" option box is the same as the field in the "Target Field" option box, you can click the "Guess" button to let kettle automatically implement the mapping. Finally click OK to complete the control configuration.

Then click the Run button at the top of the transformation workspace to run the transformation field_stitching, which concatenates the surname and name fields of the table personal_a and inserts the resulting data into the table personal_b, as shown in the following figure:

[Figure: running the field_stitching transformation]
Check whether the 7 rows of data have been successfully inserted into the table personal_b, as shown in the following figure:

[Figure: the contents of personal_b after the run]
As the table shows, data has been inserted into personal_b, and the username field contains the result of concatenating the surname and name fields from personal_a. This confirms that the surname and name fields of personal_a were successfully concatenated and the resulting data inserted into personal_b.
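You can reproduce this check with direct queries against the target table (assuming the schema sketched earlier):

```sql
SELECT COUNT(*) FROM personal_b;      -- expect 7 rows
SELECT id, username FROM personal_b;  -- inspect the concatenated names
```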


Links: Getting Started with Kettle Data Conversion Example

Origin blog.csdn.net/a772304419/article/details/132646742