Kettle Development - Day 42 - Remote Job Execution

Table of contents

Foreword

1. Remote execution
   1. Let's look at the definition first
   2. Preconditions
      2.1 Smooth network
      2.2 Consistent DB connections
2. Practical case - Windows
   1. Initial configuration - remote end
      1.1 Start the Carte service
      1.2 Start the Carte service from cmd
   2. Initialization - local end
   3. Practical application
      3.1 Error cases
      3.2 Correct case
3. Summary

Foreword:

I first started looking into remote job execution so that I could work on server-side jobs from my own computer at home. After repeated attempts, things did not go quite in the direction I had expected, but the remote execution function did get finished in the end. So I am writing down the method for executing tasks remotely, and correcting some errors found in online tutorials, in the hope that it helps others learn and improve together. This is the final execution result.

 1. Remote execution

        1. Let’s look at the definition first

"Kettle remote execution" usually refers to connecting from one computer to another over the network and executing Kettle ETL (Extract, Transform, Load) jobs or transformations on the remote machine. In Kettle this is done through Carte, a lightweight web server that ships with the tool and accepts and runs transformations and jobs over HTTP; it does not rely on general-purpose remote-access tools such as SSH, Telnet, or VNC. This approach lets users run and monitor Kettle jobs or transformations on remote machines without physically operating them.

        2. Preconditions

        2.1 Smooth network

From the definition above, we know that smooth network communication between the local machine and the remote server/computer is required.

2.2 Consistent DB connections

Literally, this means the corresponding database can be accessed from both machines, the corresponding JDBC connection information exists on both, and the database connection names match. In other words, the same DB connection exists at both ends: the same connection name pointing at the same IP, port, and database/resource name.
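As a quick sanity check of the "accessible" part, it helps to confirm from both machines that the database host answers on its port. A minimal sketch, assuming a hypothetical database host 10.100.21.50 listening on 3306:

:: Run on BOTH machines. Host and port are made-up example values.
:: The telnet client may first need to be enabled under
:: Control Panel -> Programs -> Turn Windows features on or off.
telnet 10.100.21.50 3306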

PS: Tutorials on the Internet generally tell you to simply replace the repositories.xml file on the remote end. If the DB connections at the two ends are inconsistent, this can break the DB connection on the remote end and cause the remote jobs to fail. For the repositories.xml replacement itself, refer to the following operation.
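For reference, a minimal sketch of that replacement. It assumes a default setup where repositories.xml lives in the .kettle folder of the user's home directory on both machines, and that the remote drive is reachable over a hypothetical administrative share; adjust both paths to your environment:

:: Back up the remote file, then overwrite it with the local copy.
:: %USERPROFILE%\.kettle is Kettle's default location for repositories.xml;
:: \\remote-host\c$\Users\etl\... is a made-up example path.
copy "\\remote-host\c$\Users\etl\.kettle\repositories.xml" "\\remote-host\c$\Users\etl\.kettle\repositories.xml.bak"
copy "%USERPROFILE%\.kettle\repositories.xml" "\\remote-host\c$\Users\etl\.kettle\repositories.xml"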

2. Practical case - Windows

1. Initial configuration - remote end

1.1 Start the Carte service

Open cmd and change the working directory to the location of Carte.bat inside the remote Kettle installation. In this example, the remote Carte.bat sits under D:\kettle\pdi-ce-5.4.0.1-130\data-integration.

1.2 Start the Carte service from cmd

Starting the Carte service takes roughly the following three steps:

:: Press Win+R, type cmd, and hit Enter to open a command prompt
:: Switch to drive D:
D:
:: Change to the directory containing Carte.bat
cd D:\kettle\pdi-ce-5.4.0.1-130\data-integration

:: Start the Carte service: carte <ip> <port>
carte 10.100.21.34 8080

When the console prints a line saying that Carte has created a listener on the given address and port, the Carte service has started successfully.
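To double-check on the remote machine itself, Carte ships with a small status page. A quick way to confirm the service is up (8080 is the example port from the command above; cluster/cluster are Carte's default credentials):

:: Open the Carte status page in the default browser and log in with the
:: default account cluster / cluster. If the page renders, Carte is up.
start http://localhost:8080/kettle/status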

2. Initialization - local end

Beyond the prerequisites above (an open network between the two ends and identical DB connections), a slave server ("sub-server") entry pointing at the remote machine now needs to be set up in Spoon on the local end.

As shown in the figure above, the main object tree on the Kettle (Spoon) side has a slave server node. Right-click it to create a new slave server. The server name can be anything you like; then fill in the IP and port of the Carte service started above, plus the default user name and password, which are both cluster. This completes the slave server configuration.
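Before (or after) filling in the dialog, it can save debugging time to confirm from the local machine that the remote Carte is actually reachable. A small sketch using the example IP and port from earlier:

:: From the local developer machine: basic reachability first, then the
:: same /kettle/status page as before, this time over the network.
ping -n 2 10.100.21.34
start http://10.100.21.34:8080/kettle/status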

        3. Practical application

3.1 Error cases

Practical use is also very simple: we just configure the job as usual. The one thing to note is that any transformation or job it calls must exist on the remote end. The remote end may not have the job itself, but the transformations it calls, i.e. the .ktr files, must be saved at the corresponding locations on the remote machine.

As shown in the figure above, the remote end has the file test_zy.ktr in the corresponding folder. So when configuring the job, we can point its "Transformation" entry at that directory, as shown below.

PS: It is best to use an absolute path here. Since the job itself has no file on the remote end, a relative path leaves Carte nothing sensible to resolve against: by default it resolves relative paths from its own working directory, in this case the root of the drive Carte was started from, so it assumes the transformation to be executed remotely is located at D:\test_zy.ktr.

In other words, the following error appears. This is also the easiest problem to overlook, and it was a hard-learned lesson.

         So use absolute paths, absolute paths, absolute paths!!! Say important things three times!
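One way to rule this class of error out before launching: verify on the remote machine that the .ktr really exists at the absolute path the job references. A sketch with a hypothetical folder D:\test:

:: On the remote machine. D:\test\test_zy.ktr is a made-up example path;
:: use the exact absolute path configured in the job's transformation entry.
if exist "D:\test\test_zy.ktr" (echo found) else (echo MISSING - fix the job's path)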

3.2 Correct case

The above covered the configuration and common errors in practice; next, how to execute a job remotely correctly.

When executing the job, select "Remote Execution" and pick the slave server configured above as the remote host. Launching it opens the panel for the remotely executed job.

As shown in the figure below, the remote-execution panel shows the whole run of the remotely executed job.

As shown in the figure above, the panel displays both the execution steps and the execution log. The log content is similar to a local run's: the entire data-processing flow.

The same log for the whole run can also be seen in the console of the remote Carte service.

 3. Summary

In fact, Kettle's remote execution was originally designed with parallel development and clustering in mind. So when several people develop together, a shared repository is recommended: it makes the prerequisites above easy to satisfy, since the network must be reachable and the DB connections consistent anyway. And because the repository is shared, the corresponding transformations and jobs are also easy to synchronize across different servers and machines, so the latest job can be executed locally or remotely and the final result of the run is what we want.

Of course, if you just want to execute a few jobs remotely to handle a temporary need, the steps above are enough, and they will also help you better understand the principles behind Kettle clustering. So choose flexibly according to your actual needs~

Originally published at blog.csdn.net/qq_29061315/article/details/132490288