Downloading and installing Pentaho (Kettle), with a simple practical example

First, go to the official website. The page is entirely in English, so I used a translated version.

https://community.hitachivantara.com/s/article/data-integration-kettle

Click the download link framed in red in the screenshot to download the package.

After downloading, unzip the archive.

Kettle is open-source software written in pure Java. It runs on a local JDK 1.7 or later (check the requirements of your release; recent versions such as the 8.3 build used below need Java 8). After unzipping, it can be used directly, with no installation.

Second, add an environment variable named PENTAHO_JAVA_HOME whose value is the path of the local JDK.
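Before starting Spoon, it can help to confirm that the variable is really visible to new processes. The following is a minimal sketch, not part of Kettle: the helper name check_java_home is hypothetical, and it assumes that a valid JDK path contains bin/java (or bin\java.exe on Windows).

```python
import os
from pathlib import Path


def check_java_home(env, var_name="PENTAHO_JAVA_HOME"):
    """Return (ok, message) describing whether var_name points at a Java home."""
    value = env.get(var_name)
    if not value:
        return False, f"{var_name} is not set"
    home = Path(value)
    # Assumption: a JDK/JRE home contains bin/java (bin\java.exe on Windows).
    candidates = [home / "bin" / "java", home / "bin" / "java.exe"]
    if any(p.is_file() for p in candidates):
        return True, f"{var_name} -> {home}"
    return False, f"{var_name} is set to {home}, but no bin/java was found there"


if __name__ == "__main__":
    ok, message = check_java_home(os.environ)
    print(message)
```

Run it from a newly opened terminal, because environment variables changed in the system settings are only picked up by processes started afterwards.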

After configuring it, double-click Spoon.bat to start Spoon.

Startup takes a while, so wait patiently for the window to open.

3. Create a database connection

Click Transformation to switch to the main object tree, where the DB connection entry appears. Click DB connection.

Select MySQL as the connection type and enter the connection details.

Then click Test. The following error occurs.

This happens because Kettle has no MySQL driver package. Put the MySQL driver jar under pdi-ce-8.3.0.0-371\data-integration\lib and restart Spoon so that the jar is loaded. Choose the driver version that matches your MySQL server: if the driver is too old, the connection fails with the error Unknown system variable 'query_cache_size' and the database cannot be reached.

I downloaded the driver package mysql-connector-java-5.1.8.jar. With it in place, the connection test succeeds.
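Copying the jar can also be scripted. This is a small sketch under stated assumptions: install_driver is a hypothetical helper, and the only layout it relies on is the data-integration\lib folder named above.

```python
import shutil
from pathlib import Path


def install_driver(jar_path, pdi_home):
    """Copy a JDBC driver jar into <pdi_home>/data-integration/lib."""
    jar = Path(jar_path)
    lib_dir = Path(pdi_home) / "data-integration" / "lib"
    if not jar.is_file():
        raise FileNotFoundError(f"driver jar not found: {jar}")
    if not lib_dir.is_dir():
        raise NotADirectoryError(f"Kettle lib directory not found: {lib_dir}")
    target = lib_dir / jar.name
    # copy2 keeps the file's timestamps, which helps when comparing jar versions later.
    shutil.copy2(jar, target)
    return target
```

For example, install_driver("mysql-connector-java-5.1.8.jar", "pdi-ce-8.3.0.0-371") would copy the jar next to Kettle's other libraries, assuming both paths are relative to the current directory. Spoon still has to be restarted afterwards.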

Click OK to confirm.

4. Synchronizing data

Create a new transformation, then drag a Table input step from the Input category and a Table output step from the Output category onto the canvas.

In the Table input step, select an existing database connection or create a new one.

Then click the button that gets the SQL select statement.

Select the table you want to read from and click OK.

As soon as you click Yes, the following error is reported.

My guess was that the MySQL server version conflicts with the version of the connection driver (mysql-connector-java).

The current environment is as follows:
MySQL version, obtained by executing: select version();

mysql-connector-java version: 5.1.8

I tried several different versions of the connection driver.

In the end, version 5.1.47 solved the problem.

Explanation:
The old JDBC driver sends the statement SET OPTION SQL_SELECT_LIMIT=DEFAULT to the database, and MySQL 5.6 and later no longer support the SET OPTION syntax.

After the SQL statement is retrieved, the dialog looks as shown below.

Insert the fields from table A into table B.

The Table output step simply writes the incoming rows to another table.

Settings for table output:

Running result (user_copy table data): the data of table A is copied to table B.

If we run the transformation a second time, Kettle reports an error saying that the primary key already exists.

This means that Table output only inserts rows. If a row with the same primary key already exists in the target table, it is not updated and an error is reported.
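The behavior is the same as running plain INSERT statements twice. The sketch below reproduces it with Python's built-in sqlite3 module instead of MySQL, purely so that it runs without a server; the table_output function and the sample rows are illustrative and are not Kettle code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE user_copy (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def table_output(conn):
    """Insert every row of user into user_copy, with no existence check."""
    rows = conn.execute("SELECT id, name FROM user").fetchall()
    conn.executemany("INSERT INTO user_copy VALUES (?, ?)", rows)
    return len(rows)


first_run = table_output(conn)  # succeeds: user_copy was empty
try:
    table_output(conn)  # the same primary keys again
    second_run_error = None
except sqlite3.IntegrityError as exc:
    second_run_error = str(exc)  # the database rejects the duplicate key

print(first_run, second_run_error)
```

The second call raises an integrity error on the first duplicated id, which mirrors the primary key error Kettle shows on the second run.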

Now let's modify the Table output settings and specify the output fields explicitly:

Running result (user_copy table data):

https://blog.csdn.net/qqfo24/article/details/82190535

You can refer to the article linked above for how to insert or update data from one table into another table.

The operation steps are as follows:

Click the core objects and create a new transformation.

Then click the main object tree and set up the DB connection.

After that, click the core objects again, select Input, and add a Table input step.

Then add an Insert / Update step.

Now let's look at the data in the User table

Then take a look at the data in the test table

Then double-click the Insert / Update step to configure it.

This picture only explains the options; the picture below shows my own settings.

Click OK, then run this transformation.

Click Start and save the transformation.

After the run finishes, the execution results appear below, including logs, data previews, and so on. They show how many rows were read in total, how many were inserted, how many were updated, and so on.

This completes the simplest transformation: reading data from one table and inserting or updating it in another table.

Now let's look at the test table. The row with id 4 has been updated from order to method.
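Conceptually, the step looks up each incoming row by key in the target table, inserts it when the key is missing, and updates it when the compared field differs. The following sketch shows that logic with Python's built-in sqlite3 module rather than MySQL; insert_update, the column names, and the sample rows are assumptions for illustration, not Kettle's implementation.

```python
import sqlite3


def insert_update(conn, source, target, key="id", field="name"):
    """Copy rows from source to target: insert new keys, update changed values."""
    inserted = updated = 0
    rows = conn.execute(f"SELECT {key}, {field} FROM {source}").fetchall()
    for key_value, field_value in rows:
        # Look up the row in the target table by its key.
        found = conn.execute(
            f"SELECT {field} FROM {target} WHERE {key} = ?", (key_value,)
        ).fetchone()
        if found is None:
            conn.execute(
                f"INSERT INTO {target} ({key}, {field}) VALUES (?, ?)",
                (key_value, field_value),
            )
            inserted += 1
        elif found[0] != field_value:
            conn.execute(
                f"UPDATE {target} SET {field} = ? WHERE {key} = ?",
                (field_value, key_value),
            )
            updated += 1
    return inserted, updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)", [(2, "a"), (3, "b"), (4, "method")])
conn.execute("INSERT INTO test VALUES (4, 'order')")

first = insert_update(conn, "user", "test")   # two new rows, one changed row
second = insert_update(conn, "user", "test")  # nothing left to do
print(first, second)
```

Unlike the plain insert of Table output, running this a second time is harmless, because every key already exists with the same value.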

If you want to run this transformation periodically, you need to use a job.

Create a new job and click General.

Drag START, Transformation, and Success from the left onto the canvas on the right and connect them with lines.

Double-click START to configure the job's run interval. Here it is configured to run every hour.

Double-click the Transformation entry and select the transformation you created earlier.

Click Run to run the job, and click Stop to stop it. In the execution result below, you can see the running log.

I added a new row with an id of 1 to the user table.

Now run this job

An hour turned out to be too long to wait, so I set the interval to 3 minutes. Running result:

Now let's check whether the new row has arrived in the test table in the database.

The screenshot above shows that the scheduled job inserted the row successfully.

If you want the scheduled job to keep repeating, check the Repeat option in the START settings.

The job then keeps running without any further action. Click the Stop button when you no longer want it to run.
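The repeat-with-interval behavior can be pictured as a simple loop: run the task, wait for the interval, and repeat until someone stops it. This is only an illustrative sketch and not how Kettle schedules jobs internally; run_repeatedly and its parameters are hypothetical, and the sleep function is injectable so that the example finishes instantly.

```python
import time


def run_repeatedly(task, interval_seconds, should_stop, sleep=time.sleep):
    """Run task, wait interval_seconds, and repeat until should_stop() is true."""
    runs = 0
    while not should_stop():
        task()
        runs += 1
        if should_stop():
            break  # do not wait again after the final run
        sleep(interval_seconds)
    return runs


log = []
waits = []
runs = run_repeatedly(
    task=lambda: log.append("transformation run"),
    interval_seconds=180,                 # 3 minutes, as in the job above
    should_stop=lambda: len(log) >= 3,    # stands in for clicking Stop
    sleep=waits.append,                   # record the waits instead of sleeping
)
print(runs, waits)
```

With the real time.sleep left in place, the same loop would wait three minutes between runs.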

Summary

Insert / Update is used more often because it can also update existing data.

Table output easily inserts duplicate data, so use it with caution.

Scheduled jobs, once enabled, update the data automatically and reduce the cost of manual operation.

