Data Mining with Kettle

Part of the customer service department's CRM data needs to be synchronized to the BDP cloud. For server security, the database account and password must not be exposed to any third party, so an intermediate server with a staging database is deployed: the CRM data is first extracted into the staging database and then synchronized to the BDP cloud library.
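In outline, the data flows like this:

CRM database (source) -> staging database on the intermediate server -> BDP cloud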

ETL (short for Extract-Transform-Load) is the process of extracting, transforming, and loading data.

Kettle, an ETL tool written in Java, provides a graphical design interface, and the designed steps then run as a workflow. It performs stably on both simple and complex tasks: data extraction, quality checks, data cleansing, data conversion, data filtering, and so on. Most importantly, skilled use of Kettle has saved us a great deal of development work and improved our efficiency.

Environment requirements: a Java environment on the local machine (JDK 1.5 or above; I won't describe how to set up the Java environment here, you can search the web for details).

First download Kettle, available from http://kettle.pentaho.org/, and unzip it to the D drive.

Go to the d:/kettle/data-integration directory and run spoon.bat; the Spoon interface opens.
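From a command prompt, assuming the unzip location above, that is:

cd /d d:\kettle\data-integration
spoon.bat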

The repository connection dialog is for saving created transformations and jobs to a repository. We don't need it now, so just close it.

1. Transformation

First, create a transformation by double-clicking 'Transformation'. An advantage of Kettle is that every operation can be dragged into place like drawing a flowchart; I won't describe each individual item here, you can refer to the Kettle operation manual.

1.1 Create DB connections. The customer service department's CRM system is built on the Yii framework; the development language is PHP and the database is MySQL. I won't use the real server as the example here, since that involves the company's server information.

Create DB connections local1 (source database) and local2 (target database).

A reminder here: the extracted data contains date types (DATE, DATETIME, etc.), and the MySQL database stores times in DATETIME columns. When reading such a field through JDBC, use ResultSet.getTimestamp(), which returns a java.sql.Timestamp. Neither ResultSet.getDate() nor ResultSet.getTime() can be used here, because the former does not include the time and the latter does not include the date.
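For illustration, a minimal JDBC sketch (the URL, credentials, and the customer table with its created_at DATETIME column are all hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;

public class ReadDatetimeExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; adjust to your environment
        String url = "jdbc:mysql://localhost:3306/crm";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT created_at FROM customer")) {
            while (rs.next()) {
                // getTimestamp() keeps both the date and the time portion;
                // getDate() would drop the time, getTime() would drop the date
                Timestamp ts = rs.getTimestamp("created_at");
                System.out.println(ts);
            }
        }
    }
}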

 

However, ResultSet.getTimestamp() is not completely safe either. For example, when a TIMESTAMP column in the database holds the value '0000-00-00 00:00:00', reading it with this method throws an exception: Cannot convert value '0000-00-00 00:00:00' from column 1 to TIMESTAMP. This is because JDBC cannot convert '0000-00-00 00:00:00' into a java.sql.Timestamp, and in Java it is likewise impossible to create a java.util.Date with the value '0000-00-00'; the earliest valid date is '0001-01-01 00:00:00'.

The solution is to add zeroDateTimeBehavior=convertToNull to the database connection, which makes such zero dates come back as NULL.
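In Kettle this parameter can be set on the connection's Options tab, or appended directly to the JDBC URL; a sketch with a placeholder host and database name:

jdbc:mysql://localhost:3306/crm?zeroDateTimeBehavior=convertToNull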

1.2 Create a Table Input step to fetch the data to be extracted
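The Table Input step holds a SQL query run against the source connection local1; a minimal sketch (table and column names are hypothetical):

SELECT id, name, created_at
FROM customer;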

1.3 Create a Table Output step to load the extracted data into the target database's table

1.4 Establish the execution order

Hold down the Shift key and drag from the Table Input step to the Table Output step; this draws the hop that links them.

With this, the data extraction is complete. Save the transformation and name it trans1. Next, we schedule the transformation to run on a timer.

2. Job

Close trans1 and create a job.

Save as job1.kjb

3. Kitchen.bat executes the job
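Kitchen is Kettle's command-line job runner. A minimal invocation sketch, assuming job1.kjb was saved under D:\kettle\jobs (the job and log paths are assumptions):

cd /d d:\kettle\data-integration
kitchen.bat /file:D:\kettle\jobs\job1.kjb /level:Basic /logfile:D:\kettle\logs\job1.log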

4. A start .bat calls Kitchen.bat to run the job
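A sketch of such a wrapper script (the file name run_job1.bat and all paths are assumptions); once saved, it can be scheduled with the Windows Task Scheduler to run the job periodically:

@echo off
rem run_job1.bat - wrapper that launches the Kettle job via Kitchen
cd /d d:\kettle\data-integration
call kitchen.bat /file:D:\kettle\jobs\job1.kjb /level:Basic /logfile:D:\kettle\logs\job1.log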
