[ETL Practice Guide] Moving Data to the Cloud with the MaxCompute Plug-in for Kettle

Products used in this article

Alibaba Cloud DTplus big data computing service MaxCompute. Product page: https://www.aliyun.com/product/odps


Brief introduction

Kettle is an open source ETL tool implemented in pure Java that runs on Windows, Unix, and Linux. It provides a graphical user interface in which you drag and drop components to define the topology of a data flow. Kettle supports a wide range of input and output data sources: databases such as Oracle, MySQL, and DB2, as well as open source big data systems such as HDFS, HBase, Cassandra, and MongoDB. This article describes how to use the MaxCompute plug-in to connect Kettle seamlessly to MaxCompute, Alibaba Cloud's big data computing platform.

Environment requirements

  • JDK (1.6 or later; 1.7 is recommended)
  • Kettle (5.4.0 or later is recommended)
  • Apache Maven 3.x
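
The version requirements above can be checked from a shell before installing the plug-in. The following is a minimal sketch and not part of the original article: the helper name version_ge is made up for illustration, and it assumes a sort command that supports -V (a GNU coreutils extension). The JDK check is skipped when java is not on the PATH.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Hypothetical helper; relies on `sort -V` (version sort), a GNU extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: report whether the installed JDK meets the 1.6 minimum.
# `java -version` writes a line such as: java version "1.7.0_80" (to stderr).
if command -v java >/dev/null 2>&1; then
    java_version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    if version_ge "$java_version" 1.6; then
        echo "JDK $java_version is new enough"
    else
        echo "JDK $java_version is older than 1.6" >&2
    fi
fi
```

The same helper could compare a Kettle version string against 5.4.0, for example version_ge 5.4.0 5.4.0.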

Plug-in deployment

Download the MaxCompute plug-in package for Kettle

$ wget -O aliyun-kettle-odps-plugin-2.0.2.tar.gz http://odps-repo.oss-cn-hangzhou.aliyuncs.com/data-collectors%2Faliyun-kettle-odps-plugin-2.0.2.tar.gz

Deploy the MaxCompute plug-in for Kettle

Install kettle-odps-plugin into the Kettle plugins directory:

$ cp aliyun-kettle-odps-plugin-2.0.2.tar.gz {YOUR_KETTLE_DIRECTORY}/plugins
$ cd {YOUR_KETTLE_DIRECTORY}/plugins
$ tar zxvf aliyun-kettle-odps-plugin-2.0.2.tar.gz && rm aliyun-kettle-odps-plugin-2.0.2.tar.gz

After installation is complete, restart Kettle and create a new transformation. Aliyun MaxCompute Input and Aliyun MaxCompute Output now appear under the Big Data category, as shown in the figure.

Screenshot 2016-11-30 PM 5.36.29.png
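
If the new entries do not appear after the restart, a quick check is whether the archive was unpacked where Kettle looks for plug-ins. The following is a minimal sketch, assuming the unpacked directory or its files contain "odps" in their names (the archive name suggests this, but the article does not state the extracted layout). plugin_installed is a hypothetical helper.

```shell
# plugin_installed KETTLE_DIR: succeeds when something named *odps* exists
# within two levels of KETTLE_DIR/plugins. Hypothetical helper; the name
# pattern is an assumption based on the archive name.
plugin_installed() {
    kettle_dir=$1
    [ -d "$kettle_dir/plugins" ] || return 1
    match=$(find "$kettle_dir/plugins" -maxdepth 2 -iname '*odps*' | head -n 1)
    [ -n "$match" ]
}

# Usage:
#   plugin_installed {YOUR_KETTLE_DIRECTORY} && echo "plug-in found"
```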

Usage scenarios

The following simple examples show how to use the MaxCompute plug-in for Kettle to import and export data.

Import MySQL data into MaxCompute

Install the MySQL JDBC Connector

Kettle does not ship with the MySQL JDBC Connector. Download the MySQL JDBC Connector, put the Connector jar in the lib directory under the Kettle installation directory, and restart Kettle.
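
The copy step can be sketched as a small shell function. This is an illustration, not the article's own script: install_jdbc_jar is a hypothetical helper, and the jar file name in the usage comment is a placeholder for whichever Connector version you downloaded.

```shell
# install_jdbc_jar JAR KETTLE_DIR: copy a JDBC driver jar into KETTLE_DIR/lib.
# Hypothetical helper; it refuses to run if either path is missing.
install_jdbc_jar() {
    jar=$1
    kettle_dir=$2
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    if [ ! -d "$kettle_dir/lib" ]; then
        echo "no lib directory under: $kettle_dir" >&2
        return 1
    fi
    cp "$jar" "$kettle_dir/lib/"
}

# Usage (placeholder file name):
#   install_jdbc_jar mysql-connector-java-x.y.z-bin.jar {YOUR_KETTLE_DIRECTORY}
```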

Configuration Steps

  • Create a new MySQL DB connection

mysql connect.png

  • Add a MySQL Table Input

Table Input requires a SQL statement. This example imports the words table from the MySQL database into MaxCompute, so the statement is "select * from words;", as shown in the following figure. The schema of the words table is "(id int, line varchar(1000))".

mysql input.png

  • Add Aliyun MaxCompute Output

First, create a MaxCompute table that corresponds to the MySQL source table. The table creation statement used in this example is:

create table testoyz (a bigint, b string);
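
The target columns mirror the MySQL source: id int becomes a bigint and line varchar(1000) becomes b string. A minimal sketch of that mapping as a shell function follows. Only the int and varchar cases come from this example; the remaining cases are assumptions about commonly used MaxCompute types, and mysql_to_maxcompute is a hypothetical helper.

```shell
# mysql_to_maxcompute TYPE: print a MaxCompute column type for a MySQL type.
# Hypothetical helper for illustration only.
mysql_to_maxcompute() {
    # Lower-case the type and drop any length suffix such as (1000).
    base=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[ (].*$//')
    case $base in
        tinyint|smallint|int|integer|bigint) echo bigint ;;   # int -> bigint is from the example
        char|varchar|text)                   echo string ;;   # varchar -> string is from the example
        float|double)                        echo double ;;   # assumption
        datetime|timestamp)                  echo datetime ;; # assumption
        *) echo "unmapped type: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   mysql_to_maxcompute "varchar (1000)"
```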

Then configure the required parameters, such as endpoint, accessId, accessKey, projectName, and tableName, as shown below.

Paste Picture 0.png

  • Run the transformation and view the results

Run the transformation you built. After it completes successfully, you can query the imported data in the corresponding MaxCompute table.

odps@ xxx_project_name>read testoyz;
+------------+------------+
| a          | b          |
+------------+------------+
| 1          | hello world |
| 2          | hello maxcompute |
| 3          | test test test |
+------------+------------+
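
Once the transformation is saved as a .ktr file, it can also be run without the graphical interface through Pan, the command-line transformation runner that ships with Kettle. The following is a minimal sketch, assuming pan.sh sits in the Kettle installation directory. run_transformation, the DRY_RUN variable, and the file paths are hypothetical and do not appear in the original article.

```shell
# run_transformation KETTLE_DIR KTR_FILE: run a saved transformation with Pan.
# With DRY_RUN=1 the command is printed instead of executed.
run_transformation() {
    kettle_dir=$1
    ktr_file=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$kettle_dir/pan.sh -file=$ktr_file -level=Basic"
        return 0
    fi
    "$kettle_dir/pan.sh" -file="$ktr_file" -level=Basic
}

# Usage (placeholder paths):
#   DRY_RUN=1 run_transformation {YOUR_KETTLE_DIRECTORY} /path/to/mysql_to_maxcompute.ktr
```

Pan returns a non-zero exit status when the transformation fails, so the function can be chained with the query step in a script.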

Export MaxCompute table to an Excel file

The Aliyun MaxCompute Input component can be used to download a MaxCompute table to an Excel file. The configuration in the figure below exports the table from the previous example to an Excel file.

Paste Picture 1.png

Once configured, run the transformation. The data in the table is downloaded and saved to an Excel file.

Origin yq.aliyun.com/articles/68911