Data Migration Tools: A Feature Comparison of Sqoop and DataX

Recently, for a project, I researched two tools: Apache Sqoop and Taobao's DataX. Below is a preliminary comparison of their features. It does not cover technical details or usage, and is kept as a reference for choosing between them in the future.

Sqoop is a top-level Apache project for transferring data between Hadoop and relational databases. It can import data from a relational database (e.g., MySQL, Oracle, PostgreSQL) into HDFS, or export data from HDFS into a relational database. It is currently widely used in many companies, and its outlook is fairly promising. Its main characteristics are:

1) It is designed specifically for Hadoop and keeps up well with new Hadoop versions. It started as an open source project within Cloudera's CDH distribution, so supporting CDH4 should not be a problem.

2) It supports parallel import and is claimed to be very fast (time was short, so I have not yet tested this in a real environment). You can specify a column by which the import is split into parts that run in parallel.

3) It supports importing and exporting selected columns.

4) It ships with a rich set of auxiliary tools, such as sqoop-import, sqoop-list-databases and sqoop-list-tables.
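To make point 2 more concrete, here is a minimal Python sketch of how splitting an import by a column can work: take the column's minimum and maximum, cut that range into contiguous pieces, and hand each piece to its own worker as a bounded query. It assumes an integer split column whose range is already known; the function names and the `orders`/`id` example are hypothetical, and this is an illustration of the idea rather than Sqoop's actual implementation. (In Sqoop itself the column is chosen with `--split-by` and the number of parallel tasks with `-m`/`--num-mappers`.)

```python
"""Simplified sketch of split-by range partitioning for a parallel import.

Assumption: illustrates the general idea only; this is not Sqoop's code, and
all names here are hypothetical.
"""
from typing import List, Tuple


def split_ranges(lo: int, hi: int, num_splits: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [lo, hi] into at most num_splits contiguous,
    non-overlapping (start, end) ranges, both ends inclusive."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    if lo > hi:
        return []  # empty table: nothing to import
    total = hi - lo + 1
    num_splits = min(num_splits, total)  # never create empty splits
    base, extra = divmod(total, num_splits)
    ranges = []
    start = lo
    for i in range(num_splits):
        # Spread the remainder over the first `extra` splits.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table: str, column: str, lo: int, hi: int,
                  num_splits: int) -> List[str]:
    """Build one bounded SELECT per split; each could run in its own worker."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {s} AND {column} <= {e}"
        for s, e in split_ranges(lo, hi, num_splits)
    ]


if __name__ == "__main__":
    # Hypothetical table "orders" whose id column spans 1..10, split 4 ways.
    for q in split_queries("orders", "id", 1, 10, 4):
        print(q)
```

Spreading the remainder over the first splits keeps the ranges within one value of each other in size. Equal key ranges only mean equal work if the key values are evenly distributed, which is one reason the choice of split column matters.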

DataX is Taobao's open source data import/export tool. It supports data exchange between HDFS clusters and various relational databases. Its main characteristics are:

1) The official release supports only an older Hadoop version (0.19); newer versions (such as CDH4) are not supported yet.

2) It supports transferring data from one HDFS cluster to another.

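3) It supports parallel import and export without "landing" the data: records are streamed from source to destination without being written to intermediate storage.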
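The "no data landing" idea in point 3 can be illustrated with a minimal Python sketch: a reader thread and a writer thread connected by a bounded in-memory queue, so records never touch an intermediate file, with several such pipelines run side by side for parallelism. This is an assumption-laden illustration of the general pattern; `transfer` and `parallel_transfer` are hypothetical names and are not DataX's real plugin interface.

```python
"""Sketch of streaming transfer without intermediate storage ("no landing").

Assumption: illustrates the general reader/writer pattern only; the names
here are hypothetical and are not DataX's actual API.
"""
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel marking the end of the stream


def transfer(read: Iterable, write: Callable[[object], None],
             buffer_size: int = 100) -> None:
    """Stream records from `read` to `write` through a bounded queue."""
    buf = queue.Queue(maxsize=buffer_size)

    def reader() -> None:
        try:
            for record in read:
                buf.put(record)  # blocks while the buffer is full
        finally:
            buf.put(_END)  # always let the writer finish, even on error

    def writer() -> None:
        while True:
            record = buf.get()
            if record is _END:
                break
            write(record)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def parallel_transfer(shards: List[Iterable],
                      write: Callable[[object], None]) -> None:
    """Run one reader/writer pipeline per shard, all concurrently."""
    workers = [threading.Thread(target=transfer, args=(shard, write))
               for shard in shards]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    sink: List[int] = []
    lock = threading.Lock()

    def write(record: int) -> None:
        with lock:  # the shared sink is written from several threads
            sink.append(record)

    # Two hypothetical shards of a source table, transferred in parallel.
    parallel_transfer([range(0, 5), range(5, 10)], write)
    print(sorted(sink))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The bounded queue provides back-pressure: if the writer is slower than the reader, `put` blocks instead of letting memory grow without limit, which is what makes streaming without a landing area practical.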

Note: The above is not a comprehensive comparison of the two tools and is for reference only. Criticism and corrections are welcome.
