Alibaba Cloud Big Data: Data Integration Platform Tutorial

Data Integration is a data synchronization platform from Alibaba Group that moves data between heterogeneous storage systems. It is reliable, secure, low-cost, and elastically scalable, and it provides offline (full and incremental) data channels for more than 20 kinds of data sources across different network environments. Its goal is stable, high-speed data movement and synchronization between a wide range of heterogeneous data sources, even in complex network environments.

For more about the Alibaba Cloud Data Integration platform, see: Alibaba Cloud Data Integration Platform Tutorial

Offline (Batch) Data Synchronization Overview

An offline (batch) data channel is defined by a source data source, a destination data source, and the datasets to move. The framework abstracts data extraction into a plugin called a Reader and data writing into a plugin called a Writer, and it defines a simplified intermediate format for the data passed between them. Because every Reader and Writer speaks this intermediate format, data can be transferred between any pair of structured or semi-structured data sources.
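
The Reader/Writer split can be illustrated with a small Python sketch. The class names Reader, Writer, CsvLineReader, ListWriter and the function sync are hypothetical and not part of Data Integration, and the intermediate format is assumed here to be a plain tuple of column values. The point is only that any Reader can be paired with any Writer because both agree on one intermediate record format.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Tuple

# Hypothetical intermediate format: one record is a tuple of column values.
Record = Tuple[str, ...]


class Reader(ABC):
    """Extracts data from a source and emits it as intermediate records."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class Writer(ABC):
    """Consumes intermediate records and writes them to a destination."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int:
        ...


class CsvLineReader(Reader):
    """Example Reader: turns comma-separated text lines into records."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = lines

    def read(self) -> Iterator[Record]:
        for line in self.lines:
            if line.strip():  # skip blank lines
                yield tuple(line.rstrip("\n").split(","))


class ListWriter(Writer):
    """Example Writer: collects records in memory and counts them."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.rows.append(record)
            count += 1
        return count


def sync(reader: Reader, writer: Writer) -> int:
    """Stream records from any Reader into any Writer; return the record count."""
    return writer.write(reader.read())
```

For example, sync(CsvLineReader(["1,alice", "2,bob"]), ListWriter()) returns 2. Swapping in a different Reader or Writer subclass requires no change to sync, which is the property the intermediate format buys.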


Supported Data Source Types

Data Integration supports a wide range of data sources, including:

Text storage (FTP / SFTP / OSS / multimedia files, etc.).

Database (RDS / DRDS / MySQL / PostgreSQL, etc.).

NoSQL (Memcache / Redis / MongoDB / HBase, etc.).

Big data (MaxCompute / AnalyticDB / HDFS, etc.).

MPP database (HybridDB for MySQL, etc.).

For more details, see Supported data source types.

Note:

Configuration parameters differ greatly from one data source to another, so look up the detailed usage of each parameter for your specific source. The data source configuration page provides a detailed description of every parameter; consult it as needed.

Synchronization Development Notes

Synchronization tasks can be developed in two modes: wizard mode and script mode.

Wizard mode: a visual, guided interface in which you fill in settings step by step to quickly configure a data synchronization task. Wizard mode is easy to learn, but some advanced features are not available in it.

Script mode: you develop the synchronization task directly by writing a JSON script. Script mode targets advanced users and has a steeper learning curve, but it offers richer, more flexible capabilities and finer-grained configuration management.
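
This tutorial does not show the exact JSON schema that script mode expects. As an assumption, the sketch below borrows the job layout of open-source DataX: a job object holding setting.speed.channel (the number of parallel channels) and a content list of reader/writer pairs, each with a name and a parameter object. The helper build_job is hypothetical; mysqlreader and odpswriter are DataX plugin names, and the parameter values are placeholders only. Check the script mode configuration page for the real fields.

```python
import json
from typing import Any, Dict


def build_job(reader_name: str, reader_params: Dict[str, Any],
              writer_name: str, writer_params: Dict[str, Any],
              channel: int = 1) -> Dict[str, Any]:
    """Assemble a DataX-style job (layout assumed): one reader, one writer, concurrency."""
    if channel < 1:
        raise ValueError("channel must be at least 1")
    return {
        "job": {
            # Number of parallel channels used to move data.
            "setting": {"speed": {"channel": channel}},
            "content": [
                {
                    "reader": {"name": reader_name, "parameter": reader_params},
                    "writer": {"name": writer_name, "parameter": writer_params},
                }
            ],
        }
    }


if __name__ == "__main__":
    # Placeholder parameters; real ones depend on each plugin's documentation.
    job = build_job(
        "mysqlreader", {"column": ["id", "name"]},
        "odpswriter", {"column": ["id", "name"]},
        channel=2,
    )
    print(json.dumps(job, indent=2))
```

Generating the script from code rather than editing JSON by hand is one way to keep many similar tasks consistent, which is where script mode's finer-grained configuration pays off.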

Note:

Code generated in wizard mode can be converted to script mode. The conversion is one-way: once it is complete, the task cannot be restored to wizard mode, because script mode's capabilities are a superset of wizard mode's.

Before writing code, you must first create the data source configuration and the target table.
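
Why the wizard-to-script conversion is one-way can be shown with a small sketch. WIZARD_FIELDS, wizard_to_script, fits_wizard, and the pre_sql key are all hypothetical names chosen for illustration, not Data Integration settings. The sketch shows the general reasoning: every wizard setting has a script equivalent, but once a script uses a setting the wizard cannot express, there is no faithful way back. The product itself simply disallows reverse conversion rather than checking each task.

```python
from typing import Any, Dict

# Hypothetical: the settings the wizard UI can express.
WIZARD_FIELDS = frozenset({"source", "target", "columns", "channel"})


def wizard_to_script(wizard_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Every wizard setting has a script equivalent, so this direction is lossless."""
    unknown = set(wizard_cfg) - WIZARD_FIELDS
    if unknown:
        raise ValueError(f"not a wizard setting: {sorted(unknown)}")
    return dict(wizard_cfg)


def fits_wizard(script_cfg: Dict[str, Any]) -> bool:
    """A script maps back to the wizard only if it uses wizard settings alone."""
    return set(script_cfg) <= WIZARD_FIELDS
```

Right after conversion, fits_wizard returns True; as soon as a script-only key such as the hypothetical pre_sql is added, it returns False.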

Network Types

There are three network types: the classic network, Virtual Private Cloud (VPC), and the local IDC network (planned).

Classic network: a unified public infrastructure network deployed inside Alibaba Cloud and planned and managed by Alibaba Cloud. It suits customers who prioritize ease of use over network control.

Virtual Private Cloud (VPC): an isolated network environment built on Alibaba Cloud. You fully control your own virtual network, including choosing its IP address range, dividing it into subnets, and configuring route tables and gateways.

Local IDC network: a network environment built in your own data center, isolated from the Alibaba Cloud network.

For questions about the classic network and VPCs, see the Classic network and VPC FAQ.

Additional information:

Data sources can also be connected over the public network; in that case, select the classic network as the network type. Note that public network transfer speed is limited by bandwidth and incurs network costs, so this option is not recommended unless there is a special reason.

For the planned local IDC network connection, you can synchronize data with a transmission scheme that runs script-mode tasks on local resources, or use the Shell + DataX scheme; see Running tasks with DataX Shell.
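
For the Shell + DataX route, open-source DataX jobs are launched as python <DATAX_HOME>/bin/datax.py <job file>. The sketch below writes a job file and builds that command line; write_job_file and datax_command are hypothetical helpers, and /opt/datax is an assumed install path. It only prints the command; actually running it requires a local DataX installation.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def write_job_file(job: Dict[str, Any], directory: Union[str, Path]) -> Path:
    """Save a job description as job.json in the given directory."""
    path = Path(directory) / "job.json"
    path.write_text(json.dumps(job, indent=2), encoding="utf-8")
    return path


def datax_command(datax_home: Union[str, Path], job_file: Union[str, Path],
                  python: str = "python") -> List[str]:
    """Build the command line: python <DATAX_HOME>/bin/datax.py <job file>."""
    return [python, str(Path(datax_home) / "bin" / "datax.py"), str(job_file)]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        job_path = write_job_file({"job": {"content": []}}, tmp)  # placeholder job
        cmd = datax_command("/opt/datax", job_path)  # assumed install path
        print(" ".join(cmd))
        # On a machine with DataX installed, run it with:
        # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single shell string avoids quoting problems when paths contain spaces.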

A VPC is an isolated network environment in which you can customize the IP address range, subnets, gateways, and so on, which improves network security. Because VPCs are increasingly widely used, Data Integration automatically detects the network for RDS-MySQL, RDS-SQL Server, and RDS-PostgreSQL, so these sources can communicate over the VPC without your purchasing an ECS instance in the same network to act as a reverse proxy. Support for other Alibaba Cloud databases, such as PPAS, OceanBase, Redis, MongoDB, Memcache, TableStore, and HBase, will follow. Until then, to configure a synchronization task for a non-RDS data source in a VPC, you need to purchase an ECS instance in the same network and communicate through it.
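
The connectivity rules in this section can be condensed into a decision function. This is a hypothetical sketch: Network, connection_plan, and the returned strings are illustrative; the set of auto-detected sources reflects only what this tutorial lists; and modeling the public network as its own option is a simplification (the text above says public connections are configured by selecting the classic network type).

```python
from enum import Enum

# Sources this tutorial says are detected automatically inside a VPC.
VPC_AUTO_DETECTED = frozenset({"RDS-MySQL", "RDS-SQL Server", "RDS-PostgreSQL"})


class Network(Enum):
    CLASSIC = "classic"
    VPC = "vpc"
    IDC = "idc"        # planned local data center network
    PUBLIC = "public"  # simplification: configured via the classic type


def connection_plan(source_type: str, network: Network) -> str:
    """Return a short description of how a task can reach the data source."""
    if network is Network.VPC:
        if source_type in VPC_AUTO_DETECTED:
            return "direct: VPC network detected automatically"
        return "needs an ECS instance in the same VPC"
    if network is Network.IDC:
        return "run on local resources (script mode or Shell + DataX)"
    if network is Network.PUBLIC:
        return "public network: allowed but not recommended (bandwidth and cost)"
    return "direct over the classic network"
```

For example, connection_plan("MongoDB", Network.VPC) reports that an ECS instance in the same VPC is needed, while connection_plan("RDS-MySQL", Network.VPC) does not.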

Limitations

Data Integration supports synchronization of structured data (e.g. RDS, DRDS), semi-structured data, and unstructured data (e.g. OSS, TXT) only when that data can be abstracted into structured form. In other words, Data Integration can only synchronize data that can be represented as a logical two-dimensional table. Completely unstructured data, such as MP3 files stored in OSS, cannot yet be synchronized to MaxCompute; this feature is planned for a later release.
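
To make "can be represented as a logical two-dimensional table" concrete, here is a sketch of mapping delimited text onto fixed, named columns before transfer. to_table is a hypothetical helper, not a Data Integration API; it only shows the kind of row/column mapping a text Reader has to perform. A file such as an MP3 has no row/column structure to map, which is why it falls outside what Data Integration can synchronize.

```python
import csv
import io
from typing import Dict, List


def to_table(text: str, columns: List[str], delimiter: str = ",") -> List[Dict[str, str]]:
    """Map delimited text onto a two-dimensional table with fixed named columns."""
    rows: List[Dict[str, str]] = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for number, fields in enumerate(reader, start=1):
        if not fields:  # csv.reader yields [] for blank lines
            continue
        if len(fields) != len(columns):
            raise ValueError(
                f"record {number}: expected {len(columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows
```

Rejecting records whose field count does not match the column list keeps the result a true table; silently padding or truncating would hide malformed input.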

Synchronization and data exchange are supported between some data stores, both within a single region and across regions.

In some regions, transmission over the classic network is not guaranteed. If a connectivity test over the classic network fails, consider connecting over the public network instead.

Data Integration only performs data synchronization (transfer); it does not itself provide data stream consumption.


Reproduced from: https://www.jianshu.com/p/05016a7dbf6a


Origin: blog.csdn.net/weixin_33873846/article/details/91275166