Table of contents
- Basic Information
- 1 Three specific schemes of extraction mode
- 2 Four specific schemes of the data supply mode
Basic Information
Network topology
Switches and firewalls are needed to ensure network security.
In principle, a front-end processor should be placed at the network boundary (it may be unnecessary on a private network). It can buffer large volumes of data and also provide network security isolation.
Introduction to the two modes
1 Extraction mode (the platform pulls from the business side):
- Advantages: simple technical implementation / low cost, with no additional fees;
- Disadvantages: can easily cause performance problems in the source system / changes to the source system's data structure are prone to cause errors and synchronization failures / the source system takes no responsibility for data quality
2 Data supply mode (the business side pushes to the platform):
- Advantages: no intrusion into the source system / responsibility for data quality can be assigned to the business side;
- Disadvantages: additional third-party interface fees are required / the integration is technically more complex to implement
1 Three specific schemes of extraction mode
1.1 Extraction mode - WebService interface
Business system - interface <- access node - big data platform
- Advantages: the timing and rate of data access are controllable / the business side can control the data range and data encryption;
- Disadvantages: batch calls to the interface may destabilize the business system;
- Applicable scenarios: small batches of structured data;
- Unsuitable scenarios: large batches of unstructured data/real-time data synchronization
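The risk that batch access destabilizes the business system is commonly reduced by paging and throttling on the access node. The following is a minimal sketch under stated assumptions: `fetch_page(offset, limit)` is a hypothetical callable wrapping the business system's interface, and the page size and pause are illustrative values, not recommendations from the source.

```python
import time
from typing import Callable, Iterator, List


def pull_in_pages(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 100,
    pause_seconds: float = 0.0,
) -> Iterator[dict]:
    """Yield records page by page, pausing between requests to limit load."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            break  # a short page means the source has no more rows
        offset += page_size
        time.sleep(pause_seconds)  # throttle so the business system is not flooded


# Stand-in for the business interface (assumption): 250 fake rows in memory.
_ROWS = [{"id": i} for i in range(250)]


def fake_fetch_page(offset: int, limit: int) -> List[dict]:
    return _ROWS[offset:offset + limit]


records = list(pull_in_pages(fake_fetch_page, page_size=100))
```

In a real deployment the pause and page size would be agreed with the business side, since they own the interface and know its capacity.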
1.2 Extraction mode - direct connection to the backup database
Business system - backup database <- access node - big data platform
- Advantages: the business system provides a backup database, so extraction has no impact on the business;
- Disadvantages: in some scenarios the business system has no backup database / data encryption has to be guaranteed by the platform side;
- Applicable scenarios: small batches of structured data;
- Unsuitable scenarios: large batches of unstructured data/real-time data synchronization
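Pulling from a backup database is often done incrementally, remembering how far the previous run got. The sketch below illustrates that idea with a watermark on an increasing `id` column. It assumes a hypothetical `orders` table and uses an in-memory SQLite database as a stand-in for the backup database; a real source would use its own driver and schema.

```python
import sqlite3


def extract_since(conn: sqlite3.Connection, last_id: int, batch_size: int = 1000):
    """Return rows with id greater than last_id, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, payload FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    # If nothing new arrived, keep the old watermark.
    new_watermark = rows[-1][0] if rows else last_id
    return rows, new_watermark


# In-memory SQLite stands in for the backup database (assumption).
backup = sqlite3.connect(":memory:")
backup.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
backup.executemany(
    "INSERT INTO orders (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 6)],
)

first_batch, watermark = extract_since(backup, last_id=0, batch_size=3)
second_batch, watermark = extract_since(backup, last_id=watermark, batch_size=3)
```

Limiting each query with `batch_size` keeps individual reads small, which fits the "small batches of structured data" scenario listed above.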
1.3 Extraction mode - file synchronization
Business system - file address <- access node - big data platform
- Advantages: the business system is unaware of the extraction;
- Disadvantages: batch pulls can put a noticeable load on the network;
- Applicable scenarios: unstructured data
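One way to limit the network cost of batch pulls is to copy only files that are new or changed. The sketch below compares SHA-256 digests before copying. The directory names are hypothetical, and local temporary directories stand in for the business system's file address and the platform's landing area.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pull_new_files(source_dir: Path, target_dir: Path) -> list:
    """Copy files that are missing or changed in target_dir; return their names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(p for p in source_dir.iterdir() if p.is_file()):
        dst = target_dir / src.name
        if dst.exists() and sha256_of(dst) == sha256_of(src):
            continue  # already synchronized, skip the transfer
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied


# Temporary directories stand in for the two sides (assumption).
workdir = Path(tempfile.mkdtemp())
source = workdir / "business_files"
target = workdir / "platform_landing"
source.mkdir()
(source / "a.txt").write_text("scan of contract A")
(source / "b.txt").write_text("scan of contract B")

first_run = pull_new_files(source, target)
second_run = pull_new_files(source, target)
```

Over a real network the source digest would have to be supplied by the business side or computed remotely; hashing the source locally, as done here, only works because both directories are on the same machine.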
2 Four specific schemes of the data supply mode
2.1 Data supply mode - API interface
Business system - interface -> access node - big data platform
- Advantages: the business system is unaffected, so the risk is lower;
- Disadvantages: higher performance requirements for the platform-side interface / interface development must be paid for;
- Applicable scenarios: small batch structured data/real-time data synchronization;
- Unsuitable scenarios: large batches of unstructured data
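Because pushed data arrives at a pace set by the business side, the platform-side interface usually needs validation and a bounded buffer so that bursts are rejected cleanly rather than overwhelming the platform. The following is a minimal sketch of that receiving logic without any web framework. The field names and the HTTP-like status codes are assumptions chosen for illustration.

```python
import queue

# Hypothetical minimal schema for a pushed record.
REQUIRED_FIELDS = {"source", "record_id", "payload"}


def make_receiver(capacity: int):
    """Build a receive() function backed by a bounded in-memory buffer."""
    buffer = queue.Queue(maxsize=capacity)

    def receive(record: dict) -> int:
        """Return an HTTP-like status code for one pushed record."""
        if not REQUIRED_FIELDS.issubset(record):
            return 400  # reject malformed records at the boundary
        try:
            buffer.put_nowait(record)
        except queue.Full:
            return 429  # buffer full: ask the sender to retry later
        return 202  # accepted for asynchronous processing

    return receive, buffer


receive, buffer = make_receiver(capacity=2)
statuses = [
    receive({"source": "crm", "record_id": 1, "payload": "a"}),
    receive({"source": "crm", "record_id": 2}),  # missing payload
    receive({"source": "crm", "record_id": 3, "payload": "c"}),
    receive({"source": "crm", "record_id": 4, "payload": "d"}),  # buffer is full
]
```

Rejecting invalid records at the interface is also one way to keep responsibility for data quality on the business side, as described for the supply mode in general.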
2.2 Data supply mode - database synchronization
Business system - primary database -> backup database - access node - big data platform
- Advantages: the business system is unaware of the synchronization;
- Disadvantages: Additional interface fees are required;
- Applicable scenarios: small batch structured data/real-time data synchronization;
- Unsuitable scenarios: large batches of unstructured data
2.3 Data supply mode - (compressed) file synchronization
Business system -> FTP server - access node - big data platform
- Advantages: the business system is unaware of the synchronization;
- Disadvantages: Additional interface fees are required;
- Applicable scenarios: non-real-time data synchronization;
- Unsuitable scenarios: real-time data synchronization
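With compressed file hand-off, the access node typically needs to confirm that a file arrived intact before loading it. The sketch below shows one possible convention: the business side writes a gzip archive plus a SHA-256 sidecar file, and the access node verifies the digest before decompressing. The `.sha256` sidecar naming is an assumption, and a temporary directory stands in for the FTP server.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def publish(data: bytes, out_dir: Path, name: str) -> Path:
    """Business side: write <name>.gz plus a <name>.gz.sha256 sidecar file."""
    archive = out_dir / f"{name}.gz"
    archive.write_bytes(gzip.compress(data))
    checksum = hashlib.sha256(archive.read_bytes()).hexdigest()
    (out_dir / f"{name}.gz.sha256").write_text(checksum)
    return archive


def ingest(archive: Path) -> bytes:
    """Access node: verify the checksum, then decompress."""
    expected = archive.with_name(archive.name + ".sha256").read_text().strip()
    actual = hashlib.sha256(archive.read_bytes()).hexdigest()
    if actual != expected:
        raise ValueError(f"checksum mismatch for {archive.name}")
    return gzip.decompress(archive.read_bytes())


# A temporary directory stands in for the FTP drop location (assumption).
drop_dir = Path(tempfile.mkdtemp())
original = b"id,amount\n1,10\n2,20\n"
archive_path = publish(original, drop_dir, "orders.csv")
restored = ingest(archive_path)
```

A failed check raises an error instead of loading partial data, so a truncated upload can simply be retried on the next scheduled run.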
2.4 Data supply mode - real-time synchronization
Business system -> message queue (Kafka) - access node - big data platform
- Advantages: real-time data processing;
- Disadvantages: Additional interface fees are required;
- Applicable scenarios: real-time data synchronization;
- Unsuitable scenarios: unstructured data
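On the access node, consuming from a message queue usually means processing a batch and only then recording progress, so that a crash causes reprocessing rather than data loss. The sketch below illustrates that commit-after-processing pattern with an in-memory stand-in. `InMemoryPartition` is a hypothetical class written for this example and is not the Kafka client API; a real consumer would use a Kafka client library instead.

```python
from typing import Callable, List


class InMemoryPartition:
    """Stand-in for one topic partition: an append-only log plus a committed offset."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.committed = 0  # offset of the next message to process

    def append(self, message: str) -> None:
        self.log.append(message)

    def poll(self, max_messages: int) -> List[str]:
        """Return up to max_messages starting at the committed offset."""
        return self.log[self.committed:self.committed + max_messages]

    def commit(self, count: int) -> None:
        self.committed += count


def consume(partition: InMemoryPartition,
            handle: Callable[[str], None],
            batch_size: int = 2) -> int:
    """Process all pending messages; return how many were handled."""
    processed = 0
    while True:
        batch = partition.poll(batch_size)
        if not batch:
            return processed
        for message in batch:
            handle(message)
        # Commit only after the whole batch succeeded (at-least-once delivery).
        partition.commit(len(batch))
        processed += len(batch)


partition = InMemoryPartition()
for i in range(5):
    partition.append(f"event-{i}")

sink: List[str] = []
handled = consume(partition, sink.append)
```

If `handle` raises partway through a batch, the offset is not advanced, so the same batch is polled again on the next run. Downstream writes therefore need to tolerate duplicates.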