Data docking scheme

Basic Information

Network topology

Switches and firewalls are needed to ensure network security.
In principle, a front-end processor should be placed at the network boundary (a private network may not need one). It can buffer large volumes of data and also provides network security isolation.

Introduction to the two modes

1 Extraction mode (the platform pulls from the business side):

  • Advantages: simple technical implementation / low cost, with no additional fees;
  • Disadvantages: pulling can easily cause performance problems in the source system / changes to the source system's data structure tend to cause errors and synchronization failures / the source system takes no responsibility for data quality

2 Supply mode (the business side pushes to the platform):

  • Advantages: no intrusion into the source system / responsibility for data quality rests with the business side;
  • Disadvantages: additional third-party interface fees are required / the integration is technically more complicated to implement

1 Three specific schemes for extraction mode

1.1 Extraction mode - WebService interface

Business system - interface <- access node - big data platform

  • Advantages: the timing and rate of data access are controllable / the business side can control the data range and data encryption;
  • Disadvantages: calling the interface in bulk may destabilize the business system;
  • Applicable scenarios: small batches of structured data;
  • Unsuitable scenarios: large batches of unstructured data / real-time data synchronization
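
The point about controllable timing and rate can be illustrated with a minimal sketch of a paged, throttled pull. Everything here is an assumption for illustration: `fetch_page` stands for whatever call the business system's interface actually exposes, and `fake_fetch_page` is an in-memory stand-in so the sketch runs without a real service.

```python
import time
from typing import Callable, Iterator, List


def pull_in_pages(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 100,
    min_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every record from a paged interface, pausing between calls.

    The pause is what keeps the access node from hammering the
    business system with back-to-back requests.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += len(page)
        sleep(min_interval_s)


# Hypothetical stand-in for the business system's interface.
_RECORDS = [{"id": i} for i in range(250)]


def fake_fetch_page(offset: int, limit: int) -> List[dict]:
    return _RECORDS[offset:offset + limit]
```

Calling `list(pull_in_pages(fake_fetch_page))` walks the 250 demo records in three requests with a pause after each full page. Lowering `page_size` or raising `min_interval_s` trades throughput for less load on the source system.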

1.2 Extraction mode - direct connection to a backup database

Business system - database backup <- access node - big data platform

  • Advantages: the business system provides a backup database, so reads have no impact on the business;
  • Disadvantages: in some scenarios the business has no backup database / data encryption must be guaranteed by the platform side;
  • Applicable scenarios: small batches of structured data;
  • Unsuitable scenarios: large batches of unstructured data/real-time data synchronization
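
One common way to read from a backup database in small batches is to track the last key already extracted and ask only for newer rows. The sketch below assumes that approach; it is not described in the original scheme. The `orders` table, its columns, and the in-memory SQLite database are all hypothetical stand-ins for the real backup database.

```python
import sqlite3
from typing import List, Tuple


def extract_since(
    conn: sqlite3.Connection, last_id: int, batch_size: int = 1000
) -> List[Tuple[int, str]]:
    """Return up to batch_size rows whose id is greater than last_id."""
    cur = conn.execute(
        "SELECT id, payload FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    return cur.fetchall()


def make_demo_backup() -> sqlite3.Connection:
    """Build an in-memory database standing in for the backup database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, payload) VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, 6)],
    )
    conn.commit()
    return conn
```

The access node would store the highest `id` it has seen after each batch and pass it as `last_id` on the next run, so a failed run can resume without re-reading everything.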

1.3 Extraction Mode - File Synchronization

Business system - file address <- access node - big data platform

  • Advantages: the business system is unaware of the pull;
  • Disadvantages: batch pulls put load on the network and are sensitive to network fluctuations;
  • Applicable scenarios: unstructured data
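
A file pull can limit its network cost by skipping files that have already been fetched unchanged. The following is a rough sketch under the assumption that the file address is reachable as a directory (for example a mounted share); `source_dir` and `landing_dir` are hypothetical names. Over a real network the checksum would normally come from a manifest rather than from reading the remote file.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pull_new_files(source_dir: Path, landing_dir: Path) -> List[str]:
    """Copy files that are missing or changed; return the copied names."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(p for p in source_dir.iterdir() if p.is_file()):
        dst = landing_dir / src.name
        if dst.exists() and sha256_of(dst) == sha256_of(src):
            continue  # identical content is already on the platform side
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied
```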

2 Four specific schemes for supply mode

2.1 Data supply mode - API interface

Business system - interface -> access node - big data platform

  • Advantages: the business system is not affected by the platform, so the risk is lower;
  • Disadvantages: the platform-side interface must meet higher performance requirements / interface development has to be paid for;
  • Applicable scenarios: small batch structured data/real-time data synchronization;
  • Unsuitable scenarios: large batches of unstructured data
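
The performance requirement on the platform-side interface can be eased by validating each pushed record quickly and parking it in a bounded buffer. The sketch below shows only that receiving logic, without any web framework. The field names in `REQUIRED_FIELDS` and the choice of status codes are assumptions for illustration, not part of the original scheme.

```python
import json
import queue
from typing import Tuple

# Hypothetical contract: every pushed record must carry these keys.
REQUIRED_FIELDS = ("source", "id", "data")


def receive(body: bytes, buffer: "queue.Queue[dict]") -> Tuple[int, str]:
    """Validate one pushed record and enqueue it without blocking.

    Returns an HTTP-style status code and a short message.
    """
    try:
        record = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, "body is not valid JSON"
    if not isinstance(record, dict):
        return 400, "body must be a JSON object"
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        return 422, "missing fields: " + ", ".join(missing)
    try:
        buffer.put_nowait(record)
    except queue.Full:
        # Tell the sender to back off instead of stalling the interface.
        return 503, "buffer full, retry later"
    return 202, "accepted"
```

A separate worker would drain the buffer into the big data platform. Rejecting with 503 when the buffer is full pushes retries back to the business side rather than letting requests pile up on the access node.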

2.2 Data supply mode - database synchronization

Business system - primary database -> backup database - access node - big data platform

  • Advantages: The business system has no perception;
  • Disadvantages: Additional interface fees are required;
  • Applicable scenarios: small batch structured data/real-time data synchronization;
  • Unsuitable scenarios: large batches of unstructured data

2.3 Data supply mode - (compressed) file synchronization

Business system -> FTP server - access node - big data platform

  • Advantages: The business system has no perception;
  • Disadvantages: Additional interface fees are required;
  • Applicable scenarios: non-real-time data synchronization;
  • Unsuitable scenario: real-time data synchronization
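
For compressed file hand-off, one possible arrangement is for the business side to drop a gzip archive together with a checksum file, and for the access node to verify the checksum before unpacking. The sketch below assumes that arrangement; the `.sha256` sidecar convention and the function names are hypothetical, and a local directory stands in for the FTP server.

```python
import gzip
import hashlib
from pathlib import Path


def publish(data: bytes, drop_dir: Path, name: str) -> Path:
    """Business side: write <name>.gz plus a <name>.gz.sha256 sidecar."""
    drop_dir.mkdir(parents=True, exist_ok=True)
    archive = drop_dir / f"{name}.gz"
    compressed = gzip.compress(data)
    archive.write_bytes(compressed)
    sidecar = drop_dir / f"{name}.gz.sha256"
    sidecar.write_text(hashlib.sha256(compressed).hexdigest())
    return archive


def collect(archive: Path) -> bytes:
    """Access node: verify the archive against its sidecar, then unpack."""
    sidecar = archive.with_name(archive.name + ".sha256")
    expected = sidecar.read_text().strip()
    raw = archive.read_bytes()
    if hashlib.sha256(raw).hexdigest() != expected:
        raise ValueError(f"checksum mismatch for {archive.name}")
    return gzip.decompress(raw)
```

Verifying before decompressing means a truncated or partially uploaded file is rejected early instead of producing corrupt rows downstream.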

2.4 Data supply mode - real-time synchronization

Business system -> message queue (Kafka) - access node - big data platform

  • Advantages: real-time data processing;
  • Disadvantages: Additional interface fees are required;
  • Applicable scenarios: real-time data synchronization;
  • Unsuitable scenarios: unstructured data
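
On the access node, messages from the queue are often grouped into small batches before being written to the platform. The sketch below illustrates that consumer loop only. To stay runnable without a broker it uses Python's in-process `queue.Queue` as a stand-in for the Kafka topic, so it is not Kafka client code; `sink` is a hypothetical callback that writes a batch to the big data platform.

```python
import json
import queue
from typing import Callable, List


def drain_in_batches(
    topic: "queue.Queue[bytes]",
    sink: Callable[[List[dict]], None],
    max_batch: int = 100,
    idle_timeout_s: float = 0.1,
) -> int:
    """Read JSON messages until the queue stays idle; return the count delivered."""
    delivered = 0
    batch: List[dict] = []
    while True:
        try:
            message = topic.get(timeout=idle_timeout_s)
        except queue.Empty:
            break  # nothing arrived within the idle timeout
        batch.append(json.loads(message))
        if len(batch) >= max_batch:
            sink(batch)
            delivered += len(batch)
            batch = []
    if batch:
        # Flush the final partial batch so no message is left behind.
        sink(batch)
        delivered += len(batch)
    return delivered
```

A real consumer would run continuously and commit offsets only after `sink` succeeds. This sketch stops on idle so that it terminates when run as an example.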

Origin blog.csdn.net/zuoan1993/article/details/122564397