19 | Summary: Key Design of Distributed Architecture

In the previous articles, we focused on domain modeling, microservice design, and front-end design methods, which together form an overall approach to middle-platform construction. Most middle platforms are built on a distributed microservice architecture, and this kind of enterprise-level digital transformation has many aspects that deserve our attention and thought.

We should pay attention not only to the enterprise's business model, business boundaries, and the integration of the front end and the middle platform, but also to the design of the data and technology systems, microservice design, multi-site active-active deployment, and other areas. Drawing on implementation experience, today we will discuss several key issues under a distributed architecture.

1. Which distributed database should you choose?

Data application scenarios under a distributed architecture are far more complex than under a centralized one, and many data-related problems arise. The first data question to settle is choosing the right distributed database.

Most distributed databases use multiple copies of data to achieve high performance, multi-site availability, and disaster recovery for data access. There are currently three main distributed database solutions; they differ mainly in how they handle the data copies and in their use of database middleware.

1. Integrated distributed database solution

It supports multiple data copies and high availability, mostly using the Paxos protocol: a write goes to multiple copies at once and is considered successful once a majority of copies have been written. Representative products are OceanBase and GaussDB.

2. Centralized database + database middleware solution

This solution combines a centralized database with database middleware; the middleware handles data routing and global data management. The middleware and the database are deployed independently, and the database's own synchronization mechanism keeps the primary and replica copies consistent. The centralized databases are mainly MySQL and PostgreSQL, from which many solutions have been derived, such as the open-source middleware MyCat + MySQL, and TBase (based on PostgreSQL, but with substantial wrapping and modification).

3. Centralized database + sharding library solution

This is a lightweight database middleware solution. The sharding library is essentially a JAR package deployed together with the application to handle data routing and result aggregation. It suits relatively simple read-write transaction scenarios and is weaker at strong consistency and aggregate or analytical queries. A typical sharding component of this kind is ShardingSphere (specifically its JDBC driver, ShardingSphere-JDBC).

4. Summary

The three solutions differ in implementation cost and in the business capabilities they can support. Integrated distributed databases are mostly developed by major Internet companies and offer very strong data processing capabilities; most require a cloud computing base, and the implementation cost and required technical skills are relatively high. The centralized database + database middleware solution has moderate implementation cost and skill requirements and can meet the business needs of medium and large enterprises. The sharding library solution can handle simple business scenarios, with relatively low cost and skill requirements. When choosing a database, weigh your own capabilities, costs, and business needs to pick the right solution.

2. How to design the sharding key?

After choosing a distributed database, the second step is data sharding, and here the design of the sharding key is critical.

For core customer-facing business, I suggest using the customer ID as the sharding key. This ensures that the same customer's data lands in the same data unit, avoiding frequent cross-unit data access. Frequent service calls across data centers or queries across data units have a severe impact on system performance.

Putting all of a customer's data in the same data unit also makes it easier to provide customers with consistent services. For an enterprise, being "customer-centric" in business capability must start with being "customer-centric" in the data.

Of course, depending on business needs you can also use other business attributes as the sharding key, such as organization or user.
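The sketch below illustrates the idea of routing by customer ID, assuming a simple hash-based rule and hypothetical shard names (order_db_0 ... order_db_3); it is not a prescription for any particular sharding product.

```java
// Minimal sketch of hash-based shard routing keyed on customer ID.
// The shard count and database naming convention are illustrative assumptions.
public final class CustomerShardRouter {

    private final int shardCount;

    public CustomerShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    /** Route every record of the same customer to the same physical database. */
    public String routeDatabase(String customerId) {
        // Mask the hash to keep the bucket non-negative.
        int bucket = (customerId.hashCode() & Integer.MAX_VALUE) % shardCount;
        return "order_db_" + bucket;
    }

    public static void main(String[] args) {
        CustomerShardRouter router = new CustomerShardRouter(4);
        // Orders and policies of customer C000123 land in the same shard,
        // so assembling the customer's view needs no cross-unit query.
        System.out.println(router.routeDatabase("C000123"));
    }
}
```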

3. Database data synchronization and replication

Under a microservice architecture, data is split even further. To integrate it again, batch data synchronization and replication between databases are essential. They are mainly used to synchronize data between databases: migrating business data, backing up data, replicating core business data from different channels to the data platform or data middle platform, and integrating data of different subjects.

Traditional data transmission methods, such as ETL tools and scheduled extraction jobs, fall short in timeliness. Distributed architectures generally adopt change data capture (CDC) based on the database's logical log, which enables near-real-time data replication and transmission, decouples data processing from application logic, and is simpler and more convenient to use.

There are many log capture components built around the mainstream PostgreSQL and MySQL databases. CDC can also be used in domain event-driven design as a way to capture incremental domain event data.
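As a rough illustration of how captured changes flow downstream, here is a minimal sketch of a consumer that reads log-based change events (for example, produced by a tool such as Debezium) from a Kafka topic and forwards them to a target store. The topic name, group id, and handling logic are assumptions for illustration only.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

// Sketch: consume CDC change events from Kafka and apply them downstream.
public class CdcReplicator {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "policy-cdc-replicator");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // One topic per captured table is a common CDC convention (assumed here).
            consumer.subscribe(List.of("cdc.policy_db.policy"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // The value is the change event (insert/update/delete) in JSON;
                    // in a real replicator it would be upserted into the subject
                    // database or data platform instead of printed.
                    System.out.printf("key=%s change=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```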

4. How to handle cross-database association queries?

Cross-database association queries are a weak point of distributed databases and hurt query performance. During domain modeling, many entities end up scattered across different microservices, yet in many cases business requirements still call for association queries between them.

There are two types of business scenarios for associated queries:

  • The first type is data queries along a certain dimension or subject domain, such as a query over the customer's full business view, which spans the microservices of multiple business lines;
  • The second type is association queries between tables, such as a join between the organization table and a business table, where the two tables sit in different microservices.

How to solve these two types of associated queries?

For the first type of scenario, the data is scattered across different microservices, so we cannot run statistics across all of them directly. You can build a subject-oriented distributed database whose data comes from the microservices of the different business lines. Use database log capture to collect data from each business-side microservice into the subject database in near real time. During collection, do the data association in advance (for example, merging multi-table data into one wide table) or build a data model, and then build query microservices on top of the subject database. In this way, one query returns the customer's business data across all dimensions. You can also design an appropriate sharding key by subject or scenario to improve query efficiency.
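As a minimal sketch of the query side, assume the subject database exposes a pre-joined "customer wide table"; the table and column names below are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: one query against a pre-joined wide table returns data that originally
// lived in several business-line microservices. Names and URL are assumptions.
public class CustomerViewQuery {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://subject-db:3306/customer_subject";
        try (Connection conn = DriverManager.getConnection(url, "reader", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT customer_id, policy_count, open_orders, last_claim_date "
                             + "FROM customer_wide_view WHERE customer_id = ?")) {
            ps.setString(1, "C000123");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("policies=%d orders=%d%n",
                            rs.getInt("policy_count"), rs.getInt("open_orders"));
                }
            }
        }
    }
}
```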

For the second type of scenario, association queries between tables that are not in the same database, you can use small table broadcasting: add a redundant copy of the code table (for example, the organization table) to the business database. When the master table changes, asynchronously refresh all the redundant copies through message publish/subscribe in the domain event-driven mode. This not only solves the table-to-table association query but also improves query efficiency.
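The subscriber side of small table broadcasting might look like the sketch below: a handler receives an "organization changed" domain event and upserts it into the local redundant table. The event fields, the org_replica table, and the MySQL-style upsert are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;

// Sketch: refresh the local redundant copy of the organization table whenever the
// publishing service emits a change event.
public class OrgChangedEventHandler {

    /** Minimal domain event carrying the changed organization record. */
    public record OrgChangedEvent(String orgId, String orgName, String parentId) { }

    private final DataSource businessDb;

    public OrgChangedEventHandler(DataSource businessDb) {
        this.businessDb = businessDb;
    }

    /** Called by the message listener (Kafka, RocketMQ, etc.) for each event. */
    public void onOrgChanged(OrgChangedEvent event) throws Exception {
        // MySQL-style upsert; adjust the syntax for your database.
        String upsert = "INSERT INTO org_replica (org_id, org_name, parent_id) "
                + "VALUES (?, ?, ?) "
                + "ON DUPLICATE KEY UPDATE org_name = VALUES(org_name), parent_id = VALUES(parent_id)";
        try (Connection conn = businessDb.getConnection();
             PreparedStatement ps = conn.prepareStatement(upsert)) {
            ps.setString(1, event.orgId());
            ps.setString(2, event.orgName());
            ps.setString(3, event.parentId());
            ps.executeUpdate();
        }
    }
}
```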

5. How to deal with high-frequency hotspot data?

High-frequency hot data, such as commodity and organization code data, serves multiple applications at once and must sustain high concurrency; it puts huge access pressure on the database and affects system performance.

A common practice is to load this high-frequency hot data from the database into a cache such as Redis and serve data access from the cache. This both relieves the pressure on the database and improves data access performance.
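A minimal cache-aside sketch using the Jedis client is shown below; the key naming and one-hour TTL are assumptions, not recommendations.

```java
import redis.clients.jedis.Jedis;
import java.util.function.Function;

// Cache-aside sketch for high-frequency reference data (e.g. commodity codes):
// read Redis first, fall back to the database on a miss, then refill the cache.
public class HotCodeCache {

    private final Jedis jedis;
    private final Function<String, String> dbLoader; // loads the code record from the database

    public HotCodeCache(Jedis jedis, Function<String, String> dbLoader) {
        this.jedis = jedis;
        this.dbLoader = dbLoader;
    }

    public String getCommodityCode(String codeId) {
        String key = "code:commodity:" + codeId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                 // cache hit: the database is not touched
        }
        String value = dbLoader.apply(codeId);
        if (value != null) {
            jedis.setex(key, 3600, value); // cache miss: refill with a 1-hour TTL
        }
        return value;
    }
}
```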

In addition, for high-frequency data that needs fuzzy queries, you can use a search engine such as Elasticsearch. Caching is like seasoning: a small investment that delivers quick results and a fast improvement in user experience.

6. Handling preceding and subsequent business data

When designing microservices, you will often find that some data needs to be associated with data from a preceding microservice. For example, in insurance, after the application microservice generates an application form, the policy is associated with that earlier application data; in e-commerce, the shipping order is associated with the earlier order data. Because this associated data lives in the preceding microservices, you cannot build data associations for it across the databases of different microservices.

How do you handle this kind of association between preceding and subsequent entities?

Generally speaking, preceding and subsequent data are related through domain events. You can use the domain event mechanism to transmit the preceding data, on demand, as a domain event entity and keep a redundant copy in the current microservice's database.

You can design the preceding data as an entity or a value object referenced by the current entity. When deciding, note the following: if the preceding data can only be modified as a whole in the current microservice and you will not query or run statistics on it, design it as a value object; if there are multiple pieces of preceding data and you need to query and analyze them, design them as entities.

In this way, the shipping microservice can return both the earlier order data and the shipping order data to the front-end application in a single call, reducing cross-microservice calls. If the preceding data is designed as an entity, you can also use it as a query condition to complete multi-dimensional combined queries within the local microservice, fetching the detailed data from the preceding microservice only when necessary. This preserves data integrity while reducing dependencies between microservices, cutting cross-microservice calls, and improving system performance.
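One possible way to model this, following the value-object vs entity rule above, is sketched here; the class and field names are hypothetical.

```java
// Sketch: redundant preceding (upstream) data embedded in the current microservice.

// The order snapshot is copied in via a domain event and is only read or replaced
// as a whole here, so it is modeled as a value object.
record OrderSnapshot(String orderId, String customerId, String deliveryAddress) { }

// The shipment order is the aggregate root of the current (shipping) microservice
// and embeds the redundant upstream data, so the front end gets both in one call.
class ShipmentOrder {
    private final String shipmentId;
    private final OrderSnapshot sourceOrder;   // redundant preceding data
    private String carrier;

    ShipmentOrder(String shipmentId, OrderSnapshot sourceOrder, String carrier) {
        this.shipmentId = shipmentId;
        this.sourceOrder = sourceOrder;
        this.carrier = carrier;
    }

    OrderSnapshot sourceOrder() {
        return sourceOrder;
    }
}
```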

7. Data middle platform and enterprise-level data integration

Although a distributed microservice architecture improves application flexibility and availability, the originally centralized data is split along with the microservices and forms many data islands, making data integration and enterprise-level data use harder. You can use a data middle platform to achieve data fusion and solve the data application and integration problems of a distributed architecture.

You can build a data middle platform in three steps.

  • First, collect and store the business data of the different microservices and channels according to unified data standards, solving the problems of data islands and basic data sharing.
  • Second, establish a thematic data model, process data according to different themes and scenarios, and establish data views for different themes, such as unified customer views, agent views, and channel views.
  • Third, establish a data system driven by business needs to support business and business model innovation.

The data middle platform is not limited to analytical scenarios; it also applies to transactional scenarios. You can build it on top of the data warehouse and data platform and open it to front-end business applications to support transactional scenarios.

8. BFF and enterprise-level business orchestration and collaboration

Enterprise-level business processes are often completed by multiple microservices together. Each single-responsibility microservice is like a building block that only performs its own specific function. So how do you organize these microservices to complete enterprise-level business orchestration and collaboration?

You can add a layer of BFF (Backend for Frontends) microservices between the middle-platform microservices and the front-end applications. The main responsibility of the BFF is service composition and orchestration across microservices. But application services within a microservice also do service composition and orchestration, so what is the difference between the two?

The BFF sits on top of the middle-platform microservices, and its main responsibility is coordination across microservices; application services mainly handle service composition and orchestration within a microservice. When designing, we should push reusable service capabilities down to the lower layers as much as possible; this enables capability reuse and also avoids cross-center service calls.

The BFF is like a gear that matches the pace of front-end applications and microservices. It adapts to different front ends through facade services and organizes and coordinates microservices through service composition and orchestration. BFF microservices can be released along with front-end application versions as requirements and processes change, avoiding frequent modification and release of middle-platform microservices just to keep up with front-end changes, thereby keeping the core domain logic of the microservices stable.
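A minimal BFF sketch is shown below: one facade call fans out to two middle-platform microservices and assembles a front-end-friendly payload. The service URLs, endpoints, and the crude JSON assembly are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: composition/orchestration happens in the BFF, not inside the core microservices.
public class CustomerDetailBff {

    private final HttpClient http = HttpClient.newHttpClient();

    public String getCustomerDetail(String customerId) throws Exception {
        String profile = call("http://customer-service/api/customers/" + customerId);
        String policies = call("http://policy-service/api/policies?customerId=" + customerId);
        // Assemble exactly the shape this front end needs, shielding it from service layout.
        return "{\"profile\":" + profile + ",\"policies\":" + policies + "}";
    }

    private String call(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```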

If your BFF is strong enough, it becomes a business capability platform that integrates the capabilities of different middle platforms and microservices and serves multi-channel applications.

9. Distributed transactions or an event-driven mechanism?

Under a distributed architecture, what used to be in-process calls inside a monolith become distributed calls. If an operation modifies data in multiple microservices, data consistency problems arise. There are two kinds of data consistency, strong consistency and eventual consistency, with different implementation schemes and costs.

For strongly consistent business scenarios with high real-time requirements, you can use distributed transactions, but they come with a performance cost. When designing, balance business splitting, data consistency, performance, and implementation complexity, and avoid creating distributed transactions wherever possible.

The asynchronous, domain event-driven approach is a common design method in distributed architectures and solves eventual data consistency in non-real-time scenarios. Domain event publish/subscribe based on message middleware decouples microservices well; by "shaving peaks and filling valleys" it reduces the pressure of real-time database access and improves business throughput and processing capacity. You can also use event-driven design to achieve read-write separation and improve database access performance. For eventual consistency scenarios, I suggest adopting a domain event-driven design.
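A minimal sketch of the publishing side is shown below: after the local transaction commits, the service publishes the domain event to message middleware (Kafka here), and downstream microservices update their own data asynchronously. The topic name and payload format are assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

// Sketch: eventual consistency via a published domain event.
public class PolicyIssuedEventPublisher {

    private final KafkaProducer<String, String> producer;

    public PolicyIssuedEventPublisher() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    /** Publish after the local transaction has committed. */
    public void publish(String policyId, String eventJson) {
        // Keying by policyId keeps events of the same aggregate in order.
        producer.send(new ProducerRecord<>("policy-issued-events", policyId, eventJson));
    }
}
```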

10. Multi-center and multi-active design

The high availability of a distributed architecture is achieved mainly through multi-active design. Multi-center multi-active is a very complex undertaking; below I list only the key designs.

1. Select an appropriate distributed database.

The database should support multi-data-center deployment and meet the technical requirements for multiple data copies, underlying data replication and synchronization, and timely data recovery.

2. Unitized architecture design.

A business unit composed of several applications is used as the basic unit of deployment, enabling multi-active deployment in the same city and across regions as well as elastic scaling across centers. The business functions of each unit are self-contained, so all business processes can be completed within the unit; the data of any unit has copies in multiple data centers, so a failure does not cause data loss; a failure of any unit does not affect the normal operation of other units of the same kind. When designing units, try to avoid calls across data centers and across units.

3. Access routing.

Access routing covers the access layer, application layer, and data layer, ensuring that front-end requests reach the right data center and business unit according to the route, and accurately write to or read from the database where the business data resides.
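The sketch below shows the idea at the access layer: the same key that drives the database split (the customer ID) also decides which data center and business unit serves the request, so a request and its data stay together. The unit list and modulo rule are illustrative assumptions.

```java
import java.util.List;

// Sketch: route a request to a business unit by customer ID.
public class UnitRouter {

    private final List<String> units = List.of("dc1-unit-a", "dc1-unit-b", "dc2-unit-a");

    public String routeUnit(String customerId) {
        int bucket = (customerId.hashCode() & Integer.MAX_VALUE) % units.size();
        return units.get(bucket);
    }

    public static void main(String[] args) {
        UnitRouter router = new UnitRouter();
        // The same customer always lands in the same unit, matching the data split.
        System.out.println(router.routeUnit("C000123"));
    }
}
```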

4. Global configuration data management.

Manage the global configuration data of all data centers in a unified way and synchronize it across data centers in real time to ensure consistency.

Summary

Implementing an enterprise-level distributed architecture is a very complex systems engineering effort involving many technical systems and methods. Today I listed 10 key design areas; each of them is actually very complex and requires substantial investment and research. When implementing, you and your company should choose the appropriate technical components and solutions based on your own situation.
