Microservices High Availability Architecture Based on Payment Scenarios

Today's talk is a hands-on look at microservices in a payment scenario, with the emphasis on the application layer.

Outline:

1. SOA and microservices

2. Challenges of the old payment architecture

3. How we transformed toward microservices

4. Plans for the future

1. SOA and Microservices

In my view, although microservices are a technology introduced from abroad, the idea connects with some traditional thinking here in China. So before getting into the topic, let me briefly introduce the "wheat field theory".

1. About the wheat field theory

In the ancient Zhou Dynasty, common farmers planted without any plan or regional boundaries; rice, wheat, and vegetables were all mixed together in the same field. The result was that crops received little sunlight, nutrients were unbalanced, and later maintenance costs were very high. In the Warring States Period, an agricultural expert divided the land into plots, each planted with a single crop and separated by ridges, forming an early echo of the microservice idea.

Most articles we have seen only compare SOA with microservices; today I add DDD on top of that. Below is an overview of how DDD, SOA, and microservices evolved.

2. DDD, SOA and Microservices

  • SOA Architecture

SOA is a product of an earlier era; it appeared before 2010. When first proposed, it was a solution for traditional enterprise computing, and at the time Oracle and IBM also offered many solutions, including a number of process engines.

Its idea is to split tightly coupled systems into business-oriented, coarse-grained, loosely coupled, stateless services. The proponents of microservices later refined SOA into single-responsibility, independently deployable, small services, an emphasis in the opposite direction.

  • Microservices and DDD

Today, when we talk about microservices, we immediately think of DDD. Many people believe DDD was born for microservices, but that is not the case. When I first encountered DDD, it was used for UML design and domain modeling.

DDD emphasizes the rich domain model, while the traditional J2EE layered architecture combined with Spring produced an architecture dominated by the anemic domain model. DDD requires a deep understanding of the business; otherwise the project descends into confusion later on.

In addition, DDD thinking is quite broad, and with a hundred schools of thought contending it never settled into a fixed methodology; it was not easy for developers to grasp, so attention to DDD gradually faded. The microservice movement then cleverly borrowed DDD keywords such as bounded context, subdomain, and domain event, and as microservices won broader industry recognition, DDD in turn gained a new lease of life.

2. Challenges encountered by the old payment architecture

1. Two perspectives for judging the quality of a project

We judge an excellent project from two angles: excellent code and a highly available architecture. While designing a high-availability architecture, we cannot ignore the importance of the code. "Excellent code" is about handling redundancy, concurrency, deadlocks, and the like; it does not necessarily mean code that is beautifully written.

It is like constructing a building. The foundation may be laid well, but if the workers are not professional enough and miss the many details that matter when laying bricks and tiles, the result is a house with frequent leaks, cracked walls, and similar problems. The building will not collapse, but it has become unsafe.

From the code and design perspective, the problems included:

  • The project cannot scale because of poorly structured code

  • Frequent database deadlocks

  • Indiscriminate use of database transactions, leading to overly long transactions

  • Poor fault tolerance in the code, with accidents frequently caused by insufficient consideration of edge cases

  • Large volumes of useless log output hurting performance

  • Common configuration information still read from the database

  • Thread pool abuse, causing stack and heap overflows

  • Database queries that fetch the full data set every time

  • Business code written without considering interactions with other operations

  • Unreasonable cache usage, with problems such as the thundering herd effect and cache penetration

  • Confusingly defined upstream and downstream responsibilities in the code

  • A chaotic exception handling mechanism
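Two of the cache problems above, the thundering herd effect and cache penetration, are commonly mitigated with per-key single-flight loading and caching of empty results. The sketch below is illustrative only (not the team's actual code); the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative cache wrapper: computeIfAbsent runs the loader at most once
// per key even under concurrent access, which prevents a thundering herd of
// threads all rebuilding the same entry; storing Optional.empty() for rows
// that do not exist blocks cache penetration by repeated lookups of
// missing keys.
public class GuardedCache<K, V> {
    private final Map<K, Optional<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hits the database on a miss

    public GuardedCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public Optional<V> get(K key) {
        return cache.computeIfAbsent(key,
                k -> Optional.ofNullable(loader.apply(k)));
    }
}
```

A real implementation would also need expiry and size bounds; the point here is only the two guards.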

From the perspective of the overall architecture:

  • The system as a whole still runs as a monolithic cluster

  • Servers are deployed in a single data center

  • Services are exposed via Nginx + Hessian

  • The business architecture is incompletely divided, with blurred boundaries

  • Project splitting is incomplete: one Tomcat hosts multiple applications

  • No degradation strategy for failures

  • An inadequate monitoring system (network, system)

  • Payment operations reports run queries over very large data volumes

  • Packaging and releases are done manually by operations

  • System scaling requires manual deployment

From these two perspectives, the problems of the old project are clear, and we can start thinking about what the new microservice architecture should look like.

3. How we transformed toward microservices

To build a highly available microservice architecture, you must first establish the following five points:

  • The first is product iteration speed. The architecture must serve the product; if development becomes slower after the redesign than before, the design has failed. This is also the core of microservices: break the business apart and turn each piece into a product rather than a project.

  • The second is system stability. In the old monolith, one error made everything fail; with finer granularity, faults are isolated, and monitoring systems are applied at every level.

  • The third is system availability.

  • The fourth is rapid problem location. Availability targets leave us only dozens of minutes of failure time per year, so faults must be found quickly.

  • The fifth is the degree of system coupling. Do not lump all the systems together; split them apart as much as possible.

1. Use DDD to divide bounded contexts

This is a business architecture diagram based on our business scenarios. The green part in the middle is the product service layer; analyzed with DDD, it is the product service domain. This domain contains three subdomains: the cashier subdomain, the merchant subdomain, and the personal subdomain. Each subdomain contains bounded contexts: two for the cashier, four for the merchant, and two for the individual.

Some readers may not be familiar with the concept of a bounded context; you can think of it as a system, a boundary, or an entity. For example, suppose going to work means taking the subway with three transfers along the way. The key event here is going to work, and each subway leg between transfers can be seen as its own bounded context.

The bounded context can be understood as a microservice, or it can be understood as a system or a module.

How finely to divide bounded contexts can depend on team size. If the team is small, the boundaries can be set coarser; as the project and the team grow, the larger domains and bounded contexts can be split further into multiple smaller ones.
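One practical consequence of bounded contexts is that each context owns its own model of a shared concept. The sketch below is a hypothetical illustration (class names invented for this example): the cashier context and the merchant context both know about an "order", but each defines only the attributes its own context needs.

```java
// Hypothetical sketch: two bounded contexts, two models of the same order.

class CashierOrder {            // cashier bounded context
    final String orderId;
    final long amountInCents;   // the cashier cares about collecting money
    CashierOrder(String orderId, long amountInCents) {
        this.orderId = orderId;
        this.amountInCents = amountInCents;
    }
}

class MerchantOrder {           // merchant bounded context
    final String orderId;
    final String settlementAccount; // the merchant cares about settlement
    MerchantOrder(String orderId, String settlementAccount) {
        this.orderId = orderId;
        this.settlementAccount = settlementAccount;
    }
}
```

The two classes share only the order identifier; everything else stays inside its own boundary, which is what keeps the contexts independently deployable later.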

2. Microservice governance architecture diagram

This is the general flow chart of one of our microservices, built on a Spring Boot + Dubbo architecture.


Why advocate for Dubbo instead of Spring Cloud? There are several reasons:

  • First, look at what the current architecture is. If it is already Dubbo, many supporting components have been built around it; tearing all of that down to adopt another stack means the cost must be weighed carefully.

  • Second, although Spring Cloud is newer, it is not necessarily easier to use than Dubbo. At present the company maintains its own "Dubbo Cloud", a Dubbo-based microservice system developed in-house.

The probe in the middle of the diagram was also developed in-house. It collects all kinds of information along the entire service link, such as when the network dropped, errors reported, return values, and parameters. After collection, the information is pushed into an open-source component we adapted, whose interface displays what we want to see.

For the latter part, such as the Hystrix circuit breaker, Dubbo Admin, and Mock Server, we borrowed Dubbo's ideas for intelligent interception and service degradation. Service registration, service discovery, service routing, failure retry, and service monitoring below are provided by Dubbo itself; the green parts of the diagram are functions we developed ourselves. We plan to open-source this Dubbo Cloud system in the future.
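The service-degradation idea borrowed from Hystrix boils down to wrapping a remote call so that a failure yields a safe default instead of propagating. A minimal sketch, with hypothetical names, not the actual Dubbo Cloud code:

```java
import java.util.function.Supplier;

// Minimal sketch of the degradation idea (in the spirit of a Hystrix
// fallback): attempt the real call, and return a safe default if it fails,
// so one misbehaving dependency does not break the whole request flow.
public class Degradable {
    public static <T> T callWithFallback(Supplier<T> primary, T fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // A real circuit breaker would also record the failure here
            // and open the circuit after a threshold is crossed.
            return fallback;
        }
    }
}
```

A real breaker additionally tracks failure rates and short-circuits calls while open; this sketch shows only the fallback half of that behavior.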

3. Evolution of the channel alarm switching system

This is the architectural evolution of our channel alarm and switching system. Why "channel"? Our payments must go through banks, but the banks themselves are relatively traditional and their channels are not very stable; all kinds of problems occur. Each bank has N channels, and we cannot know in advance which channel has recently been stable or unstable, as this changes back and forth.

Here we developed an Agent ourselves to collect usage data on each channel, for example whether a connection succeeded and what data came back, and push it into Kafka. A statistics and analysis job then counts each successful connection per channel and periodically stores the aggregated results in a Redis cluster.

The routing system in the diagram handles channel selection. It is a business system: each time it selects a channel, it first fetches the bank's channels from the Redis cluster and picks the one with the highest score.

After fetching them, it runs our own routing-election logic to filter and select, finally arriving at an optimal channel that connects directly to the bank. This way we always know which channels are highly available.
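The "pick the highest-scoring channel" step can be sketched as follows. This is an illustrative simplification: the class and field names are invented, and in production the success counts would come from the Redis cluster fed by the Kafka statistics, not from an in-memory list.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of channel election: score each bank channel by its recent
// success rate and pick the best one.
public class ChannelRouter {
    public static class Channel {
        public final String name;
        public final long successes;
        public final long attempts;
        public Channel(String name, long successes, long attempts) {
            this.name = name;
            this.successes = successes;
            this.attempts = attempts;
        }
        double score() {
            // a channel with no recent traffic scores zero rather than
            // dividing by zero
            return attempts == 0 ? 0.0 : (double) successes / attempts;
        }
    }

    public static Optional<Channel> best(List<Channel> channels) {
        return channels.stream()
                .max(Comparator.comparingDouble(Channel::score));
    }
}
```

A production scorer would likely also weight recency and penalize channels currently marked unavailable by monitoring.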

The underlying flow still goes through Kafka, where various statistical analyses are performed; this works very well. There is also a dashboard: if a problem occurs, it is visible on the interface, and SMS and email alerts are sent at the same time.

Why do we run two sets of systems? In the first phase we needed to compare data, because if the collected data is inaccurate, channel decisions go wrong. After collection, the data is stored in a database, the same computation is run against that database, and each time the two results are compared to measure the correctness rate.

When a channel has a real problem, for example a particular bank channel fails, our monitoring system marks that channel unavailable and notifies the R&D team to resolve it; once fixed, the channel is made available again. If the data in this process were inaccurate, it would cause frequent channel switching and many unnecessary problems, which is why in the first phase we kept it semi-automatic.

4. The evolution of the active-active architecture

The evolution toward active-active data centers also went through two stages.

The first stage is pseudo-active-active:

  1. Both data centers serve traffic at the same time, but primary and standby roles must be assigned. Applications in the standby center can only reach the database in the primary center over a dedicated line, and Redis in the standby center likewise accesses Redis in the primary center over a dedicated line.

  2. When the primary center goes down, the database configuration of the applications in the standby center must first be switched to the standby database; at the same time, replication from the primary is stopped and the standby database is promoted to primary.

The second stage is active-active with swim lanes:

In this stage of the active-active evolution, we used two methods for ZK data synchronization: one uses Curator's TreeCacheListener to watch changes on the relevant nodes and synchronize the data; the other modifies the ZK source code to masquerade as an Observer and receive transaction log data to achieve synchronization.

Ideally, ZK synchronization is avoided altogether through swim-lane isolation. For example, with Dubbo the two environments can be kept fully independent; with Dangdang's Elastic-Job, doing active-active is considerably more troublesome.
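The TreeCacheListener approach amounts to: watch node changes in one ZK cluster and replay each change into the peer cluster. Below is a minimal in-memory sketch of that replication shape (not actual Curator code; the class and method names are invented, and real code would need loop prevention, retries, and conflict handling):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// In-memory sketch of listener-driven sync between two registries, the
// same shape as watching one ZK cluster and replaying each node change
// into the other data center's cluster.
public class SyncedRegistry {
    private final Map<String, String> nodes = new HashMap<>();
    private BiConsumer<String, String> listener = (path, value) -> {};

    // register a callback, analogous to adding a TreeCacheListener
    public void onChange(BiConsumer<String, String> l) { this.listener = l; }

    // local write: stores the node and fires the change event
    public void put(String path, String value) {
        nodes.put(path, value);
        listener.accept(path, value);
    }

    // applied by the peer's listener; does not re-fire, to avoid loops
    public void applyRemote(String path, String value) {
        nodes.put(path, value);
    }

    public String get(String path) { return nodes.get(path); }
}
```

Wiring `a.onChange(b::applyRemote)` makes every local write to `a` appear in `b`, which is the essence of the cross-center sync described above.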

5. Panorama of Microservice Architecture

This is the overall architecture of our microservices. The left half of the diagram shows how services are divided: which domains exist, which services they contain, how the database content is split, and how the gateway layer works. It is a division from the business perspective.

The right half shows how we ensure the reliability of the microservices. The first layer is mainly for project operators; the second layer is what we built to safeguard the microservices: a unified scheduling center, active-active management and control, plus a big data platform and distributed caches. All of these components keep the microservices running smoothly.

Next comes monitoring. Here we use APM distributed call-chain tracing, along with several monitoring platforms we built ourselves.


6. Continuous Integration Testing

Next, let's talk about our experience with continuous testing. How do we ensure code quality? This is where integration testing comes in.

We divide testing into four quadrants. One is unit testing, done by the developers themselves, with coverage generally at 60-80%. Then there are acceptance testing and exploratory testing, both done by our testers: one verifies the feasibility of the business, the other feeds in invalid inputs or performs destructive tests. Finally there is stress testing, which shows the load the system can bear.

This is our entire testing process. First, referring to the coding standards of Alibaba and other companies, we formulated our own coding standard and reached consensus on it with all developers. Then we run our own static code checks; Alibaba's tooling can also be used here. Those are the first two steps.

The third step is unit testing. The first three steps are carried out by developers to ensure the robustness and correctness of the code. The fourth is continuous integration, where we scan the code again against our own rules and templates. After the scan, we organize architects or technical experts to refactor key core code. Those are, roughly, the five steps.
