Understanding Microservice Architecture in One Article

This article is updated every Saturday; search "haxianha" on WeChat to read it first.

Microservice frameworks (RPC): Spring Boot, Spring Cloud, Dubbo, gRPC, Thrift, go-micro, Motan

Service support (runtime):

  • Service registration and discovery - dynamic scale-out/scale-in: Zookeeper, Eureka, Consul, Etcd, Nacos
  • Service configuration - dynamic configuration: Apollo, Spring Cloud Config
  • Service gateway - access control: Kong, APISIX, Zuul, Spring Cloud Gateway
  • Service governance - circuit breaking, service degradation, rate limiting: Sentinel

Service monitoring:

  • Monitoring - spotting symptoms of failure: Prometheus, Grafana
  • Locating problems - link tracking: Zipkin, SkyWalking
  • Analyzing problems - log collection: Elasticsearch, Logstash, Kibana

This article introduces the microservice architecture and its related components: what they are, and why we use a microservice architecture and these components. It aims to give a concise overall picture of the microservice architecture, so it does not cover details such as how to use each component.

To understand microservices, you must first understand what is not a microservice. The usual opposite of microservices is the monolithic application, which packages all functionality into a single unit. Moving from a monolith to microservices does not happen overnight; it is a gradual evolution. This article uses an online supermarket application as an example to illustrate that process.

Initial requirements

A few years ago, Xiao Ming and Xiao Pi started an online supermarket together. Xiao Ming handled program development and Xiao Pi handled everything else. At the time, the Internet was far less developed than it is today and online supermarkets were still a blue ocean: as long as the features worked, money came easily. So their requirements were very simple: a website on the public Internet where users could browse and buy products, plus an admin backend for managing products, users, and order data.

Let's list the features:

  • Website
    • User registration and login
    • Product display
    • Placing orders
  • Admin backend
    • User management
    • Product management
    • Order management

Because the requirements were simple, Xiao Ming knocked out the website in no time. For security reasons, the admin backend was not deployed together with the website, so Xiao Ming repeated the trick and built the admin site as well. The overall architecture diagram is as follows:

With a wave of his hand, Xiao Ming picked a cloud provider, deployed it, and the website went online. It received rave reviews and was loved by homebodies of every kind. Xiao Ming and Xiao Pi sat back and happily collected the money.

As the business grows...

The good times didn't last long. Within a few days, online supermarkets of every kind sprang up, hitting Xiao Ming and Xiao Pi hard.

Under competitive pressure, Xiao Ming and Xiao Pi decided to try some marketing measures:

  • Run promotions. For example, there are discounts on New Year’s Day, buy two get one free during the Spring Festival, dog food coupons on Valentine’s Day, etc.
  • Expand channels by adding mobile marketing. Besides the website, they need to develop a mobile app, a WeChat mini program, and so on.
  • Precision marketing. Use historical data to analyze users and provide personalized services.
  • ……

All of these activities needed development work. Xiao Ming recruited his classmate Xiao Hong to join the team. Xiao Hong took charge of data analysis and mobile development, while Xiao Ming took charge of the features for promotional activities.

Because the work was urgent, Xiao Ming and Xiao Hong did not plan the overall system architecture. On a whim, they put promotion management and data analysis into the admin backend and built the WeChat and mobile apps separately. After a few all-nighters, the new features and apps were more or less done. The architecture at this point is as follows:

This stage had many problems:

  • The website and the mobile apps contain a lot of duplicated code for the same business logic.
  • Data is sometimes shared through the database and sometimes passed through interface calls, so the call relationships are a mess.
  • To provide interfaces to other applications, each application gradually accumulates logic that does not belong to it. Application boundaries blur and ownership of functionality becomes chaotic.
  • The admin backend was originally designed with low availability and performance guarantees. After data analysis and promotion management were added to it, it hit performance bottlenecks that affected other applications.
  • Database tables are depended on by multiple applications, so the schema cannot be refactored or optimized.
  • All applications operate on a single database, which becomes a performance bottleneck. When data analysis jobs run, database performance drops sharply.
  • Development, testing, deployment, and maintenance grow increasingly difficult. Even a small change requires releasing the entire application. Sometimes a release accidentally ships untested code, or changing one feature breaks something unexpected elsewhere. To limit the impact of release problems and service interruptions, every release has to happen at three or four in the morning. Afterwards, to confirm the application runs normally, someone has to keep watch through the next day's daytime peak...
  • The team falls into buck-passing and bickering. There are often long debates about which application a shared function should live in. In the end, either everyone builds their own copy, or it gets put somewhere arbitrary and nobody maintains it.

Despite these problems, the achievement of this stage cannot be denied: the system was built quickly in response to business changes. However, urgent and heavy workloads tend to push people into narrow, short-term thinking and compromise decisions. In this architecture, everyone focuses only on their own small patch of ground, with no global or long-term design. If this continues, building the system will get harder and harder, and it may even fall into a cycle of tearing down and rebuilding.

Time for a change

Fortunately, Xiao Ming and Xiao Hong were ambitious young people with ideals. Once they recognized the problem, they freed up some time from trivial business requests, began to sort out the overall architecture, and prepared to fix it.

A transformation requires enough time and resources. If your stakeholders (business people, project managers, supervisors, and so on) are so fixated on feature deadlines that you cannot spare any extra time or resources, you may not be able to do anything...

In programming, the most important skill is abstraction. Microservice transformation is essentially a process of abstraction. Xiao Ming and Xiao Hong sorted out the online supermarket's business logic, abstracted the common business capabilities, and built several shared services:

  • User service
  • Product service
  • Promotion service
  • Order service
  • Data analysis service

Each application's backend now only needs to fetch the data it needs from these services, which removed a lot of redundant code and left only a thin control layer and the front end. The architecture at this stage is as follows:

At this stage, only the services have been separated; the database is still shared, so some shortcomings of the old siloed ("chimney") system remain:

  1. The database becomes a performance bottleneck and a single point of failure.
  2. Data management tends toward chaos. Even with a good modular design at the start, over time some service will end up reading another service's data directly from the database.
  3. Database tables may be depended on by multiple services, so any change affects everything and is hard to make.

If the database stays shared, the whole architecture will grow more and more rigid and lose the point of a microservice architecture. So Xiao Ming and Xiao Hong pressed on and split the database: the persistence layers are isolated from one another, and each service owns its own. In addition, to improve the system's responsiveness, they added a message queue. The architecture is as follows:

Once fully split, each service can use different technologies. For example, the data analysis service can use a data warehouse as its persistence layer to run statistical computations efficiently, while the product and promotion services, which are accessed frequently, add a caching layer.
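For the cached product and promotion services, a minimal cache-aside sketch follows, assuming a simple in-process TTL cache (a real deployment would more likely put Redis in front of the database). `TTLCache` and its `loader` callback are hypothetical names for illustration; the injectable `clock` only exists to make expiry easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache-aside: serve from the cache when fresh, otherwise load from the database."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, object]] = {}  # key -> (expires_at, value)

    def get(self, key: str, loader: Callable[[str], object]) -> object:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = loader(key)  # miss or expired: go to the database
        self._data[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after the underlying product/promotion data changes.
        self._data.pop(key, None)
```

Writers either call `invalidate` or accept staleness up to the TTL; choosing between them trades freshness against database load.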

Another way to abstract common logic is to package it as a shared framework library. This avoids the performance cost of remote service calls. However, its management cost is very high, and it is hard to keep all applications on consistent library versions.

Splitting the database also brings problems and challenges, such as queries that need to join across databases and choosing the right granularity for fetching data through services. But these problems can be solved with sound design. Overall, the benefits of splitting the database outweigh the drawbacks.

The microservice architecture also has a non-technical benefit: it makes the division of labor and responsibilities across the system clearer, and everyone concentrates on providing better services to others. In the era of monolithic applications, shared business functions often had no clear owner. Either everyone re-implemented them separately, or some random person (usually the most capable or most enthusiastic one) built them into the application they were responsible for. In the latter case, besides their own application, this person also became responsible for providing these shared functions to others, functions that originally belonged to no one; they got saddled with the job simply for being more capable or enthusiastic (a situation euphemistically called "the capable do more work"). As a result, eventually no one is willing to provide shared functions. Over time, team members each went their own way and stopped caring about the overall architecture.

From this point of view, adopting a microservice architecture also requires corresponding adjustments to the organizational structure. That is why a microservice transformation needs management support.

After the transformation, Xiao Ming and Xiao Hong each knew exactly what they were responsible for. Both were very satisfied: everything was as beautiful and perfect as Maxwell's equations.

However……

No silver bullet

Spring came, everything revived, and the annual shopping carnival arrived again. Watching daily orders soar, Xiao Pi, Xiao Ming, and Xiao Hong were all smiles. Unfortunately, the good times didn't last; joy turned to sorrow, and suddenly the system went down.

With a monolithic application, troubleshooting usually meant reading logs and studying error messages and call stacks. In a microservice architecture, however, the application is spread across many services, which makes the point of failure very hard to locate. Xiao Ming checked logs machine by machine and called services one by one by hand. After more than ten minutes, he finally found it: the promotion service had stopped responding because it received too many requests. Every other service called the promotion service directly or indirectly, so they went down too. In a microservice architecture, one service's failure can trigger an avalanche that brings down the entire system. In fact, before the holiday Xiao Ming and Xiao Hong had estimated the expected load, and by their estimate the server resources were sufficient for the holiday traffic, so something must have gone wrong. But the situation was urgent and every second meant lost money, so Xiao Ming had no time to investigate. He immediately created several virtual machines in the cloud and deployed new promotion service instances on them one by one. After a few minutes, the system finally limped back to normal. An estimated several hundred thousand in sales were lost during the outage, and the three of them were heartbroken...

Afterwards, Xiao Ming quickly wrote a log analysis tool (the logs were too large for a text editor to open and far too much to read by eye) to tally the promotion service's access logs. He found that during the outage, a bug in the product service caused it, in certain scenarios, to send a flood of requests to the promotion service. The bug was not complicated, and Xiao Ming fixed this bug worth hundreds of thousands with a flick of his fingers.
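The original does not show Xiao Ming's tool; the sketch below illustrates the idea under an assumed log format (the `caller=... path=... status=...` layout is hypothetical). It streams lines instead of loading the whole file, which matters when the log is too large for an editor, and counts requests per caller inside the outage window.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical access-log line format, e.g.:
# 2021-04-10T10:00:01 caller=product-service path=/api/promotions status=200
LINE = re.compile(r"^(?P<ts>\S+) caller=(?P<caller>\S+) path=(?P<path>\S+) status=(?P<status>\d{3})$")


def count_by_caller(lines: Iterable[str], start: str, end: str) -> Counter:
    """Count requests per caller whose timestamp falls in [start, end).

    Same-format ISO-8601 timestamps compare correctly as plain strings.
    """
    counts: Counter = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m and start <= m["ts"] < end:  # skip malformed lines and out-of-window entries
            counts[m["caller"]] += 1
    return counts
```

Passing an open file object (inside a `with open(path) as f:` block) keeps memory use flat, and `most_common()` on the result points straight at the noisiest caller.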

The problem was solved, but no one could guarantee that similar problems would not happen again. The logical design of the microservice architecture may be perfect, but it is like a gorgeous palace built of building blocks that cannot withstand the slightest disturbance. The microservice architecture solves old problems, but it also introduces new ones:

  • The application is spread across many services, making the point of failure very hard to locate.
  • Stability decreases. More services means a higher probability that one of them fails, and one service's failure can bring down the whole system. In fact, in high-traffic production environments, failures will always happen.
  • There are a great many services, so the workload of deploying and managing them is huge.
  • Development: how do you keep services working together while each keeps evolving?
  • Testing: after the split, almost every feature involves multiple services. What used to be testing one program becomes testing calls between services, and testing becomes more complex.

Xiao Ming and Xiao Hong learned from this painful experience and resolved to tackle these problems. Fault handling generally works on two fronts: minimizing the probability of faults and reducing their impact.

Monitoring - spotting symptoms of failure

In high-concurrency distributed systems, failures often strike suddenly, like an avalanche. It is therefore necessary to build a thorough monitoring system to detect the symptoms of failure as early as possible.

A microservice architecture has many components, and each needs different metrics monitored. For example, a Redis cache is typically monitored for memory usage and network traffic, a database for connection count and disk space, and a business service for concurrency, response latency, error rate, and so on. Building one large, all-encompassing monitoring system that understands every component is therefore unrealistic and would scale poorly. The usual approach is for each component to expose an interface (a metrics interface) that reports its current state, with a consistent output format across components. A metrics collector then periodically pulls the state from these interfaces, stores it, and serves queries. Finally, a UI queries the collector for metrics, draws dashboards, and sends alerts when thresholds are crossed.
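To make the "consistent format" idea concrete, here is a minimal sketch of all three roles in one place: a component rendering its metrics as `name{labels} value` lines (loosely modeled on the Prometheus text format, minus its `# HELP`/`# TYPE` lines), the collector parsing a scrape, and a threshold check standing in for the alerting UI. Metric names and thresholds are assumptions.

```python
from typing import Dict, List


def render_metrics(service: str, metrics: Dict[str, float]) -> str:
    """Component side: one `name{service="..."} value` line per metric."""
    lines = [f'{name}{{service="{service}"}} {value}' for name, value in sorted(metrics.items())]
    return "\n".join(lines) + "\n"


def parse_metrics(text: str) -> Dict[str, float]:
    """Collector side: turn scraped text back into {name: value}, ignoring labels and comments."""
    result: Dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        result[series.split("{", 1)[0]] = float(value)
    return result


def check_alerts(values: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Alerting side: names of metrics whose current value exceeds the configured threshold."""
    return sorted(n for n, limit in thresholds.items() if values.get(n, 0.0) > limit)
```

Because every component speaks the same format, the collector and alerting code never need to know whether they are looking at Redis, MySQL, or a business service.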

Most of these components need not be built yourself; open-source options exist. Xiao Ming downloaded RedisExporter and MySQLExporter, which expose metrics interfaces for the Redis cache and the MySQL database respectively. Each microservice implements custom metrics interfaces based on its own business logic. Xiao Ming then used Prometheus as the metrics collector and configured dashboards and email alerts in Grafana. And so a microservice monitoring system was in place:

Locating problems - link tracking

In a microservice architecture, a single user request often involves multiple internal service calls. To make problems easy to locate, we need to record, for each user request, which service calls it generates inside the system and how they call one another. This is called link tracking (also known as distributed tracing).

Let's look at a link tracking example from the Istio documentation:

Image from Istio documentation

As the figure shows, this is a user's request to the productpage page. While handling the request, the productpage service calls the details and reviews services in turn, and the reviews service calls the ratings service while responding. The whole link tracking record is a tree:

To implement link tracking, each service call records at least the following four items of data; the IDs are propagated to the next service in the HTTP headers:

  • traceId: identifies the call chain of one user request. Calls with the same traceId belong to the same link.
  • spanId: identifies one service call, that is, a node in the trace tree.
  • parentId: the spanId of the parent node.
  • requestTime & responseTime: the request and response timestamps.
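A sketch of why these fields are enough to rebuild the tree: grouping spans by `parentId` recovers each node's children. The `Span` type and the sample data (modeled on the Istio productpage example) are illustrative assumptions, not Zipkin's data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span (the user's request)
    service: str
    request_time: float       # seconds
    response_time: float


def render_trace(spans: List[Span], trace_id: str) -> List[str]:
    """Rebuild the call tree of one trace and return it as indented lines."""
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for s in spans:
        if s.trace_id == trace_id:  # same traceId = same user request
            children[s.parent_id].append(s)
    out: List[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        for s in sorted(children[parent], key=lambda x: x.request_time):
            ms = (s.response_time - s.request_time) * 1000
            out.append(f"{'  ' * depth}{s.service} ({ms:.0f} ms)")
            walk(s.span_id, depth + 1)  # this span is the parent of its callees

    walk(None, 0)
    return out
```

Spans whose parent never arrives are silently dropped here; a real tracing UI would show them as orphans.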

In addition, components are needed to collect and store the call logs, and a UI component to display the call links.

The above is only a minimal explanation. For the theoretical basis of link tracking, see Google's Dapper paper.

With the theory understood, Xiao Ming chose Zipkin, an open-source implementation of Dapper. Then, with a flick of the finger, he wrote an HTTP request interceptor that generates this data and injects it into the headers on every HTTP request, while asynchronously sending call logs to Zipkin's collector. Note that the HTTP interceptor can be implemented either in the microservice's own code or with a network proxy component (though then each microservice needs an extra proxy layer).
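A minimal sketch of such an interceptor, assuming the header names used in this article (`traceId`, `spanId`, `parentId`); Zipkin's own B3 propagation uses headers such as `X-B3-TraceId` and `X-B3-SpanId`. The `send` callback stands in for whatever HTTP client the service uses, and the queue stands in for the asynchronous reporter to Zipkin's collector, which is not shown.

```python
import queue
import time
import uuid
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def new_id() -> str:
    # 16 hex characters = 64 bits
    return uuid.uuid4().hex[:16]


class TracingInterceptor:
    """Hypothetical client-side interceptor: injects trace IDs and queues a call record."""

    def __init__(self, reports: "queue.Queue[dict]") -> None:
        # A background worker (not shown) would drain this queue and ship records
        # to the trace collector, so reporting never blocks the business request.
        self.reports = reports

    def outgoing_headers(self, incoming: Dict[str, str]) -> Dict[str, str]:
        return {
            "traceId": incoming.get("traceId") or new_id(),  # reuse, or start a trace at the edge
            "spanId": new_id(),                               # this call is a new node in the tree
            "parentId": incoming.get("spanId", ""),           # caller's span; empty for the root
        }

    def call(self, send: Callable[[Dict[str, str]], T], incoming: Dict[str, str], service: str) -> T:
        headers = self.outgoing_headers(incoming)
        request_time = time.time()
        try:
            return send(headers)  # the real HTTP request, with trace headers attached
        finally:
            self.reports.put_nowait({**headers, "service": service,
                                     "requestTime": request_time, "responseTime": time.time()})
```

Timings travel in the queued record rather than in the headers; only the IDs need to reach the downstream service.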

Link tracking can only tell you which service has a problem; it cannot provide the specific error details. Finding those requires a log analysis component.

Analyzing problems - log analysis

Log analysis components were widely used even before microservices took off. Even in a monolithic architecture, as traffic or the number of servers grows, log files swell to the point where a text editor can barely open them, and worse, they are scattered across many servers. To troubleshoot a problem, you have to log in to each server, fetch its log files, and search them one by one for the relevant entries (and both opening and searching are slow).

So once an application grows large, we need a "search engine" for logs that can pinpoint the entries we want. On the data-source side we also need a component that collects logs, plus a UI component to display the results:

After some research, Xiao Ming chose the well-known ELK stack for log analysis. ELK is an acronym for its three components: Elasticsearch, Logstash, and Kibana.

  • Elasticsearch: the search engine, which also serves as log storage.
  • Logstash: the log collector; it receives log input, preprocesses the logs, and outputs them to Elasticsearch.
  • Kibana: the UI component; it queries data through the Elasticsearch API and displays it to users.

One last small question is how to get logs to Logstash. One option is to call Logstash's interface directly whenever a log is written, but that means modifying the code again (sigh, why "again")... So Xiao Ming chose another option: logs are still written to files, and an agent deployed alongside each service scans the log files and ships them to Logstash.
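A minimal sketch of what such an agent does on each scan, assuming a single plain-text log file and a hypothetical `send` callback that forwards lines to Logstash. It remembers a byte offset between scans, ships only complete lines, and starts over if the file shrinks (for example after truncation). Production agents also persist the offset and handle rotation more carefully.

```python
import os
from typing import Callable, List


def ship_new_lines(path: str, offset: int, send: Callable[[List[str]], None]) -> int:
    """Send lines appended to `path` since `offset`; return the offset for the next scan.

    A partially written last line (no trailing newline yet) is left for the next scan.
    """
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced: start from the beginning
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # length of the complete-line prefix (0 if none)
    if end:
        send(data[:end].decode("utf-8").splitlines())
    return offset + end
```

The agent simply calls this in a loop with a short sleep, which is why the service's own logging code does not have to change.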

Gateway (service governance) - access control

After the split into microservices, a large number of services and interfaces appear, and the overall call relationships become messy. During development, you often forget midway which service you should call for a certain piece of data. Or your code goes astray and calls a service it shouldn't, and a function that should be read-only ends up modifying data...

To deal with these situations, microservice calls need a gatekeeper: a gateway. A gateway layer sits between caller and callee and checks permissions on every call. The gateway can also serve as a platform for publishing service interface documentation.
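A minimal sketch of the gateway's permission check, assuming a hypothetical static allow-list keyed by caller and expressed as (service, HTTP method) pairs; real gateways load such rules from configuration. It captures the scenario above: an analytics caller may read orders but not modify them.

```python
from typing import Callable, Dict, Set, Tuple

Response = Tuple[int, str]

# Hypothetical ACL: caller -> set of (service, method) pairs it may invoke.
ACL: Dict[str, Set[Tuple[str, str]]] = {
    "website": {("product", "GET"), ("order", "GET"), ("order", "POST")},
    "analytics": {("order", "GET")},  # read-only access to orders
}


def authorize(caller: str, service: str, method: str) -> bool:
    return (service, method) in ACL.get(caller, set())  # unknown callers get nothing


def gateway(caller: str, service: str, method: str,
            forward: Callable[[str, str], Response]) -> Response:
    if not authorize(caller, service, method):
        return 403, "forbidden"  # rejected before it ever reaches the service
    return forward(service, method)
```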

One question with a gateway is granularity. The coarsest-grained option is a single gateway for the whole microservice system: external callers reach the microservices through the gateway, while microservices call one another directly. The finest-grained option is to route every call, whether internal between microservices or external, through the gateway. A compromise is to divide the microservices into zones by business domain: calls within a zone are direct, and calls between zones go through the gateway.

Since the online supermarket does not have a particularly large number of services, Xiao Ming adopted the coarsest-grained option:

Service registration and discovery - dynamic scaling

The components above all aim to reduce the likelihood of failures. But failures will always happen, so the other thing to work on is reducing their impact.

The crudest (and most common) failure-handling strategy is redundancy. A service is generally deployed as multiple instances: first, this shares the load and improves performance; second, even if one instance goes down, the others can still respond.

One question with redundancy is how much to use. There is no fixed answer over time: different service functions and time periods need different numbers of instances. For example, 4 instances may be enough on ordinary days, but during promotions traffic surges and 40 may be needed. So the degree of redundancy is not a fixed value but is adjusted in real time as needed.

Generally, adding an instance involves the following steps:

  1. Deploy new instance
  2. Register the new instance with the load balancer or DNS

There are only two steps, but if registering with the load balancer or DNS is done by hand, things get complicated. Imagine manually entering 40 IPs after adding 40 instances...

The solution is automatic service registration and discovery. First, deploy a service discovery service that provides the address information of all registered services; DNS is itself a kind of service discovery service. Each application service then registers itself with the service discovery service automatically on startup. After starting, each application service also syncs the address lists of the other services from the service discovery service to local storage in near real time (periodically). The service discovery service also periodically checks the health of application services and removes unhealthy instance addresses. With this in place, adding an instance only requires deploying it, and taking an instance offline only requires shutting it down; service discovery automatically detects instances being added or removed.
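A minimal in-memory sketch of a service discovery service, assuming heartbeat-with-TTL as the health check (instances re-register periodically and are evicted once they stop). `Registry` and its methods are hypothetical; Zookeeper, Eureka, Consul, and Etcd each have their own APIs.

```python
import time
from typing import Callable, Dict, List


class Registry:
    """In-memory service discovery with heartbeat-based expiry."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._instances: Dict[str, Dict[str, float]] = {}  # service -> {address: last heartbeat}

    def register(self, service: str, address: str) -> None:
        # Called by an instance on startup, then periodically as a heartbeat.
        self._instances.setdefault(service, {})[address] = self.clock()

    def deregister(self, service: str, address: str) -> None:
        self._instances.get(service, {}).pop(address, None)

    def lookup(self, service: str) -> List[str]:
        """Healthy addresses only; instances silent for longer than the TTL are evicted."""
        now = self.clock()
        instances = self._instances.get(service, {})
        for addr in [a for a, seen in instances.items() if now - seen > self.ttl]:
            del instances[addr]
        return sorted(instances)
```

A crashed instance simply stops sending heartbeats and drops out within one TTL, so nobody has to remove its IP by hand.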

Service discovery is also used together with client-side load balancing. Since each application service has already synced the service address lists locally, it can choose its own load-balancing strategy when calling microservices. You can even attach metadata (such as the service version) when registering a service, and the client-side load balancer can route traffic based on that metadata to enable A/B testing, blue-green deployments, and more.
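A minimal sketch of client-side load balancing over the locally synced list, assuming instances carry a hypothetical `version` metadata key. `RoundRobinBalancer` filters by version and rotates through the matches; `version_for` buckets users stably by a CRC32 hash so that a chosen percentage of users sees v2, which is one way to run an A/B test.

```python
import itertools
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Instance:
    address: str
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. {"version": "v2"}


class RoundRobinBalancer:
    """Round-robin over locally synced instances, optionally filtered by version metadata."""

    def __init__(self, instances: List[Instance]):
        self.instances = instances
        self._counter = itertools.count()  # shared across versions in this sketch

    def pick(self, version: Optional[str] = None) -> Instance:
        candidates = [i for i in self.instances
                      if version is None or i.metadata.get("version") == version]
        if not candidates:
            raise LookupError(f"no instance for version={version}")
        return candidates[next(self._counter) % len(candidates)]


def version_for(user_id: str, canary_percent: int) -> str:
    # Stable bucketing: the same user always lands on the same version.
    return "v2" if zlib.crc32(user_id.encode()) % 100 < canary_percent else "v1"
```

Raising `canary_percent` step by step moves traffic to the new version; the registry only needs to carry the metadata.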

There are many components to choose from for service discovery, such as Zookeeper, Eureka, Consul, and Etcd. But Xiao Ming thought his skills were pretty good and wanted to show off, so he wrote one based on Redis...

Circuit breaking, service degradation, rate limiting

Circuit breaking

When a service stops responding for whatever reason, its callers usually wait for a while and then time out or receive an error. If the call chain is long, requests can pile up, with the whole chain holding a lot of resources while waiting on downstream responses. So when calls to a service fail repeatedly, the circuit should be broken: mark the service as not working and return an error immediately, and only reconnect once the service has recovered.

Image from "Microservice Design"
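A minimal single-threaded circuit breaker sketch. Beyond what the article describes, it uses the common half-open state: after `reset_timeout` seconds one trial call is let through, and its outcome decides whether the circuit closes or re-opens. Thresholds, names, and the injectable clock are assumptions; concurrent callers would need a lock and a limit on trial calls.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling a service whose circuit is open."""


class CircuitBreaker:
    def __init__(self, threshold: int, reset_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold          # consecutive failures before tripping
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open: failing fast")
        # Closed, or half-open (timeout elapsed): let this call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```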

Service degradation

When a downstream service stops working and it is not core business, the upstream service should degrade gracefully so that core business is not interrupted. For example, the online supermarket's ordering interface includes a feature that recommends products to add to the order. If the recommendation module goes down, ordering must not go down with it; the recommendation feature just needs to be disabled temporarily.
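A minimal sketch of degrading the recommendation feature while keeping ordering intact; `place_order`, `with_fallback`, and `fetch_recommendations` are hypothetical names. Any failure of the non-core call is logged and replaced by an empty list.

```python
import logging
from typing import Callable, List

log = logging.getLogger("order")


def with_fallback(fn: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    try:
        return fn()
    except Exception:  # non-core dependency failed: degrade instead of failing the order
        log.warning("recommendation service unavailable, degrading", exc_info=True)
        return fallback


def place_order(order_id: str, fetch_recommendations: Callable[[], List[str]]) -> dict:
    # Core business: the order succeeds regardless of the recommendation module.
    recs = with_fallback(fetch_recommendations, fallback=[])
    return {"order_id": order_id, "status": "placed", "recommendations": recs}
```

In practice the fallback is usually paired with circuit breaking, so that while the recommendation service is down the order path does not even wait for a timeout.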

Rate limiting

After a service goes down, upstream services or users habitually retry. As a result, as soon as the service recovers, it is likely to go down again under the sudden burst of traffic, like doing sit-ups in a coffin over and over. So a service needs to protect itself by throttling, or rate limiting. There are many rate-limiting strategies. The simplest is to discard excess requests when too many arrive per unit of time. Partitioned rate limiting is another option: reject only requests from the services that generate excessive traffic. For example, both the product service and the order service call the promotion service. If the product service sends a flood of requests because of a bug, the promotion service limits only requests from the product service, while requests from the order service are served normally.
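A minimal sketch of the partitioned strategy, using a fixed-window counter per caller (the simplest "too many requests per unit time" rule; token or leaky buckets smooth bursts better). The caller name is assumed to be known, for example from a header set by the gateway; `PerCallerRateLimiter` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Tuple


class PerCallerRateLimiter:
    """Fixed window: each caller may make at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._windows: Dict[str, Tuple[int, int]] = {}  # caller -> (window index, count)

    def allow(self, caller: str) -> bool:
        current = int(self.clock() // self.window)
        index, count = self._windows.get(caller, (current, 0))
        if index != current:
            count = 0  # a new window has started for this caller
        if count >= self.limit:
            return False  # drop the excess; other callers are unaffected
        self._windows[caller] = (current, count + 1)
        return True
```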

Testing

In a microservice architecture, testing is divided into three levels:

  • End-to-end testing: covers the entire system, usually through the user interface.
  • Service testing: tests service interfaces.
  • Unit testing: tests individual units of code.

From top to bottom, the three levels get progressively easier to implement but give weaker guarantees. End-to-end tests are the most time-consuming and laborious, but passing them gives us the most confidence in the system. Unit tests are the easiest and most efficient to write, but passing them cannot guarantee that the system as a whole is free of problems.

Because end-to-end tests are hard to implement, usually only core functions get them. When an end-to-end test fails, break it down into unit tests: analyze the cause of the failure, then write a unit test that reproduces the problem, so that the same bug can be caught faster in the future.

The difficulty with service testing is that a service often depends on other services. This can be solved with a mock server:
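Instead of a standalone mock server, the sketch below uses an in-process stand-in via Python's `unittest.mock`, which shows the same idea: the service under test talks to a fake dependency that returns canned responses. `OrderService` and `discount_for` are hypothetical.

```python
from unittest import mock


class OrderService:
    """Hypothetical order service that depends on the promotion service via a client object."""

    def __init__(self, promotion_client):
        self.promotions = promotion_client

    def total(self, items: dict) -> float:
        subtotal = sum(price * qty for price, qty in items.values())
        discount = self.promotions.discount_for(subtotal)  # a remote call in production
        return round(subtotal - discount, 2)


def test_total_applies_promotion() -> None:
    fake = mock.Mock()
    fake.discount_for.return_value = 5.0  # canned response instead of a live promotion service
    svc = OrderService(fake)
    assert svc.total({"dog food": (12.5, 2)}) == 20.0
    fake.discount_for.assert_called_once_with(25.0)  # the interface was called as expected
```

A real mock server plays the same role over HTTP, which additionally exercises serialization and the client code.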

Everyone is familiar with unit testing. We generally write a large number of unit tests (including regression tests) to cover as much of the code as possible.

Microservice framework

Metrics interfaces, link tracking injection, log shipping, service registration and discovery, routing rules, and features such as circuit breaking and rate limiting all require adding some integration code to each application service. Having every application service implement this itself is time-consuming and laborious. Following the DRY principle, Xiao Ming developed a microservice framework that extracts the integration code for each component, along with other common code, into the framework, and all application services are developed on top of it.
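A minimal sketch of how a framework can pull such plumbing out of business code, assuming a hypothetical `endpoint` decorator that counts requests and errors and records latency for every handler it wraps. A real framework would attach trace headers, register the service, and expose these counters through the metrics interface in the same way.

```python
import functools
import time
from collections import Counter
from typing import Callable, Dict

# Hypothetical framework state; a real framework would serve these via the metrics interface.
REQUESTS: Counter = Counter()
ERRORS: Counter = Counter()
LATENCY_MS: Dict[str, float] = {}  # latency of the most recent call per endpoint


def endpoint(name: str) -> Callable:
    """Wrap a business handler with request/error counting and latency measurement."""
    def decorate(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            REQUESTS[name] += 1
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            except Exception:
                ERRORS[name] += 1
                raise
            finally:
                LATENCY_MS[name] = (time.perf_counter() - start) * 1000
        return wrapper
    return decorate


@endpoint("get_product")
def get_product(product_id: str) -> dict:
    # Business code stays free of monitoring plumbing.
    if not product_id:
        raise ValueError("empty product id")
    return {"id": product_id}
```

The flip side is the one described next: every service now depends on this decorator's behavior, so changing it means upgrading every service.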

A microservice framework makes many custom features possible. You can even inject call stack information into the link trace for code-level tracing, or expose the state of thread pools and connection pools to monitor the service's low-level status in real time.

A unified microservice framework has a serious drawback: updating it is very costly. Every framework upgrade requires upgrading all application services. A backward-compatible approach is usually used, allowing a period in which old and new versions coexist while all application services upgrade. But with many application services, the upgrade can take a very long time. And some very stable services are rarely updated, and their owners may refuse to upgrade... So a unified microservice framework requires thorough version management and development management practices.

Another way - Service Mesh

Another way to abstract common code is to move it directly into a reverse proxy component. Each service additionally deploys this proxy, and all inbound and outbound traffic is processed and forwarded through it. This component is called a sidecar.

A sidecar does not add a real network hop. The sidecar and the microservice node are deployed on the same host and share the same virtual network interface, so traffic between them goes over the loopback interface and amounts to memory copying.

Image from: Pattern: Service Mesh

The sidecar only handles network communication; another component is needed to manage the configuration of all sidecars centrally. In a Service Mesh, the part responsible for network communication is called the data plane, and the part responsible for configuration management is called the control plane. Together they form the basic architecture of a Service Mesh.

Image from: Pattern: Service Mesh

Compared with a microservice framework, the advantage of Service Mesh is that it does not intrude on application code and is easier to upgrade and maintain. It is often criticized for performance: even though loopback traffic does not produce real network requests, memory copying still adds overhead. In addition, some centralized traffic processing also affects performance.

The end is also the beginning

Microservices are not the end of architectural evolution. Going further, there are directions such as Serverless and FaaS. Meanwhile, others echo the old saying that what has long been divided must unite, and are rediscovering the monolith...

In any case, the microservice transformation has come to an end for now. Xiao Ming contentedly patted his increasingly bald head, planning to take a break this weekend and invite Xiao Hong for a cup of coffee.



Source: blog.csdn.net/finish_dream/article/details/116266211