[Repost] What basic frameworks do we need to implement microservices?


Posted by Bo Yang on December 1, 2015

The microservices architecture is a hot topic in today's Internet industry, and many colleagues are planning to build microservices-based systems at their own companies. They tend to have the same questions: What are the technical concerns of a microservices architecture? What infrastructure frameworks or components are required to support it? How should those frameworks or components be selected? The author has participated in and led the construction of large-scale service-oriented systems and frameworks at two large Internet companies, and has also invested a lot of time studying this area, so I would like to share some of that experience here.

Service registration, discovery, load balancing and health checks

Unlike a monolithic architecture, a microservices architecture is a distributed mesh of fine-grained, single-responsibility services that communicate through lightweight mechanisms. This immediately raises the problem of service registration and discovery: service providers need to register and announce their addresses, and service callers need to be able to discover the target services. Moreover, providers are generally deployed as clusters, which introduces load balancing and health check problems. Depending on where the load balancer (LB) sits, there are currently three main solutions for service registration, discovery and load balancing:

The first is the centralized LB solution, shown in Fig 1 below. An independent LB sits between service consumers and service providers. The LB is usually dedicated hardware such as F5, or software such as LVS or HAProxy. The LB holds the address mapping table for all services, usually registered through operations configuration. When a service consumer calls a target service, it sends the request to the LB, which forwards it to a target instance according to some load balancing strategy (such as round-robin). LBs generally have health check capabilities and can automatically remove unhealthy service instances. How does the service consumer find the LB itself? The usual practice is DNS: operations staff configure a DNS domain name for each service, and that domain name points to the LB.

 

Fig 1, Centralized LB solution

 

The centralized LB solution is simple to implement and makes it easy to apply centralized access control at the LB, so it is still the mainstream approach in the industry. Its main problem is the single point of failure: all service call traffic passes through the LB, so when the number of services and calls is large, the LB easily becomes a bottleneck, and if the LB fails, the impact on the entire system is catastrophic. In addition, the LB adds an extra hop between consumer and provider, which carries some performance overhead.

The second is the in-process LB solution. To address the shortcomings of the centralized LB, this solution integrates the LB functionality into the service consumer's process in the form of a library; it is also called soft load balancing or client-side load balancing. Fig 2 below shows how it works. The solution requires a service registry (Service Registry) to support service self-registration and self-discovery. When a service provider starts, it registers its address in the service registry (and periodically reports a heartbeat to indicate liveness, which serves as the health check). When a service consumer wants to access a service, its built-in LB component queries the service registry for the target service's address list (caching it and refreshing it periodically), selects one address according to some load balancing strategy, and finally sends the request to that target. This solution places high demands on the availability of the service registry, which is generally implemented with components that provide high availability and distributed consistency (such as Zookeeper, Consul, or Etcd).

Fig 2, In-process LB scheme

The in-process LB solution is a distributed one: LB and service discovery capabilities live inside each service consumer's process, and consumers call providers directly, so there is no extra hop and performance is better. However, because the solution is integrated into the caller's process as a client library, an enterprise with many different language stacks has to develop and maintain a client for each of them, which carries R&D and maintenance costs. In addition, once the client library has been released to production along with the caller, upgrading it requires the caller to change code and redeploy, so upgrading and promoting this solution meets considerable resistance. A minimal sketch of the client-side discovery and round-robin selection appears below.
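To make the mechanism concrete, here is a minimal Java sketch of an in-process load balancer, assuming a hypothetical RegistryClient interface that stands in for a real registry (Zookeeper, Consul, Etcd, Eureka, etc.); the names and details are illustrative, not any specific framework's API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the client-side (in-process) LB: the consumer keeps a cached list of
 *  provider addresses pulled from the registry and picks one per call (round-robin). */
public class InProcessLoadBalancer {

    /** Hypothetical registry client; a real system would back this with Zookeeper/Consul/Etcd/Eureka. */
    public interface RegistryClient {
        List<String> lookup(String serviceName);   // e.g. ["10.0.0.1:8080", "10.0.0.2:8080"]
    }

    private final RegistryClient registry;
    private final AtomicInteger counter = new AtomicInteger();
    private volatile List<String> cachedInstances = List.of();

    public InProcessLoadBalancer(RegistryClient registry) {
        this.registry = registry;
    }

    /** Refresh the local cache; in practice this runs on a timer or on registry notifications. */
    public void refresh(String serviceName) {
        cachedInstances = registry.lookup(serviceName);
    }

    /** Round-robin selection over the cached instance list. */
    public String choose() {
        List<String> instances = cachedInstances;
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        int index = Math.floorMod(counter.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

In a real framework such as Ribbon, the instance refresh, health filtering and load balancing strategies are pluggable; the sketch only shows the basic shape of client-side discovery plus round-robin selection.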

A representative case of the in-process LB approach is Netflix's open source service framework, whose corresponding components are: Eureka as the service registry, the Karyon server-side framework supporting service self-registration and health checks, and the Ribbon client-side framework supporting service self-discovery and soft routing. Alibaba's open source service framework Dubbo also adopts a similar mechanism.

The third is the host-level independent LB process solution, a compromise proposed to address the shortcomings of the second solution. The principle is basically the same, except that the LB and service discovery functions are moved out of the consumer's process and run as an independent process on the host. When one or more services on the host want to access a target service, they all go through this independent LB process on the same host for service discovery and load balancing, as shown in Fig 3 below.

Fig 3, Host-level independent LB process solution

This is also a distributed solution with no single point of failure: if one LB process goes down, only the service callers on that host are affected, and calls between the caller and the LB stay on the same host, so performance is good. At the same time, this solution simplifies the service callers: there is no need to develop client libraries for different languages, and upgrading the LB does not require callers to change code. The disadvantages are that deployment is more complicated, there are more moving parts, and debugging and troubleshooting are less convenient.

A typical case of this solution is Airbnb's SmartStack service discovery framework, whose components are: Zookeeper as the service registry, the Nerve independent process responsible for service registration and health checks, and the Synapse/HAProxy independent process responsible for service discovery and load balancing. Google's container-based PaaS platform, Kubernetes, uses a similar mechanism for internal service discovery.

Service front-end routing

In addition to calling and communicating with each other internally, microservices must ultimately be exposed in some way so that external systems (such as a customer's browser or mobile device) can access them. This is the problem of front-end routing for services, and the corresponding component is the service gateway (Service Gateway). As shown in Fig 4, the gateway is the door connecting the systems inside and outside the enterprise, and it has the following key functions:

  1. Service reverse routing: the gateway is responsible for reverse-routing external requests to the specific internal microservices, so that even though the enterprise internally is a complex distributed microservice structure, external systems see a single unified service through the gateway. The gateway shields the complexity of backend services, as well as their upgrades and changes.
  2. Security authentication and anti-crawling: all external requests must pass through the gateway, so access security (such as user authentication and authorization) can be controlled centrally there, and access patterns can be analyzed to implement anti-crawling; the gateway is the security door between the inside and outside of the enterprise.
  3. Rate limiting and fault tolerance: during traffic peaks the gateway can limit traffic to protect backend systems from being overwhelmed, and when internal systems fail the gateway can apply fault tolerance centrally to preserve a good external user experience.
  4. Monitoring: the gateway can centrally monitor access volume, call latency, error counts and access patterns, providing data to support backend performance optimization or capacity expansion.
  5. Logging: the gateway can collect all access logs and feed them into backend systems for further analysis.

Fig 4, Service Gateway

In addition to the basic capabilities above, the gateway can also implement advanced functions such as online traffic diversion, online stress testing, online (surgical) debugging, canary testing, and active-active data centers (Active-Active HA).

The gateway usually works at layer 7 and contains some computing logic. It is generally deployed as a cluster, with a front-end LB balancing load across the gateway instances.

Among open source gateway components, Netflix's Zuul is notable for its dynamically hot-deployable filter mechanism; others such as HAProxy and Nginx can also be extended for use as gateways. A sketch of a Zuul-style pre-filter follows.
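To illustrate the gateway's cross-cutting role, below is a hedged sketch of a Zuul 1.x "pre" filter that performs a simple authentication check before a request is routed; the header name, status code and rejection logic are assumptions made for this example.

```java
import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import javax.servlet.http.HttpServletRequest;

/** Sketch of a Zuul 1.x "pre" filter that checks for an auth token before routing;
 *  the header name and rejection behavior are illustrative assumptions. */
public class AuthPreFilter extends ZuulFilter {

    @Override
    public String filterType() {
        return "pre";            // run before the request is routed to a backend service
    }

    @Override
    public int filterOrder() {
        return 1;                // relative order among "pre" filters
    }

    @Override
    public boolean shouldFilter() {
        return true;             // apply to every request in this sketch
    }

    @Override
    public Object run() {
        RequestContext ctx = RequestContext.getCurrentContext();
        HttpServletRequest request = ctx.getRequest();
        if (request.getHeader("X-Auth-Token") == null) {
            ctx.setSendZuulResponse(false);       // stop routing to the backend
            ctx.setResponseStatusCode(401);       // reject unauthenticated requests at the gateway
        }
        return null;                              // return value is not used by Zuul 1.x
    }
}
```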

Having introduced the service registry and the gateway, we can use a simplified microservice architecture diagram (Fig 5) to show more intuitively how service registration, discovery and routing work across the whole system. In the architecture shown in Fig 5 below, services are simplified into two layers: backend general services (also called Middle Tier Services) and front-end services (also called Edge Services, which aggregate and trim the backend services as needed and expose them to different external devices such as PCs, tablets or phones). When a backend service starts, it registers its address information in the service registry, and front-end services discover and call backend services by querying the registry; when a front-end service starts, it also registers its address in the registry, so that the gateway can query the registry and route requests to the target front-end service. In this way, service self-registration, self-discovery and soft routing for the whole microservice system are tied together through the service registry and the gateway. From the perspective of object-oriented design patterns, the gateway resembles the Proxy or Façade pattern, while the service registry with self-registration and self-discovery resembles the IoC dependency injection pattern; a microservice system can thus be understood as a distributed system built on gateway-style proxying and registry-style IoC.

Fig 5, Simplified Microservice Architecture Diagram

Service fault tolerance

When an enterprise adopts microservices, intricate dependencies arise between services. For example, a front-end request generally depends on multiple backend services, which is technically called 1 -> N fan-out (see Fig 6). In real production environments, services are never 100% reliable; they may fail or become slow. If an application cannot tolerate and isolate failures in its dependencies, it risks being dragged down itself. On a high-traffic website, once a single backend becomes slow, all of the application's resources (threads, queues, etc.) may be exhausted within seconds, producing the so-called avalanche effect (cascading failure, see Fig 7); in severe cases the entire site can be paralyzed.

Fig 6, Service dependencies

Fig 7, Avalanche effect caused by single service delay during peak period

After years of exploration and practice, the industry has developed a set of effective fault tolerance patterns and best practices for distributed services, including:

  1. Circuit Breaker pattern: the principle is similar to a household fuse; if the household circuit shorts, the fuse actively breaks the circuit to avoid catastrophic loss. Applying the circuit breaker pattern in a distributed system means that when a target service is slow or produces a large number of timeouts, the caller can actively break the circuit to avoid being dragged down further; when the situation improves, the circuit can recover automatically. This is called elastic fault tolerance: the system has the ability to recover by itself. Fig 8 below shows the typical state diagram of a circuit breaker with elastic recovery. In the normal state the circuit is Closed; if calls keep failing or timing out, the circuit opens and enters the Open state, and all calls during the following period are rejected (Fail Fast); after a period of time the breaker enters the Half-Open state and lets a small number of trial requests through; if those calls still fail it returns to the Open state, and if they succeed it returns to the Closed state. A minimal sketch of this state machine appears after this list.

Fig 8, State diagram of a circuit breaker with elastic recovery

  2. Bulkhead Isolation pattern: as the name suggests, resources or failure units are isolated like the watertight compartments of a ship; if one compartment floods, only that compartment is lost and the others are unaffected. Thread isolation is a typical example of this pattern. Suppose an application A calls three services Svc1/Svc2/Svc3, the container where A is deployed has 120 worker threads in total, and 40 threads are allocated to calls to each service. When Svc2 becomes slow, the 40 threads allocated to it block and are eventually exhausted, but thread isolation ensures that the 80 threads allocated to Svc1/Svc3 are unaffected. Without such an isolation mechanism, when Svc2 becomes slow, all 120 worker threads would quickly be eaten up by calls to Svc2 and the entire application would be dragged down.
  3. Rate Limiting / Load Shedding: every service has a capacity limit, and a service without a rate limiting mechanism is easily overwhelmed by bursts of traffic (flash sales, Double Eleven). Rate limiting usually means limiting concurrent access to the service, for example allowing at most 100 concurrent calls per unit of time; requests beyond this limit should be rejected and fall back.
  4. Fallback: what should the application do next when a circuit breaker trips or a request is rate-limited? Fallback defines that degradation logic and is part of the system's resilience. Common strategies include throwing an exception directly (Fail Fast), returning a null or default value, or returning backup data, for example fetching data from a backup service when the primary service is broken.
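The following is a minimal, framework-free Java sketch of the circuit breaker state machine described above (Closed, Open, Half-Open); the threshold and timing values are illustrative, and a production implementation would typically track failure rates over sliding windows rather than simple consecutive counts.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal circuit breaker sketch: Closed -> Open after N consecutive failures,
 *  Open -> Half-Open after a cool-down, Half-Open -> Closed on a successful trial call. */
public class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openTimeoutMillis;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile State state = State.CLOSED;
    private volatile long openedAt;

    public SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
    }

    public synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openTimeoutMillis) {
                state = State.HALF_OPEN;   // let a trial request through
                return true;
            }
            return false;                  // fail fast while the circuit is open
        }
        return true;                       // CLOSED or HALF_OPEN
    }

    public synchronized void recordSuccess() {
        consecutiveFailures.set(0);
        state = State.CLOSED;
    }

    public synchronized void recordFailure() {
        if (state == State.HALF_OPEN || consecutiveFailures.incrementAndGet() >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.currentTimeMillis();
        }
    }
}
```

A caller checks allowRequest() before each remote call and reports the outcome through recordSuccess()/recordFailure(); while the circuit is Open, calls fail fast instead of piling up on a slow dependency.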

Netflix has integrated the above fault tolerance patterns and best practices into an open source component called Hystrix. For any dependency that needs fault tolerance (a service, cache, database access, etc.), developers only need to wrap the call in a Hystrix Command, and the call is automatically placed under Hystrix's elastic fault tolerance protection. Hystrix has been validated by years of operation at Netflix; it is a cornerstone of the stability and resilience of the Netflix microservice platform and is gradually being accepted by the community as a standard fault tolerance component. A sketch of wrapping a call in a HystrixCommand follows.
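As a concrete example, the following sketch wraps a remote call in a HystrixCommand (Hystrix 1.x style); the service, command group name and fallback value are illustrative assumptions, and the remote call itself is a placeholder.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

/** Sketch of a Hystrix-protected call; group name, service and fallback are assumptions. */
public class GetUserCommand extends HystrixCommand<String> {

    private final String userId;

    public GetUserCommand(String userId) {
        // Commands in the same group share a thread pool by default (bulkhead isolation)
        super(HystrixCommandGroupKey.Factory.asKey("UserService"));
        this.userId = userId;
    }

    @Override
    protected String run() throws Exception {
        // The actual remote call; a timeout or exception here counts as a failure
        return callUserService(userId);
    }

    @Override
    protected String getFallback() {
        // Returned when the circuit is open, the call times out, or run() throws
        return "anonymous";
    }

    private String callUserService(String id) throws Exception {
        // Placeholder for an HTTP/RPC call to the user service
        throw new UnsupportedOperationException("remote call not implemented in this sketch");
    }
}
```

Calling new GetUserCommand("42").execute() runs the call under Hystrix's thread isolation, timeout, circuit breaking and fallback protection.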

Service framework

After adopting microservices, in order to let business developers focus on business logic, avoid redundant and repeated work, and standardize development to improve efficiency, some common concerns must be pushed down to the framework level. The service framework (Fig 9) mainly encapsulates the logic of these common concerns, including:

Fig 9, Service Framework

  1. Service registration, discovery, load balancing and health checks. Assuming the in-process LB scheme is adopted, service self-registration is generally done in the server-side framework, the health check logic is customized by each business service while the framework layer provides the mechanism to invoke it, and service discovery and load balancing are integrated in the client-side framework.
  2. Monitoring and logging. The framework needs to record important framework-level logs, metrics and call chain data, and also expose logging and metrics interfaces so that the business layer can record business data as needed. In production, all log data is generally collected centrally into the enterprise's backend log system for further analysis and processing.
  3. REST/RPC and serialization. The framework layer should support exposing business logic over HTTP/REST or RPC; HTTP/REST is the current mainstream API style, while binary RPC can be used where performance requirements are high. Given today's diverse device types (browsers, ordinary PCs, wireless devices, etc.), the framework layer should support a customizable serialization mechanism: for browsers, an Ajax-friendly JSON message format; for native apps on wireless devices, a higher-performance binary message format.
  4. Configuration. In addition to ordinary configuration files, the framework layer can integrate dynamic runtime configuration, so that service parameters and settings can be adjusted at runtime for different environments.
  5. Rate limiting and fault tolerance. The framework integrates rate limiting and fault tolerance components, which automatically limit traffic and tolerate faults at runtime to protect services; combined with dynamic configuration, rate limits and circuit breaking can also be adjusted dynamically.
  6. Management interface. The framework integrates a management interface through which the internal state of the framework and services can be viewed online and adjusted dynamically, providing quick feedback for debugging, monitoring and management. The Actuator module of the Spring Boot micro-framework is a powerful management interface; a minimal Spring Boot service is sketched after this list.
  7. Unified error handling. If the framework layer can uniformly handle and log internal exceptions of both the framework and the services, it greatly helps service monitoring and rapid problem location.
  8. Security. Security and access control logic can be encapsulated uniformly at the framework layer, ideally in plug-in form, so that specific business services can load the relevant security plug-ins as needed.
  9. Automatic documentation generation. Writing and synchronizing documentation has always been a pain point; if the framework layer can generate and synchronize documentation automatically, it brings great convenience to developers and testers who use the APIs. Swagger is a popular documentation scheme for RESTful APIs.
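As a small illustration of a framework-based service, here is a minimal Spring Boot application that exposes business logic over HTTP/REST; the order endpoint and its response are hypothetical, and adding the spring-boot-starter-actuator dependency would additionally expose management endpoints such as /health and /metrics.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

/** Minimal Spring Boot service exposing a hypothetical order lookup over HTTP/REST. */
@SpringBootApplication
@RestController
public class OrderServiceApplication {

    @GetMapping("/orders/{id}")
    public String getOrder(@PathVariable String id) {
        // Placeholder business logic; a real service would call a repository or downstream service
        return "{\"orderId\": \"" + id + "\", \"status\": \"CREATED\"}";
    }

    public static void main(String[] args) {
        SpringApplication.run(OrderServiceApplication.class, args);
    }
}
```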

Relatively mature microservice frameworks in the industry today include Netflix's Karyon/Ribbon, Spring's Spring Boot/Spring Cloud, and Alibaba's Dubbo.

Runtime configuration management

Services generally depend on a lot of configuration, such as database connection strings, connection pool sizes and connection timeouts. This configuration usually differs between environments (development/test/production); for example, the production environment needs a connection pool while development and test environments may not. Some parameters also need to be adjusted dynamically at runtime, for example rate limiting and circuit breaking thresholds tuned according to traffic conditions. The common practice today is to build a runtime configuration center to support dynamic configuration of microservices; a simplified architecture is shown below (Fig 10):

Fig 10, Service Configuration Center

Dynamic configuration is stored on a centralized configuration server, and users configure and adjust service settings through a management interface. Services update their dynamic configuration either by scheduled pull or by server-side push. The pull approach is more reliable, but it introduces delay and wasted network traffic when the configuration rarely changes; the push approach updates configuration promptly but is more complex to implement, generally requiring a long-lived connection between each service and the configuration server. The configuration center also needs to handle configuration versioning and auditing, and in a large-scale service environment it must additionally consider distribution and high availability. A sketch of the scheduled-pull style follows.
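The following is a minimal Java sketch of the scheduled-pull style, assuming a hypothetical config-server HTTP endpoint that returns the current configuration as a JSON document; the polling interval and error handling are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of "scheduled pull" dynamic configuration: the service polls a hypothetical
 *  config-server endpoint and atomically swaps its in-memory configuration snapshot. */
public class ScheduledConfigPuller {

    private final HttpClient http = HttpClient.newHttpClient();
    private final AtomicReference<String> currentConfig = new AtomicReference<>("{}");
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final URI configUrl;

    public ScheduledConfigPuller(String configUrl) {
        this.configUrl = URI.create(configUrl);
    }

    public void start(long intervalSeconds) {
        scheduler.scheduleAtFixedRate(this::pullOnce, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    private void pullOnce() {
        try {
            HttpRequest request = HttpRequest.newBuilder(configUrl).GET().build();
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {
                currentConfig.set(response.body());   // e.g. a JSON document of key/value pairs
            }
        } catch (Exception e) {
            // keep the last known good configuration on failure
        }
    }

    public String snapshot() {
        return currentConfig.get();
    }
}
```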

Relatively mature open source configuration center solutions include Baidu's Disconf, Qihoo 360's QConf, Spring Cloud Config, and Alibaba's Diamond.

Netflix's Microservices Framework

Netflix is an Internet company that has successfully practiced microservice architecture. A few years ago, Netflix open-sourced almost its entire microservice framework stack to the community. These frameworks and components include:

  1. Eureka: service registration and discovery framework
  2. Zuul: service gateway
  3. Karyon: server-side framework
  4. Ribbon: client-side framework
  5. Hystrix: service fault tolerance component
  6. Archaius: service configuration component
  7. Servo: metrics component
  8. Blitz4j: logging component

Fig 11 below shows a microservice framework built from these components, taken from the recipes-rss sample project.

Fig 11, Microservice framework based on Netflix open source components

Netflix's open source framework components have been proven over many years in Netflix's large-scale distributed microservice environment, and they are gradually being accepted by the community as standard building blocks for microservice frameworks. Pivotal's Spring Cloud, launched last year, is largely a further encapsulation of the Netflix open source components that makes it convenient for Spring developers to build the basic framework of a microservice system. For companies intending to build a microservice framework of their own, making full use of (or referring to) Netflix's open source microservice components or Spring Cloud, and doing the necessary enterprise customization on top of them, is undoubtedly a shortcut to a microservice architecture.

 

The gateway can also perform aggregation directly; in some scenarios Alibaba uses the gateway for business aggregation and tailoring.
The advantage of keeping aggregation out of the gateway is separation of concerns and single responsibility: as service infrastructure, the gateway then focuses only on cross-cutting public logic such as routing, security, monitoring, logging and fault tolerance, while specific business aggregation remains the independent responsibility of teams with business context, usually sitting behind the gateway.

 

In Netflix's microservice system:
1. Zuul performs only cross-cutting functions, such as routing, security authentication, fault tolerance, rate limiting and logging.
2. Zuul is just an HTTP gateway; aggregation and protocol conversion (for example, turning RPC into HTTP) are done at the Edge Service layer.
3. Netflix has a centralized configuration service; clients use the Archaius component and periodically pull configuration from the configuration server.
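For reference, this is roughly what reading a dynamic property through Archaius looks like; the property name and default value are illustrative assumptions.

```java
import com.netflix.config.DynamicIntProperty;
import com.netflix.config.DynamicPropertyFactory;

/** Sketch of reading a dynamic property with Archaius; the property name and default are
 *  illustrative. Archaius polls its configuration sources periodically, so the value
 *  returned by get() can change at runtime without a restart. */
public class TimeoutSettings {

    private static final DynamicIntProperty CALL_TIMEOUT_MS =
            DynamicPropertyFactory.getInstance().getIntProperty("service.call.timeoutMs", 1000);

    public static int callTimeoutMillis() {
        return CALL_TIMEOUT_MS.get();   // reflects the latest value pulled from the config source
    }
}
```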

Reprinted from: http://www.infoq.com/cn/articles/basis-frameworkto-implement-micro-service
