Talking about Infrastructure in Microservice Architecture: Service Mesh and Istio

Evolution of Microservice Architecture

As an architectural pattern, microservices divide a complex system into dozens or even hundreds of small services, each responsible for an independent piece of business logic. These small services are easy for a small team of engineers to understand and modify, bring flexibility in language and framework selection, shorten application development and release cycles, and can be scaled independently according to their workloads and resource requirements.

On the other hand, when an application is split into multiple microservice processes, method calls within a process become remote calls between processes, which introduces the complexity of connecting, managing, and monitoring a large number of services.

microservice01.png


The change brings a series of problems with distributed systems, such as:

  • How can I find the provider of the service?
  • How to ensure the reliability of remote method calls?
  • How to ensure the security of service calls?
  • How to reduce the latency of service calls?
  • How to perform end-to-end debugging?


In addition, microservice instances in production deployment also increase the difficulty of operation and maintenance, such as:

  • How to collect performance metrics of a large number of microservices for analysis?
  • How to upgrade microservices without affecting the online business?
  • How to test the fault tolerance and stability of a microservice cluster deployment?


These problems involve communication, management, deployment, versioning, security, failover, policy enforcement, telemetry, and monitoring of hundreds of services, and solving the problems introduced by these microservices architectures is not an easy task.

Let’s review the evolution of the microservice architecture. Before service meshes emerged, the communication logic between services (service discovery, circuit breaking, retries, timeouts, encryption, rate limiting, and so on) was managed inside the microservice application itself.

02.png


In a distributed system, this part of the logic is relatively complex. To provide a stable and reliable infrastructure layer for microservice applications, avoid reinventing the wheel, and reduce the chance of mistakes, the logic responsible for service communication is generally abstracted and consolidated into a code base that every microservice application can use, as shown in the following figure:

03.png


The shared code base reduces the workload of application development and maintenance, and lowers the probability of errors compared with each application developer implementing the communication logic alone, but the following problems remain:

  • The communication logic of microservices is not transparent to application developers, who need to understand and use the code base correctly and therefore cannot focus all their energy on business logic.
  • Different code bases need to be developed for different languages/frameworks, which constrains the choice of languages and frameworks for microservice development and reduces the flexibility of technology choices.
  • As time passes, different versions of the code base will coexist; version compatibility and upgrading the code base across a large number of running microservices become difficult problems.


An analogy can be drawn between the communication infrastructure layer of microservices and the TCP/IP protocol stack. The TCP/IP stack provides basic communication services for all applications in the operating system, but there is no tight coupling between the stack and the application. The application only needs to use the communication functions provided by TCP/IP and does not care about their implementation, such as how IP performs routing or how TCP establishes connections.

Similarly, microservice applications should not need to pay attention to low-level communication details such as service discovery, load balancing, retries, and circuit breaking. If this communication logic is extracted from the application process, deployed as a separate process, and used as a communication proxy between services, the architecture shown in the following figure is obtained:

04.png


Because the communication proxy process is deployed alongside the application process, this deployment pattern is vividly called a "Sidecar" (like the sidecar of a motorcycle).

All traffic between applications passes through the proxy. Since the proxy is deployed on the same host as the application, communication between the application and the proxy can be considered reliable. The proxy is then responsible for finding the target service and for the reliability and security of the communication.

When a large number of services are deployed, the connections between their sidecar proxies form a grid, as shown in the following figure. This grid becomes the communication infrastructure layer of the microservices, carries all traffic between them, and is known as a Service Mesh.

mesh05.png


 

A service mesh is an infrastructure layer that handles inter-service communication. Cloud-native applications have complex service topologies, and service meshes ensure that requests can travel reliably through these topologies. In practice, a service mesh is usually composed of a series of lightweight network proxies that are deployed with the application, but the application does not need to be aware of their existence.


William Morgan, "What's a Service Mesh? And Why Do I Need One?"

There are a large number of sidecar proxies in a service mesh, and configuring each proxy individually would be an enormous amount of work. To control the proxies in the mesh more conveniently and centrally, a control plane component is added to the service mesh.

controlplane06.png


Here we can draw a comparison with SDN. The control plane is similar to the controller in SDN network management, responsible for defining routing policies and issuing routing rules; the data plane is similar to the switches in an SDN network, responsible for forwarding packets.

Since all communication between microservices goes through the service mesh infrastructure layer, this communication can be monitored, managed, and controlled through the cooperation of the control plane and data plane, enabling capabilities such as grayscale releases, distributed call tracing, fault injection testing, dynamic routing rules, and closed-loop control of microservices.


Istio service mesh

Istio is an open source Service Mesh project and another major Google-led effort after Kubernetes. The main participating companies include Google, IBM, and Lyft.

With Kubernetes' good architectural design and strong scalability, Google has built an ecosystem around it. Kubernetes is used for the orchestration of microservices (orchestration is the literal translation of the English word "Orchestration"; in plain terms it describes organizing a set of microservices and handling their deployment, termination, upgrade, and scaling). Downward, Kubernetes connects to different network and container runtime implementations through the standard CNI (Container Network Interface) and CRI (Container Runtime Interface) interfaces, providing the infrastructure on which microservices run. Upward, Istio provides microservice governance capabilities.

As can be seen from the figure below, Istio complements an important part of the Kubernetes ecosystem and is a milestone expansion in Google's microservices map. 

k8s-ecosystem07.png


Google's use of Istio to promote a de facto standard for microservice governance is of great significance to its own product, Google Cloud. Other vendors such as Red Hat, Pivotal, Nginx, and Buoyant saw the trend and followed suit, announcing integrations of their products with Istio to avoid being left behind and losing market opportunities.

It is foreseeable that in the near future, for cloud-native applications, using Kubernetes for service deployment and cluster management, and using Istio for service communication and governance will become the standard configuration of microservice applications.

The Istio service consists of two parts: the data plane and the control plane.

  • The data plane consists of a set of intelligent proxies (Envoy) deployed as sidecars that mediate and control all network communication between microservices.
  • The control plane is responsible for managing and configuring proxies to route traffic and enforce policies at runtime.

 

istio-architecture08.png

 

Istio control plane

The Istio control plane consists of three components: Pilot, Mixer, and Istio-Auth.

Pilot

Pilot maintains a standard model of the services in the mesh that is independent of the underlying platforms. Pilot interfaces with the various underlying platforms through adapters to populate this standard model.

For example, the Kubernetes adapter in Pilot watches the Kubernetes API server for changes such as Pod registration information, Ingress resources, and stored traffic management rules, and translates that data into the standard model for Pilot to use. Through this adapter pattern, Pilot can also obtain service information from Mesos, Cloud Foundry, and Consul, and adapters can be developed to integrate other service discovery components into Pilot.

In addition, Pilot defines a set of standard APIs for communicating with the data plane, covering service discovery, load balancing pools, and dynamic updates of routing tables. Decoupling the control plane and data plane through these standard APIs simplifies the design and improves cross-platform portability. Based on these APIs, a variety of sidecar proxies can be integrated with Istio: in addition to Envoy, which Istio integrates by default, third-party proxies such as Linkerd and Nginmesh can be used, or you can write your own sidecar implementation against the API.

Pilot also defines a DSL (Domain Specific Language) that provides a high-level, business-oriented abstraction that operations staff can understand and use. Operators use this DSL to define traffic rules and submit them to Pilot; Pilot translates the rules into data plane configuration and distributes them to the Envoy instances through the standard APIs, allowing microservice traffic to be controlled and adjusted at runtime.

pilot09.png
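As a rough illustration, a traffic rule in this DSL might look like the sketch below, modeled on the `RouteRule` resource from early Istio releases (the kind, API version, and field names vary across Istio versions, and the `reviews` service and cookie pattern are hypothetical). It directs requests from test users, identified by a cookie, to version v2 of a service:

```yaml
# Illustrative only: modeled on Istio's early v1alpha2 RouteRule API.
# Routes requests carrying a "user=tester" cookie to version v2.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-test-v2
spec:
  destination:
    name: reviews            # hypothetical service name
  precedence: 2              # evaluated before lower-precedence rules
  match:
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=tester)(;.*)?$"
  route:
  - labels:
      version: v2            # send matching traffic to the v2 subset
```

An operator would submit such a rule with `istioctl` or `kubectl`; Pilot then translates it into Envoy configuration and pushes it to the sidecars, with no change to the application itself.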



Mixer

In a microservice application, some common back-end services usually need to be deployed to support business functions. These include policy services such as access control and quota management, and telemetry services such as APM and logging. Microservice applications are generally integrated directly with these back-end systems, which creates tight coupling between applications and infrastructure: if the infrastructure needs to be upgraded or changed for operational reasons, the code of every microservice application must be modified, and vice versa.

To solve this problem, Mixer introduces a common intermediate layer between the application code and the infrastructure back ends. This layer decouples the application from the back-end infrastructure: application code no longer integrates with a specific back end, but instead performs a fairly simple integration with Mixer, and Mixer is responsible for connecting to the back-end systems.

Mixer mainly provides three core functions:

  • Precondition checking. Allows a service to verify preconditions before responding to an incoming request, such as whether the caller is properly authenticated, is on the service's whitelist, or passes ACL checks.
  • Quota management. Enables services to allocate and release quotas across multiple dimensions. Quotas are a simple resource management mechanism that provides relatively fair contention when service consumers compete for limited resources; rate limiting is one example of a quota.
  • Telemetry reporting. Enables services to report logging and monitoring data. In the future it will also enable tracing and billing streams for both service operators and service consumers.
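As a rough illustration of quota-based rate limiting, the sketch below is modeled on the `memquota` adapter configuration from early Istio releases. The kind, field names, and the 500-requests-per-second limit are all assumptions for illustration and differ across versions; a matching quota instance and rule (omitted here) would bind the handler to request traffic.

```yaml
# Illustrative only: a rate-limit handler modeled on the early Istio
# memquota adapter. All names and values here are assumptions.
apiVersion: config.istio.io/v1alpha2
kind: memquota
metadata:
  name: request-limit
  namespace: istio-system
spec:
  quotas:
  - name: requestcount.quota.istio-system  # hypothetical quota instance
    maxAmount: 500       # allow at most 500 units...
    validDuration: 1s    # ...per one-second window
```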


The architecture of Mixer is shown in the figure:

mixer210.png


First, the sidecar collects relevant information from each request, such as the request path, time, source IP, destination service, tracing headers, and logs, and reports these attributes to Mixer. Mixer connects to back-end services through adapters and forwards the content reported by the sidecar to them.

Since the sidecar only talks to Mixer and is not coupled to the back-end services, the Mixer adapter mechanism makes it possible to switch to different back ends without modifying application code. For example, through different Mixer adapters, metrics can be collected by Prometheus or InfluxDB, and back-end services can even be switched dynamically without stopping the application.

Second, the sidecar consults Mixer for a policy decision when processing each request and decides whether to continue the call according to the result Mixer returns. In this way, Mixer moves policy decisions out of the application layer, enabling operators to configure policies at runtime, dynamically control application behavior, and improve the flexibility of policy control. For example, you can configure an access whitelist for each microservice application, or rate limits for different clients.

Logically, each request call between microservices will be processed by Mixer twice: policy judgment before the call, and telemetry data collection after the call. Istio employs some mechanisms to avoid Mixer's processing affecting Envoy's forwarding efficiency.

As you can see from the above figure, Istio adds a Mixer Filter to Envoy, which communicates with the Mixer component of the control plane to complete policy control and telemetry data collection. The Mixer Filter stores the data cache required for policy judgment, so most policy judgments are processed in Envoy, and there is no need to send requests to Mixer. In addition, the telemetry data collected by Envoy will be stored in Envoy's cache first, and then reported to Mixer in batches at regular intervals.

Istio-Auth

Istio supports Mutual SSL Authentication and Role-Based Access Control (RBAC) to provide an end-to-end security solution.

Authentication

Istio provides an internal CA (Certificate Authority) that issues certificates for each service, provides two-way SSL authentication for access between services, and encrypts communications. Its architecture is shown in the following figure: 

auth11.png


Its working mechanism is as follows:

When deploying:

  • The CA watches the Kubernetes API Server, creates a key pair and certificate for each Service Account in the cluster, and saves them to the Kubernetes API Server. Note that the certificate is generated per Service Account, not per service; a Service Account can have a one-to-many relationship with the services deployed in Kubernetes. The Service Account is stored in the SAN (Subject Alternative Name) field of the certificate.
  • When a Pod is created, Kubernetes mounts the key and certificate as a Volume in the Pod, in the form of a Kubernetes Secret, according to the Service Account associated with the Pod, for Envoy to use.
  • Pilot generates the data plane configuration, including the keys and certificates Envoy needs to use and which Service Accounts are allowed to run which services, and sends it to Envoy.

 

Note: If it is a virtual machine environment, a Node Agent is used to generate a key, apply for a certificate from Istio CA, and then pass the certificate to Envoy.

Runtime:

  • Outbound requests from service clients are taken over by Envoy.
  • The client-side Envoy and the server-side Envoy perform a mutual SSL handshake. During the handshake, the client-side Envoy verifies whether the Service Account in the server's certificate has permission to run the requested service; if not, the server is considered untrustworthy and the connection cannot be established.
  • Once the encrypted TLS connection is established, the request data is sent to the server-side Envoy and then forwarded to the service by Envoy over a local TCP connection.


Authorization

Istio's Role-Based Access Control (RBAC) provides service access control at three different granularities: namespace, service, and method. Its architecture is shown in the following figure: 

authorization12.png


Administrators can customize access control security policies, which are stored in the Istio Config Store. The Istio RBAC Engine obtains the security policies from the Config Store, evaluates requests initiated by clients against them, and returns the authorization result (allow or deny).

The Istio RBAC Engine is currently implemented as a Mixer adapter, so it can obtain the identity (Subject) and operation (Action) of the requester from the context passed by Mixer, and implement policy control over access requests through Mixer, allowing or denying each request.

There are two basic concepts in Istio Policy:

  • ServiceRole defines a role and grants it access permissions to services in the mesh. Permissions can be specified at namespace, service, or method granularity.
  • ServiceRoleBinding binds a role to a Subject, which can be a user, a group of users, or a service.
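The pair of resources might look like the sketch below, modeled on the early Istio RBAC API (the kinds, fields, service name, and subject are assumptions for illustration and differ across versions):

```yaml
# Illustrative ServiceRole/ServiceRoleBinding pair (early Istio RBAC API;
# all names here are hypothetical).
apiVersion: config.istio.io/v1alpha2
kind: ServiceRole
metadata:
  name: products-viewer
spec:
  rules:
  - services: ["products.default.svc.cluster.local"]  # hypothetical service
    methods: ["GET"]                                  # method-level granularity
---
apiVersion: config.istio.io/v1alpha2
kind: ServiceRoleBinding
metadata:
  name: bind-products-viewer
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/frontend"      # hypothetical subject
  roleRef:
    kind: ServiceRole
    name: products-viewer
```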

 

Istio data plane

The Istio data plane is deployed alongside microservices as sidecars, providing secure, fast, and reliable inter-service communication. Since Istio's control plane and data plane interact through standard interfaces, the data plane can have multiple implementations; Istio uses an extended version of the Envoy proxy by default.

Envoy is a high-performance proxy developed in C++ that mediates all inbound and outbound traffic for all services in the service mesh. Istio leverages many of Envoy's built-in features, such as dynamic service discovery, load balancing, TLS encryption, HTTP/2 and gRPC proxying, circuit breaking, routing rules, fault injection, and telemetry.

The features supported by the Istio data plane are as follows:

b1.png


 

Note: Outbound features are the capabilities provided by the sidecar on the service-consumer side, and inbound features are those provided by the sidecar on the service-provider side. Some features, such as telemetry and distributed tracing, require support from both sidecars; others are only needed on one side. For example, authentication only needs to be provided on the provider side, and retries only on the consumer side.

Typical application scenarios

Istio service management includes the following typical application scenarios:

Distributed call tracking

In the microservice architecture, the business call chain is very complex, and a request from a user may involve the collaborative processing of dozens of services. Therefore, a tracking system is needed to record and analyze the related events of the same request in the entire call chain, so as to help R&D and operation and maintenance personnel analyze system bottlenecks, quickly locate exceptions and optimize the call chain.

By collecting call-related data on the Envoy proxy, Istio implements non-invasive distributed call tracking analysis for applications. The principle of Istio's implementation of distributed call tracing is shown in the following figure:

distributed-tracing13.png


Envoy collects data for each segment of an end-to-end call and sends the tracing information to Mixer; a Mixer adapter then sends it to the corresponding back-end tracing service for processing. The whole process of generating call-tracing information requires no application intervention, so there is no need to inject distributed-tracing code into the application.
 

Note: The application still needs to forward the tracing-related headers from the incoming request when making outbound calls, passing them to the next sidecar in the call chain for processing.

Metric collection

The principle of Istio's metric collection is shown in the following figure:

metrics-collecting14.png


Envoy collects raw metric data, such as the requested service, HTTP status codes, and call latency. The collected metrics are sent to Mixer and converted and forwarded to the back-end monitoring system through Mixer adapters. Since Mixer uses a plug-in mechanism, the back-end monitoring system can be switched dynamically at runtime as needed.


Grayscale release

When an application goes online, a major operational challenge is how to upgrade it without affecting the live business. No matter how thorough the testing, it cannot guarantee that all potential faults will be discovered offline. Since version upgrade failures cannot be avoided 100% of the time, a controllable release process is needed to keep the impact of failures within an acceptable range and allow quick rollback.

Grayscale release (also known as canary release) can be used to achieve a smooth transition from the old version to the new version, and to avoid the impact of problems during the upgrade process on users.

Istio achieves grayscale releases in a consistent manner through a high level of abstraction and good design. After a new version is released, operators can direct specific traffic (such as test users with specified characteristics) to the new version for testing by customizing routing rules. By introducing production traffic to the new version gradually and in a controlled manner, the impact of upgrade failures on users can be minimized.

The process of using Istio for grayscale publishing is shown in the following figure:

First, deploy the new version of the service, and import the traffic of canary users to the new version of the service through routing rules.

canary-15.png


After the new version proves stable, routing rules are used to gradually shift production traffic to it, for example 5%, 10%, 50%, then 80%.

canary-216.png
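A weight-based routing rule for this gradual shift might look like the sketch below, modeled on the early Istio `RouteRule` API (the kind, fields, and the `reviews` service are assumptions for illustration):

```yaml
# Illustrative weighted routing rule for a canary rollout
# (early v1alpha2 RouteRule API; exact fields are assumptions).
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-canary
spec:
  destination:
    name: reviews          # hypothetical service
  route:
  - labels:
      version: v1
    weight: 90             # 90% of traffic stays on the old version
  - labels:
      version: v2
    weight: 10             # 10% shifted to the new version
```

Increasing the rollout is then just a matter of editing the weights and re-applying the rule; Pilot pushes the change to the Envoy sidecars without redeploying either version of the service.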


If the new version works normally, all traffic is eventually directed to it and the old version is taken offline; if a problem appears along the way, traffic can be redirected back to the old version, and the release process repeated after the fault is fixed. 

canary-317.png




Circuit breaker

In a microservice architecture there are many service units. If one service fails, the failure can spread through dependencies and eventually paralyze the entire system. The circuit breaker pattern was created to solve this kind of problem.

In the circuit breaker pattern, when a service fails, the circuit breaker's fault monitoring returns an error response to the caller immediately instead of letting it wait for a long time. The calling thread is therefore not tied up by failed calls, which prevents the failure from spreading through the system.

The principle of Istio's circuit breaker is shown in the following figure:

circuitbreaker18.png


The administrator can set circuit-breaking trigger conditions, ejection time, and other parameters through a destination policy. For example, service B can be configured to be ejected for 15 minutes after 10 consecutive 5XX errors. When an instance of service B meets this condition, it is removed from the load-balancing pool for 15 minutes, during which Envoy no longer forwards client requests to it.
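The example above might be expressed as the following sketch, modeled on the `DestinationPolicy` resource with a simple circuit breaker from early Istio releases (the kind, field names, and the `service-b` name are assumptions for illustration and vary across versions):

```yaml
# Illustrative destination policy matching the example in the text:
# eject an instance for 15 minutes after 10 consecutive 5XX errors.
# Modeled on the early v1alpha2 DestinationPolicy API (assumed fields).
apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  name: service-b-cb
spec:
  destination:
    name: service-b            # hypothetical service
  circuitBreaker:
    simpleCb:
      httpConsecutiveErrors: 10  # trip after 10 consecutive 5XX responses
      sleepWindow: 15m           # keep the instance out of the LB pool for 15 minutes
```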

Istio's circuit breaker also supports parameters such as the maximum number of connections, maximum number of pending requests, maximum number of requests, maximum requests per connection, and the number of retries. When the configured maximum number of requests is reached, newly initiated requests are rejected directly by Envoy. 

circuitbreaker-parameters19.png



Fault injection

For a large microservice application, the robustness of the system is very important. There are a large number of service instances in the microservice system. When some service instances have problems, the microservice application needs to have high fault tolerance, and ensure that the system can continue to provide services normally externally by means of retry, circuit breaker, and self-healing. Therefore, when the application is released to the production system, it is necessary to conduct sufficient robustness testing of the system.

One of the biggest difficulties in robustness testing of microservice applications is how to simulate system failures. In a test environment where hundreds or thousands of microservices are deployed, it is very difficult to simulate communication failures between microservices by setting up applications, hosts or switches.

Istio carries the communication traffic between microservices through the service grid, so it can inject faults through rules in the grid, simulate the failure of some microservices, and test the robustness of the entire application.

The principle of fault injection is shown in the following figure:

fault-injection20.png


The tester injects a rule into Envoy via Pilot that adds a specified delay to requests destined for service MS-B. When a client request is sent to MS-B, Envoy adds the delay according to this rule, causing the client's request to time out. By injecting faults through rules, testers can easily simulate various communication failures between microservices and run fairly complete robustness tests on the application.
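Such a delay-injection rule might look like the sketch below, modeled on the `httpFault` section of the early Istio `RouteRule` API (the kind, fields, and the 7-second value are assumptions for illustration; `ms-b` is the hypothetical service from the figure):

```yaml
# Illustrative fault-injection rule: add a fixed delay to all requests
# destined for MS-B (early v1alpha2 RouteRule httpFault API; assumed fields).
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: ms-b-delay
spec:
  destination:
    name: ms-b             # hypothetical service name
  httpFault:
    delay:
      percent: 100         # affect every request
      fixedDelay: 7s       # chosen to exceed the client's timeout
  route:
  - labels:
      version: v1
```

Because the delay is added by the sidecar, neither MS-B nor its clients need any test-specific code, and removing the rule restores normal behavior immediately.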

Summary

A service mesh provides microservices with a secure, reliable communication infrastructure layer that is transparent to the application. With a service mesh, microservice developers can focus on business problems and leave cross-cutting communication concerns to the mesh. Adopting a service mesh also avoids the dependencies introduced by a shared code base, allowing teams to take full advantage of the heterogeneity of microservices and freely choose their technology stack according to business needs and developer skills.

Istio has a well-designed architecture and offers strong extensibility and customization capabilities. Although Istio is still in the beta stage, it has received support from many well-known companies and products, and is a very promising open source service mesh project.
