Nacos Architecture and Principles: the Service Mesh Ecosystem




Background

  1. Kubernetes automates infrastructure deployment and resource management, reducing operations costs and improving business elasticity.
  2. Kubernetes has great advantages in application deployment and elasticity, but lacks built-in support for service governance, gateways, authentication and authorization, and observability.
  3. Many products and traditional middleware are being reworked to fill these Kubernetes gaps and migrated to cloud native and Kubernetes.
  4. Service mesh is the next generation of microservice governance: it sinks governance into the infrastructure layer for general-purpose use and supports heterogeneous systems.
  5. Service mesh moved from theory to practice on the back of Kubernetes container orchestration, and has received wide attention and production use.
  6. Istio is the most popular service mesh; like Kubernetes, it abstracts the infrastructure behind a standard declarative API.
  7. Nacos is deeply integrated with the Spring ecosystem and Dubbo, and now also integrates with Istio, so users can apply it across a wide range of scenarios.
  8. Nacos keeps pace with the technology trend and has not missed the dividends of Kubernetes and service mesh.
  9. Kubernetes has changed how resources are managed and applications are deployed and operated, and Nacos keeps evolving to fit these new technologies.
  10. Kubernetes reduces operations costs and improves business flexibility; service mesh and Nacos give users a better technical experience.
  11. Cloud-native Kubernetes and service mesh have driven industry change, and Nacos connects to this ecosystem to bring value to users.

What is a service mesh

To deeply understand the concept of a service mesh, clarify the problems it sets out to solve, and appreciate the business value it brings, we need to start from the beginning with the evolution of application architecture.

The Evolution from Monolithic to Microservice Architecture

In recent years, as business systems have grown, many monolithic applications have been transformed into microservice architectures. The application is split into services along functional and business-domain boundaries; each business team focuses on the services it owns, and each microservice iterates independently without affecting the others. This domain-driven splitting not only speeds up business development but also enables a more agile development experience.

Everything has two sides. While microservices improve iteration speed and agility, they bring new challenges for service governance. In a monolithic application, all services live in one process: calls between them are method calls, and an entire request is handled on the current thread, which makes debugging and troubleshooting straightforward. After the move to microservices, the modules of the original monolith become independently deployed and running services, and method calls become remote calls.

Service discovery

The first problem to solve is service discovery: how does a consumer service find a provider service at runtime, given that the IP addresses of independently deployed service nodes are not fixed? This calls for dynamic discovery, which is exactly what a registry provides. Each microservice registers its node's network address with the registry on deployment and release, and promptly deregisters when it goes offline. Each microservice also subscribes to the registry for the node information of the microservices it depends on; when a subscribed service's nodes change, it receives the update in real time and refreshes its local connection pool.
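As a toy illustration of this register/subscribe loop (not the Nacos API; a real registry adds health checks, clustering, and push over the network), a minimal in-memory registry might look like:

```python
from collections import defaultdict

class Registry:
    """Toy registry: register/deregister instances, push updates to subscribers."""
    def __init__(self):
        self._instances = defaultdict(set)     # service name -> {(ip, port), ...}
        self._subscribers = defaultdict(list)  # service name -> [callback, ...]

    def register(self, service, ip, port):
        self._instances[service].add((ip, port))
        self._notify(service)

    def deregister(self, service, ip, port):
        self._instances[service].discard((ip, port))
        self._notify(service)

    def subscribe(self, service, callback):
        self._subscribers[service].append(callback)
        callback(sorted(self._instances[service]))  # deliver initial snapshot

    def _notify(self, service):
        snapshot = sorted(self._instances[service])
        for cb in self._subscribers[service]:
            cb(snapshot)

# A consumer keeps its local connection pool in sync with the registry.
registry = Registry()
pool = []
registry.subscribe("provider", lambda nodes: (pool.clear(), pool.extend(nodes)))
registry.register("provider", "10.0.0.1", 8080)
registry.register("provider", "10.0.0.2", 8080)
registry.deregister("provider", "10.0.0.1", 8080)
print(pool)  # [('10.0.0.2', 8080)]
```

The consumer never polls: the registry pushes every change, which is how the "update the local connection pool in real time" behavior is achieved.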


Load balancing

Once services can discover each other, a consumer can pick an address from the node list obtained from the registry and make a network call to the provider. To maximize resource utilization and minimize request RT (response time), an optimal node must be chosen from the pool; this is load balancing. If replicas of a microservice run on different hardware, nodes with more capacity should receive more traffic. If replicas are spread across regions, callers should prefer nodes in their own region. If the business requires session stickiness, requests from the same user must always hit the same node. If a microservice needs warm-up after startup, traffic must be ramped up to the new node gradually.
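The weighted, same-region-first selection described above can be sketched as follows (an illustrative strategy, not any particular framework's implementation; node fields and weights are made up):

```python
import random

def pick_node(nodes, caller_region):
    """Prefer nodes in the caller's region, then pick proportionally to weight."""
    local = [n for n in nodes if n["region"] == caller_region]
    candidates = local or nodes          # fall back to all nodes if none local
    total = sum(n["weight"] for n in candidates)
    r = random.uniform(0, total)         # weighted random selection
    for n in candidates:
        r -= n["weight"]
        if r <= 0:
            return n
    return candidates[-1]                # guard against floating-point edge cases

nodes = [
    {"ip": "10.0.0.1", "region": "us-east", "weight": 3},
    {"ip": "10.0.0.2", "region": "us-west", "weight": 1},
]
chosen = pick_node(nodes, "us-west")
print(chosen["ip"])  # 10.0.0.2, the only same-region candidate
```

Warm-up can be layered on the same mechanism by starting a new node at a small weight and raising it over time.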


Circuit breaking and rate limiting

In a monolithic application the entire call chain runs in one process, so in the face of a sudden traffic spike we only need to apply circuit breaking and rate limiting at the application entrance. In a microservice architecture, each service is deployed independently and the number of replicas varies with the importance of its function, so under high-concurrency traffic each service needs its own circuit-breaking and throttling thresholds. Moreover, the microservice architecture adds network hops to the request-processing chain: a failure in any one service can drag down the services that depend on it, and even make the system unavailable as a whole.
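A minimal consecutive-failure circuit breaker can be sketched like this (thresholds and the retry window are illustrative assumptions; production libraries such as Sentinel or Hystrix are far more sophisticated):

```python
import time

class CircuitBreaker:
    """Open the circuit after `max_failures` consecutive failures, fast-fail
    while open, and allow a trial call again after `reset_after` seconds."""
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: fast-fail")
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                # success resets the failure count
        return result

cb = CircuitBreaker(max_failures=2, reset_after=60)
def flaky(): raise IOError("backend down")

for _ in range(2):                       # two failures trip the breaker
    try: cb.call(flaky)
    except IOError: pass

opened = False
try:
    cb.call(flaky)                       # rejected without touching the backend
except RuntimeError:
    opened = True
print(opened)  # True
```

Fast-failing like this is what stops one slow or dead service from exhausting the threads of every caller upstream of it.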


Observability (monitoring and alerting)

For observability, the microservice architecture mainly relies on three means:

  1. Tracing: distributed tracing locates the specific node where a fault occurred. Because request call chains in a microservice system are complex, some means is needed to see the full path of each request.
  2. Logging: logs locate the specific cause of a failure. Used together with tracing, each service records request metadata to aid troubleshooting.
  3. Metrics: alert rules on indicators such as connection count, request count, and success rate for each service surface problems early and improve system stability.
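To make the Metrics point concrete, here is a toy success-rate alert rule (real systems export such counters to Prometheus and evaluate alert rules there; the threshold and service name are made up):

```python
from collections import Counter

class Metrics:
    """Toy request counter with a success-rate alert rule."""
    def __init__(self, min_success_rate=0.99):
        self.counts = Counter()
        self.min_success_rate = min_success_rate

    def record(self, service, ok):
        self.counts[(service, "total")] += 1
        if ok:
            self.counts[(service, "ok")] += 1

    def should_alert(self, service):
        total = self.counts[(service, "total")]
        if total == 0:
            return False
        rate = self.counts[(service, "ok")] / total
        return rate < self.min_success_rate

m = Metrics(min_success_rate=0.9)
for _ in range(8): m.record("orders", ok=True)
for _ in range(2): m.record("orders", ok=False)
print(m.should_alert("orders"))  # True: success rate 0.8 is below 0.9
```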

Authentication and authorization

In a microservice architecture, sensitive services are usually protected by authentication and authorization mechanisms that restrict access to specific clients. Concretely:

  • A centralized authorization system is introduced to decide which clients may access each sensitive service.
  • When a client initiates a call, it first obtains credential information from the authorization system.
  • The client carries the credential information when accessing a sensitive service.
  • The sensitive service verifies the credentials and permissions of every request it receives, processes the request only if verification passes, and rejects it otherwise.

This prevents any client that can obtain service node information from freely accessing sensitive services.
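The credential flow above can be sketched with a signed token. The HMAC scheme, shared secret, and service names below are illustrative assumptions, not a real authorization system:

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # assumption: shared by the auth system and the service

def issue_token(client_id, allowed_service):
    # The centralized authorization system signs what the client may access.
    payload = f"{client_id}:{allowed_service}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify(token, service):
    # The sensitive service checks both the signature and the granted scope.
    client_id, allowed_service, sig = token.split(":")
    payload = f"{client_id}:{allowed_service}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and allowed_service == service

token = issue_token("web-frontend", "payments")
print(verify(token, "payments"))  # True
print(verify(token, "user-db"))   # False: token not issued for this service
```

A forged or re-scoped token fails verification because the client does not know the secret, which is the property the centralized authorization system provides.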

Others

Service publishing, communication security, dynamic routing...

Summary

The scenarios above illustrate the challenges the microservice architecture brings and their solutions, covering the major areas of service governance: service discovery, load balancing, circuit breaking and rate limiting, monitoring and alerting, and authentication and authorization.

The following figure summarizes the major areas involved in the microservice architecture.

[figure]


Traditional Solutions for Microservices Architecture

In summary, the main drivers of business evolution to the microservice architecture are:

  • Growing business scale increases collaboration difficulty and demands a higher release frequency; under microservices, each service is developed and deployed independently, which reduces coordination overhead and supports frequent iteration.
  • As Internet businesses evolve rapidly, service splitting becomes urgent; major companies respond to business change through service splitting and microservice frameworks.
  • Open-source microservice frameworks (Spring Cloud, Dubbo, Gin, etc.) implement distributed communication and service governance (load balancing, service discovery, circuit breaking, etc.), simplifying microservice development and letting developers focus on business logic.
  • New businesses choose the microservice architecture from the start and achieve agile development with the help of these frameworks.
  • These frameworks are friendly to business developers: they hide the underlying details so developers can focus on business and build robust distributed systems with only a small amount of framework code.

In short, business development and technological progress have jointly driven the popularity of the microservice architecture, and open-source frameworks have accelerated the process by making microservice development simpler and more practical. The microservice architecture effectively solves the collaboration and delivery challenges of large-scale distributed development and better meets rapidly evolving business needs.


The following figure shows a call between services in a microservice scenario. A business developer only initiates a call to service B from the business code of service A, without caring how the call is implemented underneath: how service B's address is found, how load is balanced across service B's nodes, how flow control is applied, how the protocol layer encodes and decodes, or how the underlying network communicates reliably.

[figure]

This microservice architecture looks perfect, but it has some essential problems:

  1. High learning cost of the framework itself. Developers must spend extra effort to master and operate a complex framework, and framework-level problems are hard to troubleshoot.
  2. Tight coupling between business and framework. The framework is bundled into the service as a library, version compatibility and upgrades are complicated, and a framework upgrade can affect service stability.
  3. Restriction of the business technology stack. Frameworks usually support only certain languages, so the business can only choose languages and middleware compatible with the framework. A language without framework support is hard to bring into the microservice system, making it difficult to implement different modules in different languages.

Therefore, although microservice frameworks simplify development, they also bring the problems above. When selecting and using one, a business needs to weigh these issues and manage the risks.

In practice, you can choose a simple and stable framework, build a custom framework, or use no framework at all and face the problems of distribution directly; choose the best solution for your actual business situation.


Next Generation Microservice Architecture - Service Mesh

To address the three limitations of the traditional microservice architecture above, the proxy (sidecar) pattern represented by Istio and Nginx Mesh emerged. This is the much-discussed Service Mesh: it abstracts the communication layer of distributed services into a separate layer, in which the capabilities a distributed system needs, such as load balancing, service discovery, authentication and authorization, monitoring and tracing, and flow control, are implemented.

From a macro point of view, the approach introduces a proxy service deployed alongside each business service as a sidecar; the proxy takes over all inbound and outbound traffic of the service. A control plane acts as the central brain, performing unified traffic control and management over all of the business sidecars.
[figure]

From a micro point of view, the proxies complete requests between services indirectly by communicating with each other on behalf of the business services, and all the service governance of the distributed system is handled inside the proxies. With such a governance layer decoupled from the business, the three problems above are easily solved.

[figure: the service governance layer]


Istio, the star product of the service mesh

The name Istio comes from the Greek word for "sail", echoing Kubernetes (Greek for "helmsman"). It is open-source infrastructure for managing, securing, and monitoring microservices.


What is Istio

Istio is an open-source project jointly developed by teams from Google, IBM, and Lyft to connect, secure, control, and observe services deployed in a cluster.

One of the more popular cloud-native ServiceMesh implementations is Istio + Envoy.

As a proxy, Envoy is deployed alongside the application service as a sidecar. It transparently intercepts all inbound and outbound traffic of the application and applies additional governance policies before forwarding; these operations are invisible to the business service. By sinking the functionality of the service-governance SDK, which used to be coupled into the business application, down into the sidecar, business code and governance code are decoupled and can evolve and iterate in parallel.

From this perspective, a ServiceMesh provides an application-level network communication infrastructure layer and executes user-configured governance policies on the traffic flowing through it. The role that defines and distributes these policies belongs to Istio.

We all know that K8s changed the traditional way applications are deployed and released, providing containerized services with flexible container orchestration, container scheduling, and a simple service discovery mechanism, but it lacks richer, finer-grained service governance capabilities. Istio emerged precisely to make up for this shortcoming: it defines a set of standard APIs for expressing common governance policies.

K8s and Istio are complementary, and together determine the deployment, release, and runtime behavior of business applications.

[figure]

The gateway sits at the edge of the cluster and controls its inbound and outbound traffic; it can be seen as an independently deployed Envoy proxy that proxies the entire cluster.

Envoy

Envoy is an open-source cloud-native edge proxy and communication bus originally developed and maintained at Lyft. In a microservice architecture, its main role is to serve as a proxy, providing the following functions:

  1. Service discovery: Envoy integrates multiple service discovery mechanisms and can dynamically discover backend servers.
  2. Load balancing: Envoy supports multiple load-balancing algorithms over the backend list produced by service discovery.
  3. Circuit breaking and fault injection: Envoy supports circuit breaking, returning an error immediately when a backend fails so that faults do not propagate. It also supports fault injection, used to test an application's resilience.
  4. Monitoring and telemetry: Envoy exposes metrics in Prometheus format, making it easy to monitor services with Prometheus and similar systems.
  5. Gray release: Envoy supports traffic splitting based on HTTP headers, cookies, and the like, enabling canary releases of microservices.
  6. Security: Envoy supports SSL/TLS-based mutual authentication (mTLS), encrypting and authenticating communication between microservices.
  7. Observability: Envoy generates rich statistics, logs, and traces for monitoring traffic and debugging applications.

In short, Envoy's main role is to act as the proxy in a microservice architecture, providing rich service governance and observability functions that help build a stable and reliable microservice system. It shields developers from a large number of complex details of distributed communication so that they can focus on their business domain.
Envoy is used by many well-known companies, such as Lyft, Airbnb, Pinterest, and Expedia; in China, companies such as Alibaba and Didi use it as well.
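As one concrete example of the gray-release capability above, header-based traffic splitting boils down to a matching rule like the following sketch (the cluster names and the `x-canary` header are made up for illustration; Envoy expresses this in route configuration, not application code):

```python
def route(request_headers, stable_cluster="svc-v1", canary_cluster="svc-v2"):
    """Send requests tagged with the canary header to the new version,
    everything else to the stable version."""
    if request_headers.get("x-canary") == "true":
        return canary_cluster
    return stable_cluster

print(route({"x-canary": "true"}))  # svc-v2
print(route({}))                    # svc-v1
```

Because the rule lives in the proxy, a canary can be widened or rolled back by changing configuration, with no redeployment of either service version.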

Envoy and Istio

Envoy and Istio are both open-source projects for microservice architectures, but their focus differs.

Envoy:

  1. Envoy is an open-source edge proxy that implements service discovery, load balancing, circuit breaking, monitoring, gray release, and similar functions.
  2. Envoy builds a distributed proxy layer out of many deployed proxy instances; client requests to each service first pass through an Envoy proxy and are then forwarded to the backend service.
  3. Envoy is just a proxy: relatively lightweight and decoupled from any specific service framework, so it can front services written in any language or framework.
  4. On its own, Envoy does not provide the higher-level control side of service governance, such as fleet-wide security policy (managing inter-service mTLS) or traffic management.

Istio:

  1. Istio is a service mesh framework. Beyond proxying, it provides richer service governance functions such as security, traffic management, and observability.
  2. Istio is built on top of the Envoy proxy, adding service discovery, security, traffic management, and other control functions that govern the entire mesh.
  3. Istio is used in combination with a specific platform (typically Kubernetes) and is tightly coupled with it.
  4. Istio has more advanced governance capabilities, but it is also more heavyweight and harder to deploy and maintain.

Therefore, Envoy is a lightweight open-source service proxy that provides the basic data-plane functions, while Istio is a complete service mesh solution: based on Envoy, richer in function, and correspondingly more heavyweight.
Depending on your needs, you can use Envoy alone, use Istio, or combine the two; Envoy can also be used entirely independently of Istio.


The basic architecture of Istio

The Istio project is a new generation of cloud-native service governance framework built on the Kubernetes platform.

Its architecture diagram is as follows, taken from the Istio official website (https://istio.io):

[figure]

It mainly involves the data-plane proxy (Proxy), the cluster ingress gateway (Ingress), the cluster egress gateway (Egress), and the core control plane (Istiod).

The functions of each component are as follows:

  • The proxy service (Proxy) uses Envoy, Lyft's open-source high-performance C++ network proxy, to intercept all inbound and outbound traffic of the business service.
  • The ingress gateway, as the cluster's access entrance, controls how internal services are safely exposed and applies unified control and observation to all ingress traffic.
  • The egress gateway, as the cluster's access exit, controls how internal services may securely reach external services.
  • The core control plane, Istiod, delivers service discovery information, traffic management configuration, and the TLS certificates for mutual authentication between services to all data-plane proxies (including the ingress and egress gateways).

It can be seen that Istiod is an all-rounder in the microservice field, covering service discovery, service governance, authentication and authorization, and observability, and it offers a non-invasive new solution for microservice businesses in the cloud-native era.


The Evolution of Nacos into the Service Mesh Ecosystem

To integrate into the service mesh ecosystem, Nacos has evolved from the microservice 1.0 architecture to a service mesh architecture.


Nacos under the traditional microservice architecture

In the traditional microservice architecture with Nacos, traffic enters through Tengine (Nginx), passes through a microservice gateway, and then enters the microservice system.

The system is split into two gateway layers because:

  • The first layer, Tengine (Nginx), handles traffic access. Its core capabilities are absorbing large traffic volumes, security protection, and HTTPS certificate support, and it pursues generality, stability, and high performance.
  • The second layer is the microservice gateway, which focuses on microservice-specific capabilities such as authentication and authorization, service governance, protocol conversion, and dynamic routing. The open-source Spring Cloud Gateway and Zuul are examples of microservice gateways.

After traffic enters the microservice system, services call each other through a microservice framework such as HSF/Dubbo or Spring Cloud. The core role Nacos plays here is service discovery: a consumer first obtains the provider's service list from Nacos and then initiates the call, and the microservice gateway likewise obtains its upstream service list through Nacos. These capabilities are provided mainly through an SDK, which also adds load-balancing and disaster-recovery strategies.

[figure]

Nacos under the traditional microservice architecture has the following problems:

  1. Tengine, like open-source Nginx, does not natively support dynamic configuration. Internally, Alibaba applied configuration changes by reloading the configuration on a schedule, so changes could not take effect promptly, hurting R&D efficiency.
  2. In the Fat SDK model, logic such as service governance and service discovery is strongly coupled with the SDK; any change to that logic means modifying the SDK and pushing the business side to upgrade.
  3. SDKs must be maintained separately for each language, which is costly and makes it hard to keep governance strategies uniform.
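The Fat SDK consumer flow described above, including the local disaster-recovery cache, can be sketched roughly as follows (illustrative only, not the Nacos SDK API; the real SDK uses push-based subscription rather than a per-call lookup):

```python
import random

class FatSdkConsumer:
    """Toy Fat SDK: look up providers, cache the list for disaster recovery,
    pick a node client-side, then issue the call."""
    def __init__(self, registry_lookup):
        self._lookup = registry_lookup  # callable: service -> [(ip, port), ...]
        self._cache = {}                # last known-good list per service

    def call(self, service, do_request):
        try:
            nodes = self._lookup(service)
            self._cache[service] = nodes
        except ConnectionError:
            nodes = self._cache.get(service, [])  # registry down: use the cache
        if not nodes:
            raise RuntimeError(f"no instances available for {service}")
        ip, port = random.choice(nodes)           # client-side load balancing
        return do_request(ip, port)

consumer = FatSdkConsumer(lambda svc: [("10.0.0.1", 8080), ("10.0.0.2", 8080)])
result = consumer.call("provider", lambda ip, port: f"GET http://{ip}:{port}/")
print(result)
```

Note how discovery, caching, and load balancing all live inside the client library: this is exactly the coupling that the mesh architecture later moves out of the application and into the sidecar.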

Nacos in the service mesh era

With the development of cloud-native technology, many companies are trying to solve the problems of the microservice 1.0 architecture through service mesh technology, arriving at a microservice 2.0 architecture. In this architecture, traffic enters the microservice system through an ingress gateway. The difference from 1.0 is the introduction of Envoy on the data plane and Istio on the control plane. Envoy is deployed as a sidecar in the same Pod as the application and hijacks the application's inbound and outbound traffic; combined with the xDS configuration delivered by Istio from the control plane, it implements traffic control, security, and observability. Most of the SDK's capabilities are stripped out and sunk into the sidecar, which also unifies management across languages.

[figure]

Service mesh technology has many advantages, but introducing a new architecture also brings new problems, especially for companies with heavy technical legacy: sidecar performance, support for private protocols, and how to migrate smoothly between the old and new architectures, among others.

Here we focus on smooth migration between the old and new systems, which inevitably raises two service discovery questions:

  • How can the old and new systems discover each other? The two systems will coexist during migration, and applications in each need to call the other.
  • How can the registry support the service mesh ecosystem? By default, Istio currently supports the Kubernetes Service discovery mechanism.

So how does the Nacos service mesh ecosystem solve these problems? Look at the following architecture diagram. Traffic comes in through a cloud-native gateway (one that is compatible with the microservice architecture, serves as a microservice gateway, and also conforms to cloud-native architecture by supporting the standard Kubernetes Ingress), and then enters a microservice system in which 1.0 (non-mesh) applications coexist with meshed applications.

[figure]
The figure above explains how a non-mesh application calls a meshed application. As the diagram shows, non-mesh applications still register and subscribe to services through the Nacos SDK, and meshed providers are also registered in Nacos, so non-mesh applications can obtain the service information of meshed applications as well. Providers generally register through the SDK because open-source Envoy does not support proxy registration; in Alibaba's internal implementation, though, service registration has in fact been sunk into the sidecar.

The other question is how meshed applications do service discovery. Look at the lower part of the diagram: Nacos already supports acting as an MCP server, and Istio obtains the full service list from Nacos over the MCP protocol, converts it into xDS configuration, and pushes it to Envoy. This supports service discovery within meshed applications and also lets them call non-mesh services; during meshing, service discovery migrates seamlessly with no code changes.

MCP protocol

The MCP protocol is a configuration synchronization protocol between components proposed by the Istio community. It was deprecated after Istio 1.8 in favor of the MCP over xDS protocol; Nacos is compatible with both.
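As a rough sketch of what this synchronization accomplishes, the full service list pulled from the registry is converted into per-cluster endpoint configuration that the control plane can push to the proxies (real xDS is protobuf over gRPC; the field names and services below are simplified assumptions):

```python
def to_endpoint_config(services):
    """Convert a registry's full service list into per-cluster endpoint
    entries, the shape of information a control plane pushes to proxies."""
    return [
        {
            "cluster_name": name,
            "endpoints": [{"address": ip, "port": port} for ip, port in nodes],
        }
        for name, nodes in sorted(services.items())
    ]

services = {
    "provider": [("10.0.0.1", 8080)],
    "consumer": [("10.0.0.2", 9090)],
}
config = to_endpoint_config(services)
print(config[1]["cluster_name"])  # provider
```

Because the conversion covers every service in the registry, meshed applications can reach non-mesh services through the proxy without either side changing its registration behavior.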

Besides MCP synchronization, there are other ways to synchronize the registry's service data into the ServiceMesh system. The following figure compares these schemes:

[figure]


The Nacos service mesh ecosystem has been deployed at scale inside Alibaba

The following picture broadly summarizes the two scenarios Alibaba has landed.

[figure]

Scenario 1:
Intercommunication between DingTalk Cloud and Alibaba Group, which is essentially application interoperability in a hybrid cloud scenario. A gateway connects the two environments: the DingTalk VPC (deployed on Alibaba Cloud) uses the MSE cloud-native gateway, while the Group side uses an Envoy gateway. They communicate using the Dubbo 3.0 Triple protocol. The gateways' control plane is Istio, and Istio synchronizes the service list from Nacos via the MCP protocol.
This architecture solves two problems:

  1. Network communication security between private and public cloud, because the gateways communicate over mTLS encryption.
  2. Smooth support for the microservice architecture: applications call the gateway over the Triple protocol, requiring no business code changes, and service discovery data is synchronized through Nacos MCP.

The same architecture is used in the intercommunication scenario with Ant Group, shown on the left side of the picture; Ant's gateway uses a MOSN-on-Envoy architecture.


Scenario 2:

The Group's microservice meshing scenario, corresponding to the middle-lower part of the picture. The internal implementation differs from the community in that Envoy connects directly to the Nacos registry. The main reason for this design is performance: some applications have tens of thousands of instance IPs, and pushing them through EDS produces so much data that it causes Istio OOMs or high CPU usage on the Envoy data plane.

[figure]


Origin blog.csdn.net/yangshangwei/article/details/131364310