Alibaba Cloud experts explain service mesh development trends for 2020


Author | Wang Xining, Senior Technical Expert at Alibaba

Follow the "Alibaba Cloud Native" public account and participate in the message interaction at the end of the article, that is, you will have the opportunity to receive book donation benefits!

This article is excerpted from the book "Istio Service Mesh Technology Analysis and Practice" by Alibaba Cloud senior technical expert Wang Xining. Starting from basic concepts, it introduces what a service mesh is, what Istio is, and the major development trends, giving a systematic and comprehensive overview of the Istio service mesh. Simply join the interaction at the end of the public account article and we will cover the cost: a free copy of "Istio Service Mesh Technology Analysis and Practice," essential reading for engineers!

Foreign media point out that service mesh technology will see the following three major developments in 2020:

  • Rapidly growing demand for service meshes;
  • Istio is hard to beat and may become the de facto standard for service mesh technology;
  • As more service mesh use cases emerge, WebAssembly will bring new possibilities.

What is a service mesh

Gartner's 2018 analysis report on service mesh technology trends surveys a range of service mesh technologies. It categorizes them according to whether the application code must be aware of the mesh, and whether, and to what degree, the application is locked in.

Mesh technologies based on programming frameworks can help developers build well-structured services, but they tightly couple application code to the framework and the runtime environment. Sidecar-proxy-based service mesh technology imposes no such barriers on developers, is easier to manage and maintain, and offers a more flexible way to configure runtime policies.


In a microservices environment, a single application can be decomposed into multiple independent components and deployed as distributed services. These services are usually stateless, transient, dynamically scalable, and run on container orchestration systems (such as Kubernetes).

A service mesh generally consists of a control plane and a data plane. The control plane is a set of services running in a dedicated namespace that perform control and management functions, including aggregating telemetry data, providing user-facing APIs, and supplying control data to the data plane proxies. The data plane consists of transparent proxies running next to each service instance. These proxies automatically handle all traffic to and from the service. Because they are transparent, they act as an out-of-process network stack, sending telemetry data to the control plane and receiving control signals from it.


Service instances can be started, stopped, destroyed, rebuilt, or replaced as needed. These services therefore require a communication middleware that supports dynamic service discovery and self-healing connections, so that they can communicate with each other securely, dynamically, and reliably. This is exactly what a service mesh provides.

A service mesh is a dedicated infrastructure layer that makes service-to-service communication safer, faster, and more reliable. If you are building a cloud-native application, you need a service mesh. Over the past year, the service mesh has become a key component of cloud-native applications, reliably delivering requests through the complex service topologies that make up modern cloud-native applications. In practice, a service mesh is usually implemented as an array of lightweight network proxies deployed alongside the application code, without the application needing to be aware of them.

The concept of the service mesh as a separate layer is tied to the rise of cloud-native applications. In the cloud-native model, a single application may consist of hundreds of services, each service may have thousands of instances, and each instance may be in a constantly changing state. This is why orchestrators like Kubernetes have become popular and necessary. Communication between these services not only grows ever more complex but is also a fundamental part of runtime behavior, so managing it is critical to end-to-end performance and reliability.


A service mesh is a networking model that sits at a layer of abstraction above TCP/IP. It assumes that an underlying L3/L4 network exists and can deliver bytes from one point to another. It also assumes that this network is as unreliable as every other aspect of the environment, so the service mesh must be able to handle network failures too. In some ways, the service mesh is similar to TCP/IP. Just as the TCP stack abstracts the mechanics of reliably delivering bytes between network endpoints, the service mesh abstracts the mechanics of reliably delivering requests between services. Like TCP, the service mesh does not care about the actual payload or how it is encoded; it is only responsible for getting the request from service A to service B while handling any failures along the way. Unlike TCP, however, the service mesh does more than just "make it work": it also provides a unified control point for introducing visibility and control into the application runtime. The explicit goal of the service mesh is to move service communication out of the invisible realm of infrastructure and turn it into a part of the ecosystem that can be monitored, managed, and controlled.

In cloud-native applications, ensuring complete reliability of requests is not easy. The service mesh manages this complexity through a variety of powerful techniques, supporting mechanisms such as circuit breaking, latency-aware load balancing, eventually consistent service discovery, retries, and timeouts to ensure reliability as far as possible. These functions must all work together, and their interaction with the complex environment they operate in matters a great deal.
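To make the circuit-breaking mechanism concrete, here is a minimal sketch of the idea: after a run of consecutive failures the breaker "opens" and fails fast, then allows a trial call after a cooldown. This is only an illustration of the concept, not Istio's or Envoy's actual implementation; the class name and thresholds are made up for the example.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    fail fast while open, and allow a trial call after a cooldown."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: half-open, allow a trial call
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

In a mesh, this logic lives in the sidecar proxy rather than in application code, which is precisely the point: every service gets the behavior without embedding a library.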

For example, when a request is sent to a service through a service mesh, the interaction can be roughly simplified into the following steps:

  • The service mesh component determines which service the requester intends to reach by applying dynamic routing rules. Should the request be routed to production or to staging? To a local data center or to a service in the cloud? Should it be canaried to the latest version under test, or routed to an older version already verified in production? All of these routing rules are dynamically configurable and can be applied globally or to any slice of traffic;
  • After determining the correct destination, the service mesh component retrieves the corresponding instance pool from the relevant service discovery endpoint; there may be multiple instances. If this information differs from what the service mesh component has observed in practice, it decides which source of information to trust;
  • The service mesh component selects the instance most likely to return a fast response, based on various factors including the observed latency of recent requests;
  • The service mesh component attempts to send the request to the selected instance and records the latency and the type of response;
  • If the instance is down for any reason, or the request gets no response at all, or the request cannot be processed for any other reason, the service mesh component retries the request on another instance as needed, provided it knows the request is idempotent;
  • If an instance consistently returns errors, the service mesh component evicts it from the load-balancing pool and periodically retries it later. Such situations are very common in distributed Internet applications, where instances on public networks can easily suffer transient failures;
  • If the request's deadline has passed, the service mesh component proactively fails the request rather than adding load with further retries. This is critical for distributed Internet applications, where a small fault can otherwise easily escalate into an avalanche of failures;
  • Throughout, the service mesh component captures every aspect of the above behavior as metrics and distributed traces and sends this data to a centralized metrics system or tracing system.
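The instance-selection step above, picking the instance most likely to respond quickly, can be sketched as latency-aware load balancing. The sketch below weights instances by an exponentially weighted moving average (EWMA) of observed latency; this is a simplified illustration of the idea, not the algorithm a real proxy such as Envoy uses, and the class and parameter names are invented for the example.

```python
import random


class LatencyAwareBalancer:
    """Pick instances weighted toward those with low observed latency."""

    def __init__(self, instances, alpha=0.3):
        self.alpha = alpha
        # Start every instance with the same neutral latency estimate (seconds).
        self.ewma = {inst: 0.1 for inst in instances}

    def pick(self):
        # Weight each instance by the inverse of its latency estimate,
        # then draw one instance proportionally to its weight.
        weights = {inst: 1.0 / lat for inst, lat in self.ewma.items()}
        r = random.uniform(0, sum(weights.values()))
        for inst, w in weights.items():
            r -= w
            if r <= 0:
                return inst
        return inst  # guard against floating-point edge cases

    def record(self, inst, latency):
        # Fold the newly observed latency into the moving average.
        self.ewma[inst] = self.alpha * latency + (1 - self.alpha) * self.ewma[inst]
```

As the balancer records latencies, traffic naturally drifts toward the faster instances, which is the behavior the step above describes.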


It is worth noting that these capabilities provide both pointwise resilience and application-wide resilience for distributed applications. Large-scale distributed systems, no matter how they are built, share a defining characteristic: any small localized failure can escalate into a system-wide catastrophic failure. A service mesh must be designed to prevent such escalation by shedding load and failing fast when the underlying system approaches its limits.

Why a service mesh is necessary

The service mesh is not new functionality so much as a shift in where that functionality lives. Web applications have always had to manage the complexity of service communication, and the origins of the service mesh model can be traced through the evolution of these applications over the past fifteen years.

At the beginning of this century, the typical architecture of a medium-sized web application was the three-tier architecture: an application logic layer, a web serving logic layer, and a storage logic layer, each a separate tier. Communication between tiers, while complex, was limited in scope. At this point there was no mesh, but there was communication logic embedded in the code of each tier.

When the web grew to very large scale, this architectural approach began to strain. Large Internet companies in particular faced enormous traffic demands and realized a precursor of the cloud-native approach: the application layer was split into many services, now commonly known as "microservices," with the topology forming a mesh of communication. In these systems, inter-service communication was usually managed by a "fat client" library, such as the Netflix OSS libraries described earlier; Hystrix's circuit-breaking capability is a good example. Although these libraries were tied to a specific environment and required particular languages and frameworks, they managed the form and capabilities of communication between services. That was a good choice under the circumstances of the time, and one adopted by many companies.

Entering the cloud-native era, the cloud-native model rests on two important factors:

  • Containers (such as Docker) provide resource isolation and dependency management;
  • The orchestration layer (such as Kubernetes) abstracts the underlying hardware as a homogeneous resource pool.

Although these library components gave applications some capacity to scale under load and to handle the failures that are always present in cloud environments, as services grew to hundreds and instances to thousands, with an orchestration layer scheduling instances, the path a single request follows through the service topology became extremely complex. At the same time, as container technology spread and containers made it easy to write and run each service in a different language, the library approach began to strain.

This complexity, and these critical demands, make a dedicated layer for inter-service communication increasingly necessary: one decoupled from application code and able to cope with the highly dynamic nature of the underlying environment. That layer is the service mesh.

Service proxies can add important capabilities to a cloud service architecture. Each application can have its own requirements or configuration describing how the proxy should behave given its workload goals. With more and more applications and services, configuring and managing a large number of proxies can be very difficult. Moreover, placing these proxies alongside every application instance creates the opportunity to build rich advanced functionality that would otherwise have to be implemented in the applications themselves.

The service proxies form a mesh data plane through which all traffic between services is handled and observed. The data plane is responsible for establishing, securing, and controlling traffic through the mesh. The management component that dictates how the data plane behaves is called the control plane. The control plane is the brain of the mesh, exposing public APIs for mesh operators to manipulate network behavior.

The Istio service mesh

Istio is an open platform for connecting, managing, and securing microservices. It provides a simple way to create a mesh of microservices with load balancing, service-to-service authentication, and monitoring, and, crucially, it can do so without requiring many changes to the services themselves. Istio is an open source project that provides a consistent way to connect, secure, manage, and monitor microservices. It began as an open source implementation of a service mesh created by Google, IBM, and Lyft. Istio can transparently add resilience and observability to your service architecture. With Istio, applications need not know that they are part of a service mesh: whenever an application interacts with the outside world, Istio handles the network traffic on its behalf. This means that if you are building microservices, Istio can bring many benefits.

Istio mainly provides the following functions:

  • Traffic management, controlling traffic and API calls between services, making calls more reliable and the network more robust under adverse conditions;
  • Observability, providing insight into the dependencies between services and the flow of calls among them, so that problems can be identified quickly;
  • Policy enforcement, controlling access policies between services without changing the services themselves.

  • Service identity and security, providing verifiable identities for services in the mesh along with the ability to protect service traffic as it flows across networks with varying levels of trust.

Istio's first production-ready version, 1.0, was officially released on July 31, 2018, and version 1.1 followed in March 2019. The community then released ten patch versions within three months of rapid iteration. As of the completion of this book, the community had released version 1.4.

Istio's data plane uses the Envoy proxy by default and works out of the box, helping you configure your applications so that a service proxy instance is deployed alongside each of them. Istio's control plane consists of components that provide end users and operators with operations APIs, proxy configuration APIs, security settings, policy declarations, and more. These control plane components are introduced in later parts of this book.

Istio was originally built to run on Kubernetes, but its code was written from a deployment-platform-neutral perspective. This means you can take advantage of an Istio-based service mesh on deployment platforms such as Kubernetes, OpenShift, Mesos, and Cloud Foundry, and even deploy Istio on virtual machines and bare metal. Later chapters show how powerful Istio is for hybrid deployments spanning combinations of clouds, including private data centers. In this book we deploy primarily on Kubernetes and cover virtual machines and related topics in the more advanced chapters.

Istio means "set sail" in Greek, and Kubernetes can be translated as "helmsman" or "pilot" in Greek. So from the beginning, Istio expects to work well with Kubernetes to efficiently run a distributed microservices architecture and provide a unified method for security, connection, and monitoring of microservices.

With a service proxy next to each application instance, applications no longer need language-specific resilience libraries to implement circuit breaking, timeouts, retries, service discovery, and load balancing. In addition, the service proxy handles metric collection, distributed tracing, and log collection.
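To illustrate what applications no longer have to embed, here is a sketch of retry-with-timeout logic of the kind a sidecar performs transparently on the application's behalf. This is a conceptual sketch, not Istio's implementation; the function name and default values are made up for the example.

```python
import time


def call_with_retries(func, retries=3, timeout=5.0, backoff=0.2):
    """Retry a call up to `retries` times with exponential backoff,
    failing fast once the overall timeout budget is exhausted."""
    deadline = time.monotonic() + timeout
    last_error = None
    for attempt in range(retries):
        if time.monotonic() >= deadline:
            break  # budget exhausted: fail fast instead of piling on load
        try:
            return func()
        except Exception as err:
            last_error = err
            # Exponential backoff, but never sleep past the deadline.
            time.sleep(min(backoff * (2 ** attempt),
                           max(deadline - time.monotonic(), 0)))
    raise TimeoutError(f"request failed after retries: {last_error}")
```

In a mesh, the deadline and retry budget come from configuration applied to the proxy, so every service in every language gets the same behavior without any such code.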

Because service mesh traffic flows through the Istio service proxy, Istio has a control point at each application with which to influence and direct its network behavior. This allows service operators to control traffic routing and achieve fine-grained deployments through canary deployments, dark launches, staged rollouts, and A/B testing. We explore these features in later chapters.

Core functions

Istio provides many key functions in the service mesh, in five areas: traffic management, security, observability, platform support, and integration and customization.

1. Traffic Management

Through simple rule configuration and traffic routing, Istio can control the traffic and API calls between services. Istio simplifies the configuration of service-level properties such as circuit breakers, timeouts, and retries, and makes it easy to set up important tasks such as A/B testing, canary deployments, and staged rollouts based on percentage-based traffic splitting.
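The percentage-based traffic splitting mentioned above can be sketched as weighted random routing: each request is assigned to a subset with probability proportional to its configured weight. This is a conceptual sketch, not Istio's implementation; the subset names ("v1", "v2") and the 90/10 split are made up for the example.

```python
import random


def split_traffic(weights, rng=random.random):
    """Route one request according to percentage weights, e.g. a 90/10 canary.
    `weights` maps subset name -> percentage; percentages must sum to 100."""
    assert sum(weights.values()) == 100
    point = rng() * 100
    cumulative = 0
    for subset, pct in weights.items():
        cumulative += pct
        if point < cumulative:
            return subset
    return subset  # guard against floating-point edge cases

# Example: send roughly 10% of requests to the canary version.
random.seed(7)
counts = {"v1": 0, "v2": 0}
for _ in range(10000):
    counts[split_traffic({"v1": 90, "v2": 10})] += 1
```

In Istio the same effect is achieved declaratively, by attaching weights to destination subsets in routing configuration, while the proxy performs the per-request selection.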

Istio has out-of-the-box fault-recovery features that help you catch problems before they turn into failures, making calls between services more reliable.

2. Security

Istio has powerful security features that let developers focus on application-level security. Istio provides the underlying secure communication channel and manages authentication, authorization, and encryption of service communication at scale. With Istio, service communication is secure by default, and policies can be enforced consistently across multiple protocols and runtimes. Crucially, all of this requires few or no application changes.

Although Istio is platform-independent, using it together with Kubernetes network policies makes it even more powerful, including the ability to protect pod-to-pod and service-to-service communication at both the network and application layers. Later chapters describe how to combine network policies with Istio on Kubernetes to protect services.

3. Observability

Istio has powerful tracing, monitoring, and logging capabilities that give you deep insight into your service mesh deployment. Through Istio's monitoring features you can see how service performance affects upstream and downstream functionality, and its custom dashboards provide visibility into the performance of all services and show how that performance affects your other processes.

Istio's Mixer component is responsible for policy control and telemetry collection. It provides a backend abstraction and intermediation layer, isolating the rest of Istio from the implementation details of individual backend infrastructures, and gives operators fine-grained control over all interactions between the mesh and the backend infrastructure.

All of these features let you more effectively set, monitor, and enforce service-level objectives (SLOs) for your services. Most importantly, they let you detect and fix problems quickly and effectively.

4. Platform support

Istio is platform-independent and designed to run in a variety of environments, including across clouds, on-premises, Kubernetes, Mesos, and more. You can deploy Istio on Kubernetes, or on Nomad with Consul. Istio currently supports:

  • Services deployed on Kubernetes;
  • Services registered with Consul;
  • Services running on various virtual machines.

5. Integration and customization

Istio's policy enforcement component can be extended and customized to integrate with existing solutions for ACLs, logging, monitoring, quotas, auditing, and more.

In addition, since version 1.0, Istio has supported configuration distribution based on MCP (Mesh Configuration Protocol). Using MCP, you can easily integrate external systems; for example, you can implement an MCP server yourself and integrate it with Istio. An MCP server can provide the following two main functions:

  • Connect to and monitor external service registration systems (such as Eureka and ZooKeeper) to obtain the latest service information;
  • Convert the external service information into Istio ServiceEntry resources and publish them via MCP.
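The conversion step above can be sketched as mapping an external registry record into a ServiceEntry-shaped resource. The ServiceEntry field names below follow Istio's `networking.istio.io/v1alpha3` resource, but the shape of the input registry record is hypothetical, and a real MCP server would serve the result over the MCP protocol rather than return a dict.

```python
def registry_to_service_entry(record):
    """Convert a hypothetical external-registry record (e.g. from Eureka or
    ZooKeeper) into a dict shaped like an Istio ServiceEntry resource."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "ServiceEntry",
        "metadata": {"name": record["service"]},
        "spec": {
            "hosts": [record["host"]],
            "location": "MESH_EXTERNAL",
            "resolution": "STATIC",
            "ports": [{"number": record["port"],
                       "name": "http", "protocol": "HTTP"}],
            # One endpoint per registered instance address.
            "endpoints": [{"address": addr} for addr in record["addresses"]],
        },
    }

# Example: a registry record with two instances of an external service.
entry = registry_to_service_entry({
    "service": "payments",
    "host": "payments.example.com",
    "port": 8080,
    "addresses": ["10.0.0.5", "10.0.0.6"],
})
```

Watching the registry for changes and republishing updated ServiceEntry resources is what keeps the mesh's view of external services fresh.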

Why use Istio

In the transition from monolithic applications to a distributed microservices architecture, developers and operators face many challenges, and Istio can address them. As scale and complexity grow, the service mesh becomes harder to understand and manage. The requirements include service discovery, load balancing, fault recovery, metric collection and monitoring, and often more complex operational needs such as A/B testing, canary releases, rate limiting, access control, and end-to-end authentication. Istio provides a complete solution to the diverse needs of microservice applications by offering behavioral insight into, and operational control over, the entire service mesh.

Istio provides a simple way to establish a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, requiring few or no changes to service code. To bring a service under Istio, you simply deploy a special sidecar proxy in your environment and use the Istio control plane to configure and manage the proxies, which intercept all network communication between microservices.

In addition, the Enterprise Service Bus (ESB) of service-oriented architecture (SOA) has some similarities to the service mesh. In SOA, the ESB is transparent to application services, meaning the application is unaware of it. The service mesh exhibits similar behavior: like the ESB, it should be transparent to the application and simplify calls between services. Of course, the ESB also covers things like protocol mediation, message transformation, and content-based routing, and the service mesh is not responsible for everything an ESB does. The service mesh does provide resilience for service requests through retries, timeouts, and circuit breaking, along with capabilities such as service discovery and load balancing. Complex business transformations, business process orchestration, exceptional business flows, and service orchestration do not fall within the scope of service mesh solutions. And in contrast to the centralized ESB, the service mesh's data plane is highly distributed, with proxies co-located with applications, eliminating the single-point-of-failure bottleneck common in ESB architectures.

Of course, it is also worth knowing which problems a service mesh does not solve. Service mesh technologies like Istio provide powerful infrastructure capabilities that touch many areas of a distributed architecture, but they certainly cannot solve every problem you may encounter. An ideal cloud architecture separates distinct concerns across the layers of its implementation.

At the lower infrastructure level, the focus is on the infrastructure itself and how to provide capabilities for automated deployment. This helps deploy code to various platforms, whether containers, Kubernetes, or virtual machines. Istio does not dictate which automated deployment tool you use.

At the higher business application level, application business logic is the differentiated asset by which a company maintains its core competitiveness. This code covers individual business functions: which services need to be called, in what order, how their interactions are carried out, how results are aggregated, and what to do when a failure occurs. Istio does not implement or replace any business logic. It does not perform service orchestration, provide content transformation or enrichment of business payloads, or split and aggregate payloads. Such capabilities are best left to the libraries and frameworks within the application.

The following diagram shows the separation of concerns in cloud-native applications, where Istio supports the application layer and sits above the lower-level deployment layers. Istio plays the connecting role between the deployment platform and the application code. Its role is to take complex network logic out of the application. It can perform content-based routing using external metadata that is part of the request (such as HTTP headers), and fine-grained traffic control and routing based on matching of service and request metadata. It can also protect transport and offload security token verification, and enforce quotas and usage policies defined by service operators.


Understanding Istio's capabilities, its similarities to other systems, and its position in the architecture can help us avoid repeating the mistakes we may have made with promising technologies in the past.

Maturity and support level

The Istio community defines functional phases describing the relative maturity and support level of each component feature, using Alpha, Beta, and Stable, as shown in Table 1-1.

[Table 1-1: Definitions of the Alpha, Beta, and Stable functional phases]

Table 1-2 lists the Istio 1.4 features we have identified as having reached the Beta and Stable phases. This information is updated after each release; refer to the official website for the latest status.

[Table 1-2: Istio 1.4 features at the Beta and Stable phases]

Of course, some Istio features are still in the Alpha phase, such as the Istio CNI plugin, which can replace the istio-init container to perform the same network setup without requiring Istio users to request additional Kubernetes RBAC authorization, and the ability to use custom filters in Envoy. With continued improvement, these features should gradually become stable and production-ready.

Summary

Istio is currently the most popular service mesh implementation in the industry. Its features let you simplify the operation of cloud-native service applications in hybrid environments. Istio allows developers to focus on building service functionality in their favorite programming languages, which effectively improves developer productivity while freeing developers from mixing code that solves distributed-system problems into their business code.

Istio is a completely open development project with a vibrant, open, and diverse community. Its goal is to empower developers and operators to release and maintain microservices agilely in any environment, with complete visibility into the underlying network and consistent control and security capabilities. In later parts of this book, we show how to use Istio's capabilities to run microservices in the cloud-native world.

This article is excerpted from "Istio Service Mesh Technology Analysis and Practice" and published with the publisher's permission. Written by Alibaba Cloud senior technical expert Wang Xining, the book introduces Istio's fundamentals and practical development in detail, with many selected cases and downloadable reference code to help you get started with Istio quickly. Gartner believes the service mesh will become standard technology in all leading container management systems in 2020. This book is suitable for all readers interested in microservices and cloud native; in-depth reading is recommended.


Recommended reading:
[1] Alibaba Cloud Service Mesh (ASM) public beta series, part 1: A quick look at what ASM is
Article link: https://yq.aliyun.com/articles/748761
[2] Alibaba Cloud Service Mesh (ASM) extensibility (1): Adding HTTP request headers via EnvoyFilter in ASM
Article link: https://yq.aliyun.com/articles/748807

" Alibaba Cloud Native focuses on microservices, serverless, containers, service mesh and other technical fields, focuses on cloud native popular technology trends, cloud native large-scale landing practices, and is the public number that understands cloud native developers best."

Original link
This article is original content from the Yunqi community and may not be reproduced without permission.
