Talking about Microservices and Open Source in the Cloud Native Era from CloudWeGo

This article is compiled from a talk given at the DIVE Global Basic Software Innovation Conference 2022, under the theme "Talking about Microservices and Open Source in the Cloud Native Era from CloudWeGo".

1. Thinking and Philosophy of Project Creation 

Our team is often asked, why did you create a new project? I think it's a philosophical question.

Across the open source community, similar projects are created again and again in every era. Most of them disappear quickly, and only a few survive. Watching this happen, more and more people settle for searching among existing projects and give up the chance to become project creators themselves. Over time, we start to worry: will there still be new projects for the next generation to use? Is it possible that, in the future, a single project will dominate an entire field?

In fact, in the programmer's world, there is no shame in creating a new project by learning from older ones. Creation means not only thinking, weighing, and designing, but also contributing something distinctive and different. There are plenty of up-and-comers out there, and it is precisely they who bring diversity to the open source community. "Every line of code is a deliberate design" is the highest compliment we can pay a great creator. Good code design usually embodies two basic qualities: correctness and maintainability, and these two qualities correspond to two different personas.

The first persona, the designer and implementer, is relatively easy to satisfy: the feature works, the tests pass, and the program runs correctly. The second persona, the reader and maintainer, demands more: higher code quality, a clearer code structure, and better extensibility. Only by embodying both personas at once can a developer create an excellent project with ease.

What does it mean for a great project to be created? It means thousands of users can evaluate and use it. This also shows, from another angle, that open source itself helps avoid the same projects being created over and over again.

2. Introduction to CloudWeGo 

CloudWeGo is an open source project from ByteDance's infrastructure team. It is a set of middleware for quickly building enterprise-grade cloud-native architectures. It focuses on microservice communication and governance, and features high performance, scalability, reliability, and ease of use.

CloudWeGo open sourced four projects in the first phase:

  • Kitex: a high-performance, highly extensible Golang RPC framework;

  • Netpoll: a high-performance, non-blocking I/O network library focused on RPC scenarios;

  • Thriftgo: a Thrift compiler implemented in Golang, supporting a plugin mechanism and semantic checking;

  • Netpoll-http2: an implementation of HTTP/2 based on Netpoll.

In addition to these major projects, CloudWeGo has also open sourced projects such as Kitex-benchmark, Netpoll-benchmark, Thrift-gen-validator, Kitex-examples, and Netpoll-examples.

Given the limited space of this article, the following focuses on the CloudWeGo core project Kitex.

From the perspective of evolution history, in 2014 the ByteDance technical team introduced Golang to solve the high-concurrency problems faced by its long-connection push business. Two years later, the internal technical team launched Kite, a framework based on Golang, and at the same time released Ginex, a very thin wrapper around the open source project Gin. These two frameworks greatly promoted the use of Golang within the company. Later, ByteDance redesigned Kite around performance and scalability, completed and released Kitex internally in October 2020, and put it into internal use. As of September 2021, more than 30,000 online microservices were using Kitex, and most services gained CPU and latency benefits after migrating to the new framework.

From an architectural point of view, Kitex is mainly divided into two parts. Kitex Core contains the backbone logic: it defines the framework's layering, interfaces, and default interface implementations. The top-level Client and Server are exposed to users and include the Option configuration and its initialization logic; the Modules layer in the middle manages framework-level functional modules and the meta-information they exchange; and the Remote module interacts with the peer, covering codecs and network communication. The other part, Kitex Tool, contains the implementation related to code generation: the code generation tool is built from this package and covers IDL parsing, validation, code generation, plugin support, self-update, and so on.

From the perspective of functions and features, Kitex can be divided into the following seven aspects:

  • High performance: Kitex's network transport module integrates the self-developed network library Netpoll by default, which has significant performance advantages over the standard Go net package; beyond the gains from the network library, Kitex also deeply optimizes the Thrift codecs. For performance data, please refer to Kitex-benchmark.

  • Scalability: Kitex is modular by design, providing many extension interfaces along with default implementations, and users can add custom extensions according to their needs. For more extension capabilities, please refer to the CloudWeGo official website documentation. Kitex is also not coupled to Netpoll; developers can choose other network library extensions.

  • Message protocols: the RPC message protocols supported by default are Thrift, Kitex Protobuf, and gRPC. Thrift supports the Buffered and Framed binary protocols; Kitex Protobuf is Kitex's custom Protobuf message protocol, with a format similar to Thrift; the gRPC support implements the gRPC message protocol and can interoperate with gRPC. In addition, users can extend Kitex with their own message protocols.

  • Transport protocols: a transport protocol wraps the message protocol for RPC interoperability and can additionally carry transparent meta-information for service governance. The transport protocols supported by Kitex are TTHeader and HTTP2. TTHeader can be combined with Thrift and Kitex Protobuf; HTTP2 is currently used mainly with the gRPC protocol, and Thrift support will be added in the future.

  • Multiple message types: Kitex supports PingPong, Oneway, and bidirectional Streaming. Oneway currently supports only the Thrift protocol, and bidirectional Streaming supports only gRPC; bidirectional Streaming for Thrift will be considered in the future.

  • Service governance: Kitex supports governance modules such as service registration/discovery, load balancing, circuit breaking, rate limiting, retry, monitoring, distributed tracing, logging, and diagnostics. Most of them come with default extensions that users can integrate as needed.

  • Code generation: Kitex has built-in code generation tools that support generating Thrift, Protobuf, and scaffolding code. The native Thrift code is generated by Thriftgo, which is open sourced at the same time; Kitex's Thrift optimizations are provided by Kitex Tool as a plugin. Protobuf code is generated by Kitex acting as an official protoc plugin; standalone parsing and code generation for Protobuf IDL are not yet supported. A minimal usage sketch based on the generated code follows this list.
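
To make the generated-code workflow concrete, here is a minimal, hedged sketch of calling a service through a Kitex client. The generated package paths (example/kitex_gen/...), the api.Request type, and the Echo method are illustrative assumptions; the actual names depend on your IDL and module path.

```go
package main

import (
	"context"
	"log"

	"github.com/cloudwego/kitex/client"

	"example/kitex_gen/api"      // hypothetical: types generated from the Thrift IDL
	"example/kitex_gen/api/echo" // hypothetical: client package generated by the Kitex tool
)

func main() {
	// Create a client for the "echo" service; for simplicity we skip service
	// discovery and point directly at a host:port.
	cli, err := echo.NewClient("echo", client.WithHostPorts("127.0.0.1:8888"))
	if err != nil {
		log.Fatal(err)
	}

	resp, err := cli.Echo(context.Background(), &api.Request{Message: "hello"})
	if err != nil {
		log.Fatal(err)
	}
	log.Println(resp)
}
```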

To sum up briefly, CloudWeGo is not just an open source project but a real, hyperscale, enterprise-level best practice. It comes from the enterprise, so it is naturally suited for adoption within enterprises; it grew out of open source and ultimately embraces open source: from the Go basic libraries, to the Go network library and the Thrift compiler, to the upper-level service framework, all of the framework's enterprise-level governance capabilities are open and open sourced.

3. CloudWeGo's Microservice Governance

Microservice architecture is a hot topic in today's software development. Large systems are eventually broken up into smaller ones; as the saying goes, "what is long united must divide, and what is long divided must unite." Most system architectures in traditional industries are huge monoliths, and microservices are a very natural stage in the evolution of architecture.

So what is microservice governance? Opinions differ, and there is no industry consensus. In a broad sense, service governance covers everything related to the service life cycle, including service architecture design, application release, registration and discovery, traffic management, monitoring and observability, fault location, and security; alternatively it can be divided into architecture governance, R&D governance, testing governance, operations governance, and management governance. In a narrow sense, service governance techniques include service registration and discovery, observability, traffic management, security, and control. The following introduces CloudWeGo-Kitex's thinking and exploration mainly from the narrow-sense perspective of service governance.

Service registration and discovery

Kitex does not ship a default service registration and discovery implementation, which reflects the neutrality of the framework. Kitex supports custom registration and discovery modules, so users can extend it to integrate other registries and service discovery implementations. The extension points are defined under pkg/registry and pkg/discovery respectively.

The Kitex service registration extension interface is shown below. For more details, please refer to Framework Extension -> Service Registration Extension on the official website.
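
The following is a sketch of the registration extension interface based on my reading of the pkg/registry package; the Info struct is abridged here, and the authoritative definition is in the Kitex repository.

```go
package registry

import "net"

// Registry is the extension interface for a service registry.
// A custom registry (etcd, Nacos, Consul, ...) implements these two methods.
type Registry interface {
	Register(info *Info) error
	Deregister(info *Info) error
}

// Info carries the information to be registered (abridged sketch).
type Info struct {
	ServiceName string
	Addr        net.Addr
	Weight      int
	Tags        map[string]string
	// ... other fields omitted; see pkg/registry in the Kitex repository
}
```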

The Kitex service discovery extension interface is shown below. For more details, please refer to Framework Extension -> Service Discovery Extension on the official website.
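
Likewise, a sketch of the discovery extension interface based on my reading of the pkg/discovery package; the Instance, Result, and Change types are abridged, and the authoritative definitions live in the Kitex repository.

```go
package discovery

import (
	"context"
	"net"

	"github.com/cloudwego/kitex/pkg/rpcinfo"
)

// Instance is a single discovered endpoint (abridged sketch).
type Instance interface {
	Address() net.Addr
	Weight() int
	Tag(key string) (value string, exist bool)
}

// Result is the outcome of a resolution (abridged sketch).
type Result struct {
	Cacheable bool
	CacheKey  string
	Instances []Instance
}

// Change describes the difference between two Results (abridged sketch).
type Change struct {
	Result  Result
	Added   []Instance
	Updated []Instance
	Removed []Instance
}

// Resolver resolves a target service into a list of instances.
type Resolver interface {
	// Target returns a description (cache key) for the given target endpoint.
	Target(ctx context.Context, target rpcinfo.EndpointInfo) (description string)
	// Resolve returns the current instance list for the description.
	Resolve(ctx context.Context, desc string) (Result, error)
	// Diff computes the change between two results for incremental updates.
	Diff(cacheKey string, prev, next Result) (Change, bool)
	// Name identifies the resolver, e.g. "etcd" or "nacos".
	Name() string
}
```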

As of now, with support from community developers, Kitex has completed service discovery integrations for etcd, ZooKeeper, Eureka, Consul, Nacos, and Polaris. It also supports DNS resolution and static IP direct connection, forming a strong and complete community ecosystem from which users can flexibly choose according to their needs.

Special thanks to community contributors @li-jin-gou, @liu-song, @baiyutang, @duduainankai, @horizonzy, and @Hanson for implementing and supporting the service discovery extension libraries above. For more code details, see kitex-contrib (github.com).

Circuit breaker

The previous section introduced Kitex's service registration and discovery mechanism, which is essential for services adopting the framework: without it, microservices cannot communicate with each other. So what does a circuit breaker do for microservices?

When microservices make RPC calls, downstream services will inevitably fail. When a downstream service has problems, continuing to call it from upstream not only hinders the downstream's recovery but also wastes upstream resources. To address this, you could add dynamic switches to manually cut off calls to a failing downstream, but a better approach is to use a circuit breaker to handle it automatically.

Kitex provides a circuit breaker implementation, but it is not enabled by default; users need to enable it explicitly.

Most of Kitex's service governance modules are integrated through middleware, and circuit breaking is no exception. Kitex provides a CBSuite that encapsulates both service-level and instance-level circuit breakers; a minimal sketch of wiring it into a client follows the list below.

  • Service-granularity circuit breaker: circuit breaker statistics are collected per service, and the breaker is added through WithMiddleware. The exact service granularity is determined by the circuit breaker key, i.e. the key used for statistics. A GenServiceCBKeyFunc must be passed in when the CBSuite is initialized. The default is circuitbreak.RPCInfo2Key, whose key format is fromServiceName/toServiceName/method, meaning failures are counted at the method level.

  • Instance-granularity circuit breaker: statistics are collected per instance, mainly to handle single-instance anomalies. If an instance-level circuit breaker is triggered, the framework automatically retries.
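
A minimal sketch of enabling the circuit breaker on a client, assuming the circuitbreak package path and the middleware accessors below match the Kitex version in use (check the official circuit breaker documentation for the exact API):

```go
package example

import (
	"github.com/cloudwego/kitex/client"
	"github.com/cloudwego/kitex/pkg/circuitbreak"
	"github.com/cloudwego/kitex/pkg/rpcinfo"
)

// genServiceCBKey decides the statistics granularity; the default
// circuitbreak.RPCInfo2Key yields "fromServiceName/toServiceName/method".
func genServiceCBKey(ri rpcinfo.RPCInfo) string {
	return circuitbreak.RPCInfo2Key(ri)
}

// circuitBreakerOptions builds client options carrying the service-level and
// instance-level circuit breaker middleware.
func circuitBreakerOptions() []client.Option {
	cbs := circuitbreak.NewCBSuite(genServiceCBKey)
	return []client.Option{
		client.WithMiddleware(cbs.ServiceCBMW()),  // service-level breaker
		client.WithInstanceMW(cbs.InstanceCBMW()), // instance-level breaker
	}
}
```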

The idea behind a circuit breaker is simple: restrict access to a downstream based on the success or failure of RPCs. A circuit breaker usually has three states: CLOSED, OPEN, and HALF-OPEN. While RPCs succeed, it stays CLOSED; when RPC errors increase, the breaker trips and enters OPEN; after a cooling-off period in OPEN, it becomes HALF-OPEN; in HALF-OPEN, a limited number of probe requests are sent downstream, and based on their results the breaker returns to CLOSED or goes back to OPEN.

For more details and the principles behind Kitex's circuit breaker implementation, see Basic Features -> Circuit Breaker on the official website.

Rate limiting

If circuit breaking protects the call chain from the client side to prevent cascading failures, then rate limiting protects the server side, preventing it from being overloaded by a sudden surge of traffic from upstream clients.

Kitex supports limiting the maximum number of connections and the maximum QPS. When initializing the Server, add an Option, for example:
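
A minimal sketch, assuming the limit package path below; the thresholds are illustrative, and the generated server constructor name is hypothetical:

```go
package example

import (
	"github.com/cloudwego/kitex/pkg/limit"
	"github.com/cloudwego/kitex/server"
)

// limitOptions returns server options that cap concurrent connections and QPS.
// The thresholds here are illustrative.
func limitOptions() []server.Option {
	return []server.Option{
		server.WithLimit(&limit.Option{MaxConnections: 10000, MaxQPS: 1000}),
	}
}

// The returned options are passed to the generated server constructor,
// e.g. echoservice.NewServer(handler, limitOptions()...) (hypothetical name).
```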

Here, MaxConnections is the maximum number of connections and MaxQPS is the maximum QPS. In addition, Kitex provides the ability to modify the rate limiting thresholds dynamically.

Kitex uses a ConcurrencyLimiter to cap the maximum number of connections and a RateLimiter to cap the maximum QPS. The ConcurrencyLimiter uses a simple counter algorithm, and the RateLimiter uses a token bucket algorithm.

Monitoring the rate limiting status is also important. Kitex defines a LimitReporter interface for reporting rate limiting status, such as too many current connections or excessive QPS. If needed, users implement this interface and inject it via WithLimitReporter.
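
For reference, a sketch of what this reporting interface looks like, per my reading of the Kitex limiter package (verify against pkg/limiter in the repository):

```go
package limiter

// LimitReporter is implemented by users to report (e.g. emit metrics or logs)
// when rate limiting is triggered.
type LimitReporter interface {
	// ConnOverloadReport is called when the connection limit is exceeded.
	ConnOverloadReport()
	// QPSOverloadReport is called when the QPS limit is exceeded.
	QPSOverloadReport()
}
```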

Request retry

Kitex provides three types of retry: timeout retry, Backup Request, and connection-failure retry. A failure to establish a connection is a network-level problem; since the request has not been sent, the framework retries it by default. The following focuses on the first two types. Note that because many service requests are not idempotent, these two types of retry are not enabled by default and need to be turned on by users as required.

  • Timeout retry: a kind of error retry, i.e. the client initiates a retry after receiving a timeout error.

  • Backup Request: if the client has not received a response within a certain period of time, it sends an additional request and accepts whichever response arrives successfully first. The Backup Request waiting time, RetryDelay, is recommended to be set around TP99, which is usually much smaller than the configured timeout.

Long-tail requests increase the overall latency of a service, even though they account for a very small proportion of traffic. Taking the latency distribution of a real service as an example, the long-tail phenomenon is obvious: the maximum latency is 60ms, while 99% of requests return within 13ms. Once a request's latency exceeds 13ms, it has entered the long tail. At that point we can send a second request, which will very likely return within 13ms; if either request returns, the call is considered successful. In other words, by adding a modest amount of load, the fluctuation in response time is greatly reduced. Timeout retry and Backup Request each have their own trade-offs and applicable scenarios: timeout retry only fires after a timeout error and adds little extra load, while Backup Request trades a small amount of additional load for a significant reduction in long-tail latency.
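
A minimal sketch of turning on each retry type, assuming the retry package path and a hypothetical generated echo.NewClient constructor (the 5ms backup delay is illustrative):

```go
package example

import (
	"github.com/cloudwego/kitex/client"
	"github.com/cloudwego/kitex/pkg/retry"

	"example/kitex_gen/api/echo" // hypothetical generated client package
)

func newClients() error {
	// Timeout (failure) retry: retry at most twice on timeout errors.
	fp := retry.NewFailurePolicy()
	fp.WithMaxRetryTimes(2)
	if _, err := echo.NewClient("echo", client.WithFailureRetry(fp)); err != nil {
		return err
	}

	// Backup Request: send a backup request if no response arrives within 5ms,
	// roughly the service's TP99 latency.
	bp := retry.NewBackupPolicy(5)
	if _, err := echo.NewClient("echo", client.WithBackupRequest(bp)); err != nil {
		return err
	}
	return nil
}
```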

Load balancing

Kitex provides two load balancing algorithm implementations by default:

  • WeightedRandom: a weighted random strategy, which is Kitex's default. Instances are selected randomly according to their weights, so that the load assigned to each instance is proportional to its weight.

  • ConsistentHash: consistent hashing is mainly suitable for scenarios that depend heavily on context, such as instance-local caches. If you want requests of the same type to hit the same machine, use this load balancing method (a configuration sketch follows the notes below).

When using ConsistentHash, you need to pay attention to the following:

  • When downstream nodes change, the consistent hash result may change and some keys may be remapped;

  • If there are many downstream nodes, the first cold-start build may take a long time, and a short RPC timeout may then cause timeouts;

  • If the first request fails and Replica is not 0, the request is retried against a replica; the second and subsequent requests still go to the first instance.
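
A minimal sketch of enabling consistent hashing on a client, assuming the loadbalance package API below; the key extraction function is illustrative and should return the context-dependent hash key (e.g. a user ID):

```go
package example

import (
	"context"

	"github.com/cloudwego/kitex/client"
	"github.com/cloudwego/kitex/pkg/loadbalance"
)

// consistentHashOption builds a client option that routes requests with the
// same hash key to the same instance.
func consistentHashOption() client.Option {
	opt := loadbalance.NewConsistentHashOption(func(ctx context.Context, request interface{}) string {
		// Illustrative: extract a stable key (e.g. user ID) from ctx or request.
		return "user-id-from-request"
	})
	return client.WithLoadBalancer(loadbalance.NewConsistBalancer(opt))
}
```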

Observability

The framework itself does not provide monitoring implementations; instead, it provides a Tracer interface. Users can implement this interface according to their needs and inject it into the framework through the WithTracer Option.
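
For reference, a sketch of what this extension point looks like, based on the stats package (verify against pkg/stats in the repository):

```go
package stats

import "context"

// Tracer is executed around each RPC: Start is called when the request begins
// and Finish when it completes; RPC information can be read from the context.
type Tracer interface {
	Start(ctx context.Context) context.Context
	Finish(ctx context.Context)
}

// A custom implementation is injected with client.WithTracer(myTracer)
// or server.WithTracer(myTracer).
```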

Kitex's monitoring, metrics reporting, and distributed tracing can all be extended through this interface.

Currently, the kitex-contrib organization provides a monitoring extension for Prometheus, a tracing extension for OpenTracing, and an extension implementing the OpenTelemetry observability suite (Metrics + Tracing + Logging). Users can plug in the corresponding extensions as needed.

Microservice Framework and Service Mesh

The service framework is the core of traditional microservice technology. Service registration, discovery, invocation, governance, and observation in early microservice stacks were all inseparable from the service framework. This also brings problems: business developers have to be aware of and use the framework's governance capabilities, framework version upgrades are difficult, and the framework itself becomes increasingly heavy and hard to maintain.

Service Mesh is a deeper microservice architecture solution that defines non-intrusive service governance and is often called the second-generation microservice architecture. By packaging microservice governance capabilities as independent components (sidecars) and sinking them into the infrastructure, a service mesh completely separates application business logic from service governance logic, which in turn makes advanced capabilities such as multi-language support and hot upgrades natural to provide.

Entering the cloud-native era, as service mesh technology matures, we also need to plan and design the architecture with a forward-looking perspective. The microservice framework and the service mesh will coexist for the foreseeable future and form a unified service governance system. At ByteDance, the service governance system consists of the service framework and the service mesh: taking Golang services as an example, CloudWeGo provides governance capabilities that are strongly business-related and intrusive, while ByteDance's Service Mesh provides governance capabilities that are weakly business-related and non-intrusive. They cooperate and interoperate with each other, providing the scaffolding and development model that business development needs while making access to service governance easier.

At the same time, when a service mesh and a service framework are used together, the service framework must support flexibly offloading governance capabilities, and the service mesh must guarantee functional stability. In terms of future technical evolution, the service framework will focus mainly on codecs, communication efficiency, and multi-protocol support, while the service mesh can go deeper into developing more non-intrusive service governance functions.

In addition, in large-scale scenarios, we often need to weigh the following factors when deciding whether to develop a new service governance feature:

  • Performance: most businesses care about it, and it is also where the team focuses its efforts;

  • Universality: we need to assess whether the capability is required by all businesses;

  • Simplicity: in plain terms, we do not want to introduce too many production issues or overly complicated documentation;

  • ROI: Functional iterations and product upgrades need to consider the overall return on investment.

4. The Open Source Road of CloudWeGo

ByteDance's internal version of Kitex depends on the open source version of Kitex, so internal and external Kitex share the same origin; there are not two different Kitex projects.

Reasons for open sourcing

Back to the question at the beginning, why create a new project and open source CloudWeGo?

First, the projects in CloudWeGo have been validated through large-scale production use inside ByteDance, and every feature iteration after open sourcing is first verified by internal use. It is a genuine enterprise-level production project: open source users and ByteDance's internal businesses use the same service framework. Second, the capabilities CloudWeGo provides, especially protocol support and service governance, solve real business pain points, and every line of code optimization can genuinely improve users' service performance. Finally, CloudWeGo's development draws on the design ideas of some well-known open source projects and also relies on the implementations of some open source projects; we open source CloudWeGo to give back to and contribute to the open source community.

From the very beginning of its design, CloudWeGo considered both correctness and maintainability. Beyond the correctness of the code logic, high-quality code, a clear code structure, and excellent extensibility have always been the direction CloudWeGo pursues and its creed in practice.

CloudWeGo serves users and is demand-driven, providing out-of-the-box service frameworks and related middleware. We hope to serve more enterprises and independent developers and spare users from reinventing the wheel.

The history of open source

CloudWeGo was officially announced on September 8, 2021. The main sub-project, Kitex, has since released v0.1.0 and v0.2.0, adding many new features and making many optimizations to performance, code, and documentation. As of April 2022, seven months after the announcement, CloudWeGo-Kitex alone has gained 4,000 stars and accumulated nearly 50 contributors, reaching a new milestone, which is very exciting.

The CloudWeGo team has attached great importance to community building since the beginning of open source, and "Community Over Code" is the culture and goal the CloudWeGo community follows.

From building user groups, building the official website and documentation, actively maintaining project issues, and handling new PRs in a timely manner, to communicating deeply with contributors and mentoring them, every action reflects our determination. To make community building more regular and standardized, the CloudWeGo team created the Community repository to define the promotion mechanism for community members and to archive community materials.

To practice an open and transparent open source culture and build an open platform for dialogue and communication, CloudWeGo organizes bi-weekly community meetings, at which the community's recent plans are shared, suggestions from community members are actively heard, and technical proposals and their implementation are discussed with community contributors.

So far, through mentoring by community Maintainers, active applications from Contributors, and voting and approval by the community management committee, 5 Committers have been officially approved, which greatly strengthens the core of the CloudWeGo community; they have made significant contributions to the community's development.

Follow-up plans

CloudWeGo was included in the CNCF Landscape at the end of 2021, enriching the CNCF ecosystem in the RPC field and giving global users one more choice when making technology selections.

Although it has achieved some modest results, CloudWeGo is still a young project. Open source is about perseverance and long-term commitment, and the CloudWeGo team will keep improving and moving forward.

In terms of community building, the CloudWeGo team will continue to provide more newcomer-friendly good-first-issues, keep organizing regular community meetings, hold open source technology salons periodically, and provide easier-to-understand technical documentation, so that more developers can participate in community building.

In terms of open source planning, the open sourcing of the HTTP framework Hertz is imminent, and more middleware utilities and extension libraries are being open sourced continuously. In addition, the core CloudWeGo team has also developed a Rust RPC framework, which is being deployed and verified internally and will be open sourced in the near future.

In terms of the feature roadmap, taking CloudWeGo-Kitex as an example, we will continue to develop new features and iterate on existing ones based on the needs of internal and external users, including support for connection warm-up, custom exception retry, performance optimization of Protobuf support, support for the xDS protocol, and more.

In terms of the open source ecosystem, CloudWeGo-Kitex has completed integrations with many open source projects and will support more ecosystems as needed. In addition, CloudWeGo is cooperating with mainstream public cloud vendors at home and abroad to provide a foundation for out-of-the-box, stable, and reliable microservice hosting and governance products; CloudWeGo is also actively collaborating and communicating with software foundations at home and abroad to explore new models of cooperation.

The future of CloudWeGo is promising. We look forward to more users adopting our projects and more developers joining the CloudWeGo community, to witness together a nascent but remarkable microservice middleware and open source project in the cloud native era.

More information

Upcoming Events

In May, the CloudWeGo community's first source code interpretation activity was launched. You are welcome to follow it and participate actively!

Address: [CSG Phase 1] Kitex Source Code Analysis Learning Activity Issue #24 cloudwego/community (github.com)

 
