In Depth | The Evolution of ByteDance's Microservice Architecture

This article is based on a talk given at QCon 2021 by Cheng Guozhu, head of the infrastructure/service framework team at ByteDance (Volcano Engine). It covers the team's technical practices and lessons learned around the Golang service framework and Service Mesh from 2018 to 2021.

ByteDance Microservice Architecture Overview 

In ByteDance, the characteristics of the microservice architecture can be summarized into 4 points, as shown in the following figure:

The first is large scale and rapid growth. Over the past three years, the number and scale of ByteDance's microservices have grown rapidly. In 2018, there were roughly 7,000-8,000 online microservices; by May 2021, that number had exceeded 50,000. This rapid growth has brought many challenges for the service framework team.

The second is full containerization and PaaS. More than 90% of ByteDance's online microservices run in containers, and all releases go through a PaaS platform, which means there is essentially no physical-machine deployment online. This brings challenges, such as higher scheduling complexity, but also benefits, such as making it easier to roll out new features.

Third, ByteDance's technology stack is dominated by Golang. According to the latest survey, more than 55% of the company's services use Golang; the second most common language is NodeJS, followed by Python, Java, and C++, and Rust is also used.

Finally, Service Mesh has been fully adopted at ByteDance. The details are covered in the third section of this article.

Given the four characteristics above, the main challenges of ByteDance's microservice architecture still revolve around R&D efficiency, operational efficiency, and stability. These are problems almost every Internet company faces: multi-language support, ease of use, performance, cost, and so on. Among them, the ByteDance service framework team and the Volcano Engine cloud-native team care most about the following three:

  • Fast iteration: development and release must be quick;

  • Good multi-language support: as headcount grows, the architecture needs to stay very tolerant of multiple languages;

  • Runtime stability.

The evolution of the Golang microservice framework

In 2014, Golang was introduced into ByteDance to quickly solve the high-concurrency problems of the long-connection push business. By 2016, the technical team had launched a Golang RPC framework called Kite and, at the same time, built a thin wrapper around the open-source project Gin and launched it as Ginex. These two early frameworks greatly promoted the adoption of Golang within the company.

This lasted until mid-2019. When Kite and Ginex were first released, many of their dependencies were on old versions; Thrift, for example, was only at v0.9.2 at the time, so the frameworks had quite a few problems. In addition, Golang went through several major version iterations, and Kite did not even support Golang's context parameter. For these reasons, Kite could no longer meet internal needs.

In mid-2019, the service framework team officially started refactoring Kite, ByteDance's own RPC framework. This was a bottom-up overhaul designed around performance and extensibility. In October 2020, the team released KiteX, and just two months later more than 1,000 services had adopted it.

Similar design ideas and underlying modules are also applied in ByteDance's self-developed Golang HTTP framework, Hertz. During the 2021 Spring Festival Gala, services built on Hertz carried a peak QPS of over 10 million (excluding services deployed on physical machines), with no abnormalities reported online.

RPC Framework KiteX

KiteX is the next-generation high-performance, highly extensible Golang RPC framework developed by ByteDance. In addition to rich service governance features, it integrates the self-developed network library Netpoll, supports multiple message protocols (Thrift/Protobuf) and multiple interaction modes (Ping-Pong/Oneway/Streaming), and provides a self-developed, more flexible and extensible code generator.

The following figure is the architecture diagram of KiteX; the left side shows some of its core features.
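To make the programming model concrete, below is a minimal sketch of creating a KiteX client using the open-source API (github.com/cloudwego/kitex). The echo IDL, the generated kitex_gen packages, and the target service name are hypothetical; in practice they are produced by the kitex code generator from your own Thrift or Protobuf IDL, so this is illustrative rather than runnable as-is.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/cloudwego/kitex/client"

	"example.com/demo/kitex_gen/echo"             // hypothetical: structs generated from the IDL
	"example.com/demo/kitex_gen/echo/echoservice" // hypothetical: client generated from the IDL
)

func main() {
	// Create a client for the downstream service "demo.echo".
	cli, err := echoservice.NewClient(
		"demo.echo",
		client.WithHostPorts("127.0.0.1:8888"),     // direct connection, for the sketch only
		client.WithRPCTimeout(200*time.Millisecond), // basic governance option
	)
	if err != nil {
		log.Fatal(err)
	}

	// Ping-Pong style call; Oneway and Streaming follow similar generated APIs.
	resp, err := cli.Echo(context.Background(), &echo.Request{Message: "hello"})
	if err != nil {
		log.Fatal(err)
	}
	log.Println(resp)
}
```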

Kite was built on the native net library when it was first released, which caused two main problems: first, the status of peer connections cannot be sensed directly; second, the native net library is prone to goroutine explosion when handling large numbers of long-lived connections.

This is quite common in HTTP scenarios: developers often find that the memory of an HTTP service (including one built on a framework like Gin) keeps growing. Why? Because in practice clients often forget to close their connections.
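As a rough illustration (not ByteDance's code), this is the goroutine-per-connection pattern of Go's standard net package: every accepted connection pins a goroutine and its buffers until the peer closes it, so a large number of idle long-lived connections translates directly into goroutine and memory growth.

```go
package main

import (
	"bufio"
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		// One goroutine per connection: if clients hold connections open
		// without closing them, these goroutines (and their read buffers)
		// accumulate and memory keeps growing.
		go func(c net.Conn) {
			defer c.Close()
			r := bufio.NewReader(c)
			for {
				line, err := r.ReadString('\n') // blocks until data arrives or the peer closes
				if err != nil {
					return
				}
				c.Write([]byte(line)) // simple echo
			}
		}(conn)
	}
}
```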

Netpoll solves the goroutine explosion problem with a Reactor model and also provides a zero-copy buffer.
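For comparison, here is a minimal sketch based on the open-source Netpoll API (github.com/cloudwego/netpoll); the internal version may differ. Connections are multiplexed on an event loop, the handler is only invoked when data is readable, and reads go through the nocopy reader.

```go
package main

import (
	"context"
	"log"

	"github.com/cloudwego/netpoll"
)

func main() {
	listener, err := netpoll.CreateListener("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}

	// onRequest is called by the event loop only when a connection is readable,
	// so idle long-lived connections do not each hold a blocked goroutine.
	onRequest := func(ctx context.Context, conn netpoll.Connection) error {
		reader := conn.Reader()
		buf, err := reader.Next(reader.Len()) // nocopy read of the available bytes
		if err != nil {
			return err
		}
		_, err = conn.Write(buf) // echo back
		return err
	}

	eventLoop, err := netpoll.NewEventLoop(onRequest)
	if err != nil {
		log.Fatal(err)
	}
	if err := eventLoop.Serve(listener); err != nil {
		log.Fatal(err)
	}
}
```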

Regarding multi-protocol support: most of ByteDance's services use the Thrift protocol, and multi-protocol support exists to meet the needs of businesses that require other protocols, for example back-end services that exchange Protobuf with client devices. Multi-protocol support decouples the framework from Thrift. KiteX now supports both Protobuf and Thrift, as well as flexible custom protocol extensions.

High performance is a common goal for all framework developers, so I won't go into detail here. Finally, KiteX is designed with the open-source community in mind. As shown in the figure above, KiteX Core is the part that can be open-sourced directly, while KiteX Byted on the right depends on ByteDance's internal tooling, which can be replaced with open-source logging libraries and other solutions for external use.

As of May 2021, more than 20,000 services within the company were using KiteX, which also supports requirements such as streaming and generic calls. For open source, the team also provides pluggable features for monitoring and logging.

In terms of performance, kitex/thrift achieves about 2.5 times the throughput of the widely used gRPC framework, and kitex/protobuf achieves nearly twice the throughput. KiteX also performs well on TP99 latency.

HTTP Framework Hertz

In October 2020, ByteDance launched Hertz internally as the preferred Golang HTTP framework, replacing Ginex (the thin wrapper around Gin).

Hertz's design follows roughly the same ideas as KiteX. It draws on the strengths of the open-source projects Gin and FastHTTP and introduces Netpoll, memory pooling, zero copy, and other techniques, so its performance is high. Under the same configuration, Hertz's peak QPS is twice that of Ginex, and its average latency is only half.
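A minimal Hertz handler, based on the API of the now open-source version (github.com/cloudwego/hertz); the internal version described here may differ in detail.

```go
package main

import (
	"context"

	"github.com/cloudwego/hertz/pkg/app"
	"github.com/cloudwego/hertz/pkg/app/server"
	"github.com/cloudwego/hertz/pkg/protocol/consts"
)

func main() {
	// server.Default wires the default middleware (e.g. recovery) on top of
	// the Netpoll-based transport.
	h := server.Default(server.WithHostPorts(":8080"))

	h.GET("/ping", func(ctx context.Context, c *app.RequestContext) {
		c.JSON(consts.StatusOK, map[string]string{"message": "pong"})
	})

	h.Spin()
}
```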

The figure below shows a benchmark the team ran on the latest version. For small and medium payloads, Hertz's QPS is 30%-50% higher than FastHTTP's; for large payloads the gap is small, and Hertz has an advantage in average latency.

JSON Library Sonic

Before the 2021 Spring Festival, the technical team analyzed the performance bottlenecks of the top 50 services by number of online containers. It found that JSON processing accounted for 9.5% of overall resource usage, and in some services as much as 40%!

Since JSON's performance is so poor, why not just replace it with Protobuf?

At that time, the team did migrate some services to Protobuf, but given JSON's ease of use and popularity, as well as the cost and risk of changing protocols, they also started looking at optimization. They surveyed the available Golang JSON libraries, compared and analyzed them in detail against business scenarios, identified problems, and optimized:

  • JIT: some JSON libraries have started to show JIT-like characteristics, but they stop at stitching together common code fragments and do not achieve true just-in-time compilation, so there was room to go further;

  • Pre-screening + SIMD: simdjson performs very well with large payloads but poorly with small ones, so pre-screening combined with SIMD optimization is needed;

  • asm2asm: C++ -> Go, i.e. "the best way to optimize Go is not to write Go." The team writes the hot JSON functions in C++, compiles them to x86 assembly, and converts that into Go assembly with the internal asm2asm tool, which yields a very large performance improvement;

  • Lazy-load parser: a lazily evaluated parser built for multi-key lookup scenarios.

With these optimizations, on a real 110KB business request, Sonic roughly doubles the performance of other JSON libraries in every scenario. Sonic has been open-sourced; after internal service testing it is still at a relatively early stage with some stability issues, and the relevant teams will continue to follow up.
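For reference, a small sketch of the open-source Sonic API (github.com/bytedance/sonic): Marshal and Unmarshal mirror encoding/json, and sonic.Get is the lazy-load, search-by-path entry point related to the parser mentioned above. Behavior at the time of the talk may have differed from today's open-source release.

```go
package main

import (
	"fmt"
	"log"

	"github.com/bytedance/sonic"
)

type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

func main() {
	// Marshal / Unmarshal mirror the encoding/json API.
	data, err := sonic.Marshal(&User{Name: "byte", Age: 9})
	if err != nil {
		log.Fatal(err)
	}

	var u User
	if err := sonic.Unmarshal(data, &u); err != nil {
		log.Fatal(err)
	}

	// Lazy-load parsing: only the node on the requested path is fully parsed,
	// which keeps multi-key lookups on large payloads cheap.
	node, err := sonic.Get(data, "name")
	if err != nil {
		log.Fatal(err)
	}
	name, _ := node.String()
	fmt.Println(u, name)
}
```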

ByteDance's Service Mesh

First, a few numbers. Since June 2018, about 30,000 services have been connected to the Mesh over three years, and the number of containers it manages has exceeded 3 million. All business scenarios, as well as ToB and edge computing scenarios, are now fully covered by the Mesh.

The following figure is a schematic of the company's internal Service Mesh architecture. In addition to the data plane and control plane, it also has an operation and maintenance control plane (Operation Plane), and it has two notable features:

  • On the data plane, ByteDance's Service Mesh implements middleware capabilities as sidecars, forming a standard pattern; the general-purpose sidecar in the figure below is that standard technical solution;

  • The operation and maintenance control plane can distribute various types of resources. Anything that needs to be released centrally, such as the Mesh sidecar itself, WebAssembly resources, or dynamic libraries, can be released through it.

Main Features of ByteDance's Internal Service Mesh

The characteristics of the current ByteDance Service Mesh can be summarized in four keywords:

Full functionality. In addition to the RPC and HTTP frameworks mentioned above, ByteDance's Service Mesh provides comprehensive support for middleware such as MySQL, MongoDB, Redis, and RocketMQ, as well as security and service governance capabilities, including traffic replication, mocking, disaster recovery, and so on.

Multiple scenarios. ByteDance's Service Mesh works in internal network environments and at the edge, and can also be used to bridge two IDCs. Why bridge IDCs through the Mesh? Because the cross-IDC network is unstable and costly, and service access must be strictly controlled, the Mesh needs to provide strong control capabilities at the boundary.

Stability. This is a relatively common topic and will not be expanded on here.

High performance. The performance optimization of ByteDance's Service Mesh follows one guiding idea from the team: if the goal is a true zero-copy proxy and we cannot reach it, what exactly is blocking us? The answer may lie in the network and kernel, base libraries, component architecture, or compilation; if the blocker is in the kernel, change the kernel's API, for example to reduce the overhead of sendfile. Following this idea, the team gained a 1%-2% performance improvement by adopting Facebook's hashmap, a 35%-50% throughput improvement by rewriting the abstraction layer, and roughly another 2% through fully static compilation without modifying any code.

The following figure shows the main ideas and measures of the technical team in performance optimization:

Performance: IPC Based on Shared Memory

As mentioned above, what the technical team wants to achieve is a true zero copy proxy. So where does the copy happen?

The figure above shows the original state. The mesh proxy introduces two extra copies: the business process copies data into the Unix Domain Socket buffer, and the proxy reads it back out. It also adds one more process, which means extra scheduling overhead and additional copy costs.

The team's approach to this problem is to write the data into the kernel only once. With shared-memory IPC, once the business process has written the data to be sent, the mesh proxy can determine the scheduling and governance policy from the meta header and send the data by calling the downstream send function.

But a TCP socket cannot send that data directly today, so what then? Ask the kernel team to add an API. In this way, the overall overhead can be reduced to a minimum.
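The following is a purely illustrative sketch, not ByteDance's implementation, of the split described above: the request body is written once into a memory region shared with the sidecar, and only a small meta header crosses the Unix Domain Socket as a control signal. The actual layout, segment negotiation, and kernel-side send API are internal; the socket path and header fields here are hypothetical.

```go
package main

import (
	"encoding/binary"
	"log"
	"net"

	"golang.org/x/sys/unix"
)

// metaHeader is a hypothetical fixed-size header the business process writes in
// front of each payload, so the proxy can make routing/governance decisions
// without touching (or copying) the body itself.
type metaHeader struct {
	Offset     uint32 // where the body starts inside the shared region
	PayloadLen uint32 // length of the body
}

func main() {
	// Map a 1 MiB region. In a real setup this would be a shm/memfd segment
	// negotiated with the sidecar; an anonymous mapping is used here only so
	// the sketch is self-contained.
	region, err := unix.Mmap(-1, 0, 1<<20,
		unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED|unix.MAP_ANONYMOUS)
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Munmap(region)

	// Write the payload exactly once, directly into the shared region.
	payload := []byte("hello mesh proxy")
	hdr := metaHeader{Offset: 8, PayloadLen: uint32(len(payload))}
	binary.LittleEndian.PutUint32(region[0:4], hdr.Offset)
	binary.LittleEndian.PutUint32(region[4:8], hdr.PayloadLen)
	copy(region[hdr.Offset:], payload)

	// Only the tiny control message goes over the Unix Domain Socket; the body
	// stays in shared memory for the proxy to hand to the kernel.
	conn, err := net.Dial("unix", "/tmp/mesh-proxy.sock") // hypothetical socket path
	if err != nil {
		log.Println("no proxy listening in this sketch:", err)
		return
	}
	defer conn.Close()
	if _, err := conn.Write(region[:8]); err != nil {
		log.Fatal(err)
	}
}
```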

The figure above shows the performance test results with shared-memory IPC. Since early 2020, ByteDance has followed the path of using shared memory for data and Unix Domain Sockets for control signals, together with a well-defined control protocol, and the approach has proven sustainable. More than 500 services are currently running it in gray release online, with good overall stability. Going forward, the team plans to extend this optimization to all same-machine communication scenarios.

In addition, defining a common protocol makes this capability reusable. As inter-container communication becomes more and more frequent, shared-memory-based communication will only grow in importance. The general-purpose sidecar in ByteDance's internal Service Mesh is built on the same thinking.

Observability: The Pain of Service Mesh

Infrastructure departments are often asked two questions:

  • Q1: Inconsistent upstream and downstream latency - why do I see 100 milliseconds of latency on the client side but only 50 milliseconds of server-side processing? Where did the other 50 milliseconds go?

  • Q2: Request timeouts - why does my request time out after onboarding to the Mesh, when it did not before?

Thinking about these questions more deeply, the root cause is that a request's time can be divided into a part visible to the business and a part invisible to the business.

For example, performing an encryption operation locally takes 1 millisecond, but calling a service to perform the same encryption may take 10 milliseconds. Where did the other 9 milliseconds go? Many developers assume it is an environment or infrastructure problem. This problem has always existed and is fairly common.

So, where does the delay come from?

  • Computation. For example, serialization: the upstream does not really think about serializing and deserializing requests, or about copying from user space to kernel space, yet all of this counts as computation, and these delays are not perceived by the business side. (Note the distinction between perception and observability: this time can be seen through monitoring, but the business generally does not account for it and assumes the processing time is only its own business logic.)

  • Scheduling. For example, with asynchronous sending, once the data is written the upstream does not know when it will actually be sent. If the machine load is very high it may wait 100 milliseconds, or the task may even be CPU-throttled inside the container; from the upstream's point of view, the request times out.

  • Network. After a timeout, the network is always the first thing to check: is it down? Has it gotten slower?

Having identified these three sources, how do we address them? The service framework team's practice is to promote LLT (Low Level Tracing): make everything that is invisible to the upstream visible. If computation time is not visible, display it; if scheduling is not perceived, measure the scheduling time; if the network situation is unclear, measure the network time as well. After kernel network and scheduling events are collected, these times are ultimately imported into OpenTracing for centralized display.
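Conceptually, once the low-level timestamps are collected, stitching them into the request trace looks roughly like the sketch below, written against the opentracing-go API. ByteDance's eBPF collector and internal tracer are not public, so the event type and stage names here are hypothetical.

```go
package main

import (
	"time"

	"github.com/opentracing/opentracing-go"
)

// kernelEvent is a hypothetical record produced by a low-level collector: the
// timestamps of a stage that is invisible to business code.
type kernelEvent struct {
	Name  string // e.g. "kernel_send", "sched_wait", "nic_tx"
	Start time.Time
	End   time.Time
}

// attachLowLevelSpans imports collected low-level events into the existing
// request trace so that compute, scheduling, and network time show up next to
// the business spans.
func attachLowLevelSpans(parent opentracing.Span, events []kernelEvent) {
	tracer := opentracing.GlobalTracer()
	for _, ev := range events {
		span := tracer.StartSpan(
			ev.Name,
			opentracing.ChildOf(parent.Context()),
			opentracing.StartTime(ev.Start),
		)
		span.FinishWithOptions(opentracing.FinishOptions{FinishTime: ev.End})
	}
}

func main() {
	// With the default no-op tracer this only illustrates the call pattern.
	parent := opentracing.GlobalTracer().StartSpan("rpc_client_write")
	defer parent.Finish()

	now := time.Now()
	attachLowLevelSpans(parent, []kernelEvent{
		{Name: "kernel_send", Start: now, End: now.Add(2 * time.Millisecond)},
		{Name: "nic_tx", Start: now.Add(2 * time.Millisecond), End: now.Add(3 * time.Millisecond)},
	})
}
```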

As shown in the figure above, from the moment the upstream calls the RPC client's write(), the request has already entered the part that is invisible to the business.

ByteDance has an eBPF-based tracing system that can trace by a provided log ID and record timestamps such as when the request enters the kernel and when it leaves the network card. With this information, the team can see through OpenTracing when the request left the local machine, when it was received by the peer, when it entered mesh processing, and when it entered business-layer processing. The whole link becomes very clear.

The CPU overhead of this solution is about 8%-10%, which is somewhat heavy, so it is currently enabled on demand.

ByteDance's Internal Mesh: The Emergence of the General-Purpose Sidecar

Service Mesh emerged from the community's desire to solve problems such as multi-language support, operations, and iteration speed. The concept can be generalized: is it possible to build one set of standard technical solutions for multi-language and operational problems in general? All middleware clearly falls into this category, whether it is the MySQL or Redis client, the API gateway, login components, or risk control components.

At the same time, the capabilities behind Service Mesh can be reused: since the mesh sidecar can be distributed to any online container and hot-upgraded safely, the same mechanism can also be used for middleware sidecars.

ByteDance's Service Mesh therefore provides a general-purpose sidecar and distributes API gateway, risk control, and login/session sidecars to containers in the same way as the mesh. This has no impact on the business side, but it does mean these sidecars must also address performance, observability, stability, and similar issues. Since Service Mesh has already landed at scale inside ByteDance, the solutions to these problems are not much different from those of the mesh proxy.

Summary and Outlook

When it comes to microservices, many people focus only on the functional aspects of services and ignore the underlying operational attributes, simply because the latter are less tangible. In fact, the reason the community keeps discussing microservices is that they are deployed in uncertain environments such as public, private, and hybrid clouds, and they must solve the various complex problems this creates.

From another perspective, if all developers used the same language, there were no cross-machine or cross-process communication, and all services were deployed on the same host, problems such as cross-network communication and request timeouts would be greatly reduced. With shared memory, or by skipping serialization altogether, the overall performance gain would seem to be the most extreme possible.

Does this break the microservice model? Not really. Such a deployment is transparent to the business. As long as the host specifications are sufficient, services can be co-located by service domain through scheduling; once the framework layer and the traffic scheduling layer are aware of this, they naturally switch remote communication to local communication, which greatly improves performance and reduces latency. If the host itself fails, all services on it do fail together, but if only one module fails, traffic can easily fall back to other instances.

This is an ongoing exploration within ByteDance, aimed at solving the problem that, as microservices grow in number and complexity, non-business overhead can exceed the business's own consumption.

In the microservices field, besides internal exploration, ByteDance is also sharing its technology through the service framework team and the Volcano Engine cloud-native team, providing more enterprises with microservice and service mesh solutions through its ToB brand Volcano Engine.
