Breaking through performance limits: a reading of Alibaba Cloud's X-Dragon paper, newly accepted at ASPLOS

Author | Alibaba Cloud X-Dragon team

Editor | Elle

Produced by | CSDN (ID: CSDNnews)

Foreword

Recently, ASPLOS 2020 announced its accepted papers, among them "High-density Multi-tenant Bare-metal Cloud", submitted by Alibaba Cloud. The paper explains how Alibaba Cloud uses its self-developed X-Dragon (Shenlong) server architecture to solve the virtualization performance loss that has troubled the cloud computing industry for years, dispelling the myth that physical machines cannot be beaten on performance and letting cloud servers break through their performance limits.

The acceptance means that a top international computer conference has recognized Alibaba Cloud's self-developed technology, and that innovative technology from China has won a place in the global computer industry.

ASPLOS is a top conference spanning three areas of computer systems: architecture, programming languages, and operating systems. Since its launch in 1982 it has driven many advances in computer systems technology, and its paper acceptance rate is generally around 20%.

The accepted paper, "High-density Multi-tenant Bare-metal Cloud", was written by a technical team led by Alibaba Cloud X-Dragon researcher Zhang Xiantao. It details the technical advantages of the X-Dragon architecture: compute performance exceeding 100% of a traditional physical machine, delivery within minutes, secure physical isolation, and integration with the rest of the cloud platform.

Virtualization is the foundation of cloud computing: it divides a physical server into virtual compute units of whatever size is needed, giving maximum flexibility, but at a cost in performance. How can this contradiction be resolved? In 2017 Alibaba Cloud launched the X-Dragon architecture, which makes up for the virtualization performance loss while keeping the elasticity and operational advantages of the cloud.

At the 2019 Yunqi Conference in Hangzhou, Alibaba Cloud released the third-generation X-Dragon architecture. It fully supports ECS virtual machines, bare metal, containers, and other cloud-native workloads, improves metrics such as IOPS and PPS by a factor of 5, and can help cut computing costs by 50%. During last year's Double 11, the core systems ran 100% on the cloud and X-Dragon shone, successfully withstanding a peak of 544,000 order creations per second. Compared with a physical machine of the same configuration, business system performance improved by 20%, and behaviour under high load was better as well: overall business performance stayed smooth and scaled linearly.

Moreover, X-Dragon is an excellent partner for containers, currently the most popular technology. Alibaba Cloud container services built on X-Dragon have a 10% to 30% performance advantage over those running on physical machines.

At present, X-Dragon is widely used at Taobao, Tmall, Cainiao, and other businesses to relieve performance bottlenecks at peak load.

The paper accepted by ACM ASPLOS, "High-density Multi-tenant Bare-metal Cloud", was co-authored by Zhang Xiantao, Alibaba Cloud researcher and head of the innovation product line; Zheng, Alibaba Cloud senior technical expert; Yang Hang, Alibaba Cloud senior technical expert; and other members of the X-Dragon team.

For the first time, the paper gives a comprehensive analysis of the X-Dragon architecture that underpins the popular bare metal cloud service. It presents X-Dragon as the direction for a new generation of virtualization technology, compares it with existing architectures, and sets out the similarities and differences in compute hardware and software, core performance, and virtualization overhead. The performance data the paper presents for a variety of workloads brings out the unique advantages of the X-Dragon bare metal architecture. A detailed reading of the paper follows.

Summary

Virtualization is the cornerstone of cloud computing. Letting multiple tenants (virtual machines) share a single physical server improves data center server utilization and lets cloud providers offer more cost-effective services. However, having multiple VMs share one virtualized physical server introduces many security problems, most recently side-channel attacks in particular. In addition, virtualization imposes a non-negligible performance overhead on CPU, memory, and I/O. For these reasons, physical server rental has become an emerging type of service in the public cloud: it gives users strong isolation, complete and direct access to the hardware, and more predictable performance. But physical server rental has its own drawbacks: it serves only a single tenant, does not scale, is expensive, and adapts poorly. Current offerings can only lease an entire physical server to a single user, and the user cannot easily swap system images or use cloud storage on the rented server.

In this paper we propose an innovative high-density, multi-tenant, elastic bare metal server design: the Alibaba Cloud X-Dragon elastic bare metal architecture (called BM-Hive in the paper to meet the review requirements). In this high-density architecture, each bare metal instance runs on its own compute board, which carries dedicated CPU and memory modules. BM-Hive equips each compute board with a hybrid hardware/software virtio I/O system, so that customer instances can access Alibaba Cloud network and storage services directly. BM-Hive can host up to 16 bare metal instances on a single physical server, which significantly increases the density of bare metal instances per server. In addition, BM-Hive strictly isolates each bare metal instance at the hardware level, improving security and isolation. The X-Dragon high-density elastic bare metal design has been deployed in the Alibaba Cloud public cloud, where it currently serves one million users simultaneously.

Introduction

Physical server rental emerged to serve customers with very strict performance or security requirements. But a single-tenant, low-density physical server is expensive. Most public cloud customers are small and medium-sized. Counting vCPUs per VM across the instance specifications of our cloud service, VMs that need fewer than 32 cores account for more than 95% of demand, while existing physical servers come with at least 64 cores and up to 128. These small and medium customers are left with two options: give up physical-machine performance and security and use traditional virtualized VMs, or rent an entire server and give up cost-effectiveness. Bare metal also lacks the elasticity of the public cloud, which is a major reason it has not yet become mainstream.
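The mismatch between tenant demand and server size can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `idle_core_fraction` is hypothetical, and the 32-core tenant and the 64-core and 128-core servers are simply the boundary figures quoted above.

```python
def idle_core_fraction(needed_cores: int, server_cores: int) -> float:
    """Fraction of a single-tenant server's cores that the tenant rents but does not need."""
    if not 0 < needed_cores <= server_cores:
        raise ValueError("needed_cores must be between 1 and server_cores")
    return (server_cores - needed_cores) / server_cores


# A tenant needing 32 cores on the smallest (64-core) server leaves half of it idle.
print(idle_core_fraction(32, 64))   # 0.5
# The same tenant on the largest (128-core) server leaves three quarters idle.
print(idle_core_fraction(32, 128))  # 0.75
```

Since more than 95% of VMs need fewer than 32 cores, most tenants would sit at or above these idle fractions if they rented a whole server.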

To this end we designed the X-Dragon high-density elastic bare metal architecture: a scalable, elastic, hardware-assisted bare metal solution that supports multiple tenants. This architecture (BM-Hive) guarantees that CPU and memory perform as they would on a native physical machine, virtualizes I/O devices in hardware, and retains most of the important cloud features, such as per-minute billing and elastic scaling. BM-Hive consists of three modules: the compute board, IO-Bond, and BM-Hypervisor. The compute board contains the CPU and memory of a bare metal instance; BM-Hypervisor runs on the underlying physical server, which can host up to 16 compute boards; IO-Bond is the link that connects a compute board to BM-Hypervisor. The following sections describe these three parts in more detail.
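The relationship between the three modules can be sketched as a small data model. This is only an illustration of the description above, not the real implementation: all class and field names are hypothetical, and the per-board core and memory sizes are placeholders. The only figure taken from the text is the limit of 16 compute boards per physical server.

```python
from dataclasses import dataclass, field

MAX_BOARDS_PER_BASE = 16  # from the text: one physical server hosts up to 16 boards


@dataclass
class ComputeBoard:
    """Holds the dedicated CPU and memory of one bare metal instance."""
    tenant: str
    cpu_cores: int
    memory_gib: int


@dataclass
class IOBond:
    """Link between one compute board and BM-Hypervisor."""
    board: ComputeBoard


@dataclass
class BaseServer:
    """Physical server running BM-Hypervisor; keeps one IO-Bond per attached board."""
    bonds: list = field(default_factory=list)

    def attach(self, board: ComputeBoard) -> IOBond:
        if len(self.bonds) >= MAX_BOARDS_PER_BASE:
            raise RuntimeError("base server is full")
        bond = IOBond(board)
        self.bonds.append(bond)
        return bond


base = BaseServer()
for i in range(MAX_BOARDS_PER_BASE):
    # Board sizes are placeholders chosen for the example.
    base.attach(ComputeBoard(tenant=f"tenant-{i}", cpu_cores=8, memory_gib=32))
print(len(base.bonds))  # 16
```

Attaching a seventeenth board raises `RuntimeError`, mirroring the density limit stated in the text.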

BM-Hive has significant advantages:

• Affordable: up to 16 bare metal instances share one physical server, which significantly reduces cost for customers;

• Excellent single-thread performance: bare metal instances are free to use high-frequency CPUs, such as a 4.2 GHz i7;

• Compatible with the existing operations system: customers can operate bare metal instances just like other, non-bare-metal instances, including the convenient operations unique to the cloud such as creating images, replacing the system disk, and adding or removing cloud disks.

The strengths and weaknesses of the instance types currently offered in public clouds compare as follows:

The X-Dragon bare metal architecture is an integrated hardware-software virtualization architecture

The X-Dragon integrated hardware-software bare metal architecture is a natural evolution and upgrade of existing virtualization. As the figure below shows, in overall architecture BM-Hive closely resembles a traditional virtualization solution in CPU/memory, integration with the operations system, multi-tenant sharing, and other respects. For bare metal sharing we developed dedicated X-Dragon hardware, so that a single BM-Hypervisor handles the I/O subsystem of the compute boards.

Current virtualization mainly faces the following problems:

Before discussing the X-Dragon bare metal architecture in detail, let us look at some of the problems that virtualization technology in the cloud currently faces. X-Dragon's integrated hardware-software design addresses these problems well.

• Virtualization overhead cannot meet high performance requirements

• Virtualization suffers from uncontrollable performance jitter, which cannot meet the needs of scenarios that demand extreme performance

• Virtualized security isolation falls short of the requirements of certain industries

• Nested virtualization performance cannot meet customer needs

Virtualization overhead:

The basic principle of current CPU virtualization means that execution must switch back and forth (VM-Exit) between the vCPU context and the physical CPU context. Frequent VM switches can cause serious performance problems. For example, the handling path for an interrupt from a typical pass-through device is very long under virtualization. Under the KVM hypervisor, one such switch takes thousands of clock cycles, and the cost can reach about 10 µs. In general, once VM-Exits (caused by interrupts, for example) reach about 5K per second, VM performance starts to suffer. VM-Exits have many causes, such as IPIs, EPT violations, MMIO accesses, and so on.

We sampled operating data from 300,000 virtualized instances and found that 3.82% of them incur more than 10,000 VM-Exits per second, and many instances even exceed 100,000 switches per second.

With X-Dragon bare metal (BM-Hive), instances run directly on the compute board, which avoids all of the conventional CPU/memory virtualization overhead.
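A rough back-of-the-envelope estimate shows why a few thousand exits per second already matter. The sketch below assumes every exit costs the full ~10 µs quoted above, which is the upper end of the range and a simplification; the function name `vm_exit_overhead` is hypothetical.

```python
def vm_exit_overhead(exits_per_sec: float, cost_us: float) -> float:
    """Fraction of each second one vCPU spends handling VM-Exits, capped at 1.0."""
    return min(1.0, exits_per_sec * cost_us / 1_000_000)


# 5K exits per second at about 10 microseconds each is roughly 5% of the CPU.
print(vm_exit_overhead(5_000, 10.0))  # 0.05
```

Under this simple model the overhead grows linearly with the exit rate, so the real per-exit cost must be well below 10 µs for instances that sustain far higher exit rates.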

Virtualization performance jitter:

Because customer instances and system services share the same CPUs, a busy host system can affect the running of customer instances. We sampled CPU preemption across 20,000 running instances. Among shared instances, more than 200 had over 2% of their CPU time taken away while running; that is, these 200 instances actually received 98% of the CPU rather than 100%. The same thing also happened on dedicated instances of the same type, since interrupt handling necessarily has to run on the host CPUs. With BM-Hive, system services run in BM-Hypervisor on physical CPUs separate from those of the compute board, so X-Dragon bare metal instances have no problem with compute resources being preempted.
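The sampling figures above translate into two simple ratios. The sketch below only restates that arithmetic; the function name `effective_cpu_share` is hypothetical, and 200 and 2% are used as the lower bounds given in the text.

```python
def effective_cpu_share(stolen_fraction: float) -> float:
    """CPU share an instance actually receives when part of its time is preempted."""
    if not 0.0 <= stolen_fraction <= 1.0:
        raise ValueError("stolen_fraction must be between 0 and 1")
    return 1.0 - stolen_fraction


sampled_instances = 20_000
affected_instances = 200  # "more than 200", so this is a lower bound

# At least 1% of the sampled instances lost more than 2% of their CPU time.
print(affected_instances / sampled_instances)  # 0.01
# Each affected instance received at most about 98% of the CPU it was sold.
print(round(effective_cpu_share(0.02), 2))     # 0.98
```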

Virtualization Security:

This is not a new problem. It is generally accepted that security levels rank from low to high as: process -> container -> virtualization -> physical machine. The emergence of side-channel attacks shows that virtualized instances are not unbreakable. X-Dragon bare metal instances each run on a separate compute board and are naturally physically isolated, so these security problems do not arise.

Nested virtualization performance problems:

Nested virtualization under KVM generally loses more than 20% in performance, especially in scenarios with frequent I/O operations. This makes it hard for secondary virtualization inside today's cloud instances to meet requirements. On an X-Dragon bare metal instance, customers can run whichever hardware-accelerated virtualization solution they prefer inside the instance.

X-Dragon bare metal architecture system design

To solve the many problems of traditional virtualization, the design goals of BM-Hive are:

• Multi-tenant

• Physically isolated security units

• Integration with the existing operations system

• Physical machine performance

• Low cost

Figure 3 shows the overall system architecture of BM-Hive. We call an X-Dragon (Shenlong) bare metal instance a BM-guest, and a traditional virtualized instance a VM-guest. Each bare metal server consists of a base and a number of compute boards. The base is essentially a simplified Xeon-based server. Each compute board is a PCIe expansion board that plugs into the base. Its main components are the CPU, memory, PCIe bus, and IO-Bond. IO-Bond is a hardware interface implemented in an FPGA. It connects the PCIe buses of the base and the compute board, functioning much like a PCIe transparent bridge. On the compute board's PCIe bus, IO-Bond emulates several virtio devices, which are supported by the standard virtio kernel drivers. IO-Bond acts as the bridge between the virtio front end in the BM-guest and the back end in BM-Hypervisor. Currently IO-Bond supports virtio network and storage (block) devices, and it can easily be extended to support other virtio devices.
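The bridging role of IO-Bond can be illustrated with a toy model. This is a deliberately simplified sketch, not the virtio ring layout or the FPGA logic: the classes `ToyQueue` and `ToyIOBond` and the function `hypervisor_backend` are hypothetical names, and plain Python deques stand in for the queues on each side of the bridge.

```python
from collections import deque


class ToyQueue:
    """Stand-in for a virtio queue: requests flow one way, completions flow back."""

    def __init__(self):
        self.requests = deque()
        self.completions = deque()


class ToyIOBond:
    """Forwards requests from the guest-facing queue to the base and completions back."""

    def __init__(self, front: ToyQueue, back: ToyQueue):
        self.front, self.back = front, back

    def pump(self) -> None:
        while self.front.requests:
            self.back.requests.append(self.front.requests.popleft())
        while self.back.completions:
            self.front.completions.append(self.back.completions.popleft())


def hypervisor_backend(back: ToyQueue) -> None:
    """Pretend back end in BM-Hypervisor: completes every pending request."""
    while back.requests:
        back.completions.append(("done", back.requests.popleft()))


front, back = ToyQueue(), ToyQueue()
bond = ToyIOBond(front, back)
front.requests.append("read block 7")  # the BM-guest virtio driver posts a request
bond.pump()                            # IO-Bond forwards it to the base
hypervisor_backend(back)               # the back end services it
bond.pump()                            # IO-Bond returns the completion
print(front.completions.popleft())     # ('done', 'read block 7')
```

The point of the model is that the guest only ever touches its own queue, so a standard virtio driver needs no knowledge of the hardware sitting in between.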

Analysis of experimental data

We compare the CPU/memory performance of BM-Hive against traditional virtualization, and also analyze performance data for the network, storage, and other I/O subsystems. Finally, we show how X-Dragon bare metal instances perform in real-world business scenarios.

CPU and memory performance analysis of X-Dragon bare metal instances

Native CPU and memory performance is critical to users of bare metal instances. Figures 7 and 8 show, for identical configurations, the difference in CPU performance between bare metal and virtualized instances as measured by SPEC CINT 2006, and the gap in memory performance as measured by the STREAM benchmark.

The data in the figures is normalized. It shows that the CPU performance of a bare metal instance is almost identical to that of a physical machine, and on some items even slightly ahead, while the CPU performance of a virtualized instance is generally 0-4% lower. Memory is similar: the memory bandwidth of a virtualized instance is about 98% of that of a bare metal instance.
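Normalization here means dividing each score by a chosen baseline, so that the baseline reads 1.0 and everything else reads as a ratio. A minimal sketch follows; the function name `normalize` is hypothetical, and the scores are placeholders chosen for the example, not measurements from the paper.

```python
def normalize(scores: dict, baseline: str) -> dict:
    """Divide every score by the baseline's score so the baseline becomes 1.0."""
    base = scores[baseline]
    if base == 0:
        raise ValueError("baseline score must be non-zero")
    return {name: value / base for name, value in scores.items()}


# Placeholder scores, NOT data from the paper.
example = {"physical": 100.0, "bm_guest": 100.0, "vm_guest": 97.0}
print(normalize(example, "physical"))
# {'physical': 1.0, 'bm_guest': 1.0, 'vm_guest': 0.97}
```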

I/O subsystem performance analysis

Both BM-guest and VM-guest use virtio-based I/O paths to reach the Alibaba Cloud VPC network and cloud storage systems; the difference is that BM-Hive's virtio uses a hybrid hardware-software design. In this section we compare the network and storage subsystems of BM-guest and VM-guest. By product definition, both instance types are limited to a maximum packet rate of 4M PPS and a maximum bandwidth of 10 Gbit/s, and storage is limited to 25 IOPS and 300 MB/s. The purpose of our tests is therefore to check whether the two kinds of instances reach these design targets.

PPS: both BM-guest and VM-guest reach the 4M UDP PPS design target, but VM-guest is smoother. This may be because the BM-guest path crosses more hardware-software interfaces than the VM-guest path.

Latency: across the three test tools, VM-guest network latency is slightly better than that of the equivalent BM-guest, but the difference is small.

Storage I/O: in storage I/O performance BM-guest is better than VM-guest overall. BM-guest beats VM-guest on random read/write latency, and on long-tail latency BM-guest is more than 50% better than VM-guest.
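The text does not say how the product limits are enforced. One common technique for capping a rate such as packets per second is a token bucket, sketched below purely as an illustration. The class name `TokenBucket` and the burst size of 1,000 are assumptions; only the 4M PPS figure comes from the text. Time is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Allows up to `rate` units per second, with bursts of at most `burst` units."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, now

    def allow(self, now: float, units: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= units:
            self.tokens -= units
            return True
        return False


pps_limit = TokenBucket(rate=4_000_000, burst=1_000)  # burst size is an assumption
sent = sum(pps_limit.allow(now=0.0) for _ in range(1_500))
print(sent)  # 1000: the burst is spent and no time has passed to refill it
```

After one millisecond the bucket would have refilled, so sustained traffic averages out to the configured rate while short bursts are still admitted.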
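Long-tail latency is usually reported as a high percentile of the latency distribution rather than the average. The sketch below uses the nearest-rank definition of a percentile; the function name `percentile` is hypothetical, and the latency samples are made up for the example, not taken from the paper.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of the data at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need a non-empty sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latencies in microseconds, NOT data from the paper.
latencies_us = [100] * 98 + [900, 1500]
print(percentile(latencies_us, 50))  # 100
print(percentile(latencies_us, 99))  # 900
```

The median hides the two slow requests entirely, while the 99th percentile exposes them, which is why tail figures are compared separately from average latency.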

Typical customer applications

Comparing real application scenarios brings out the advantages of bare metal instances more clearly. For example, we compare the two instance types on the web server nginx, the database MariaDB, and the in-memory database Redis.

Nginx: the type of service most website customers choose. In both requests per second and processing time per request, BM-guest is more than 50% better than VM-guest.

MariaDB: MariaDB is a standard database test workload, integrated into sysbench. BM-guest read performance exceeds VM-guest by 15% or more, and write performance by more than 50%.

Redis: an in-memory data structure store, widely used to improve server performance and service capacity. In the Redis tests BM-guest outperforms VM-guest across the board; see Figures 15 and 16. We do not go into detail here.

Some Thoughts

IO-Bond performance optimization: IO-Bond sits on the critical path of the I/O system. It is currently implemented in an FPGA. In the future it could be implemented as an ASIC to further improve network and storage performance.

Live migration and live upgrade: live upgrade can be achieved on bare metal instances. When we upgrade BM-Hypervisor on the base, customer instances notice nothing. We described the technical details of live upgrade in our ASPLOS 2019 paper "Fast and Scalable VMM Live Upgrade in Large Cloud Infrastructure". In theory, live migration of bare metal instances is also achievable; we have made some attempts, and it is currently under development.

SGX support: SGX works without any problem on bare metal instances. With the virtualization layer out of the way, supporting SGX is in fact easier.

Conclusion

We have presented the design, implementation, and test data of BM-Hive, the X-Dragon high-density bare metal cloud service. Practice has shown that, as the direction for the industry's next generation of virtualization, X-Dragon's integrated hardware-software approach improves performance and security while remaining compatible with the strengths of existing virtualization. Finally, our sincere thanks go to all the technical staff of the Alibaba Cloud innovation team for their hard work.

Origin blog.csdn.net/csdnnews/article/details/104935649