[Technology Series] Talking about GPU Virtualization Technology (Chapter 1)

Abstract: An in-depth series of articles on GPU technology, shared by Alibaba Cloud technical experts.

Chapter 1 Development History of GPU Virtualization

The development of GPU virtualization is closely tied to the growth of the public cloud market and of cloud computing application scenarios. If you had talked about cloud computing 10 years ago, most people's response would have been blank incomprehension. As cloud computing scenarios became widespread, however, the concept took root, and people gradually formed a clearer, more concrete picture of it. Naturally, as application scenarios expanded from workloads that depend on the CPU alone to a variety of architectures and heterogeneous computing scenarios, strong demand arose for virtualizing and cloud-hosting professional computing chips such as GPUs, FPGAs, and TPUs. In particular, the rapid development of machine learning and deep learning in recent years has driven a wave of heterogeneous computing workloads migrating to the cloud.

So how large is the market for these heterogeneous computing application scenarios? Heterogeneous computing is the computing substrate of machine learning and artificial intelligence, so let's first look at the prospects of artificial intelligence. (Citation source: https://bg.qianzhan.com/report/detail/459/180116-3c060b52.html )


 

Figure 1: 2015-2018 Global Artificial Intelligence Market Scale and Forecast (Unit: 100 million yuan, %)


Figure 2: Market size and growth rate of China's artificial intelligence industry from 2014 to 2018 (unit: 100 million yuan, %)

 

Therefore, it is not difficult to understand why major cloud computing manufacturers, no matter how big or small, will try their best to develop heterogeneous computing products and compete for market dominance.

Since the GPU is the main force of heterogeneous computing, let us review the development history of GPU virtualization and compare the various GPU manufacturers side by side. It is not hard to see which manufacturers are in the leading position and which are just along for the ride :)

2008 : Preface

VMware's vSGA technology for full GPU virtualization was the first attempt at virtualized GPU sharing. It first shipped at the end of 2008 in VMware's commercial Workstation 6.5 and Fusion 2.0 releases, and was later integrated into the data-center-oriented vSphere product. However, this is a proprietary, closed-source VMware solution that has seen no large-scale adoption in the open source community or in products outside VMware, and it is not the focus of this article.

2012 : Start

With the introduction of the kernel VFIO module and the gradual adoption of pass-through devices, the road to GPU virtualization was opened. Large-scale deployment generally followed the successful landing of the VFIO module; indeed, around 2012, GPU pass-through was already one of the most important application scenarios for VFIO.
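
To make the mechanism concrete, here is a minimal sketch of the sysfs steps a host administrator typically follows to hand a GPU over to the kernel's vfio-pci driver for pass-through. The PCI address and vendor:device ID below are placeholders (substitute the values from `lspci -nn` on your own machine), and the commands are printed as a dry run rather than executed, since they require real hardware and root privileges:

```shell
# Dry-run sketch of VFIO GPU pass-through setup.
# GPU_ADDR and GPU_ID are placeholders -- replace with your card's values.
GPU_ADDR="0000:01:00.0"   # placeholder PCI address of the GPU
GPU_ID="10de 1db4"        # placeholder vendor/device ID pair

echo "# 1. Unbind the GPU from its current host driver:"
echo "echo $GPU_ADDR > /sys/bus/pci/devices/$GPU_ADDR/driver/unbind"
echo "# 2. Tell vfio-pci to claim devices with this vendor:device ID:"
echo "echo $GPU_ID > /sys/bus/pci/drivers/vfio-pci/new_id"
echo "# 3. Start the guest with the whole device assigned, e.g.:"
echo "qemu-system-x86_64 ... -device vfio-pci,host=$GPU_ADDR"
```

Note that with pass-through the guest owns the entire physical GPU exclusively; sharing one GPU among multiple guests is exactly the problem the later vGPU approaches set out to solve.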

2013 : The first products enter the race

Nvidia released the GRID K1 product in 2013, which marked the coming of age of GPU virtualization and kicked off the rapid development of heterogeneous computing virtualization.

In fact, by the same year, 2013, Intel OTC's GPU virtualization solutions GVT-d and GVT-g targeting HSW had already been in development for more than a year. The hardware was originally based on SNB/HSW, and the prototype code was based on the Xen hypervisor. (Off-topic: looking back, you will find that Xen, which was flourishing at the time, was gradually displaced a few years later by the rising star KVM. Today Xen is rarely seen in the public cloud market; a moment of silence for Citrix.)
 

Intel maintains keen technical insight into the development of the GPU industry, and had already floated a proposal for GPU virtualization as early as 2011. However, because it did not attract enough attention, it was not until 2014, three years later, that a GVT-g-based product, XenClient, came out.

The same year: the community maintainer of the VFIO module also officially presented VGA assignment at the KVM Forum. (See: https://www.linux-kvm.org/images/e/ed/Kvm-forum-2013-VFIO-VGA.pdf)
 

At the beginning of the same year: AMD also started its SR-IOV-based GPU virtualization solution (Tonga architecture) and began developing the GIM driver for the SR-IOV PF and the vGPU scheduling system. Presumably the SR-IOV hardware implementation had been completed about half a year earlier. It was not until two years later that AMD finally shipped its first GPU SR-IOV product: the FirePro S7150 (released in early 2016).
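
For orientation, here is a minimal sketch of the generic PCI SR-IOV sysfs interface through which a PF driver such as AMD's GIM exposes virtual functions that can then be assigned to guests. The PCI address and VF count are placeholders, and the commands are printed as a dry run rather than executed:

```shell
# Dry-run sketch of enabling SR-IOV virtual functions on a GPU PF.
# PF and NUM_VFS are placeholders -- substitute your own device's values.
PF="0000:03:00.0"   # placeholder PCI address of the physical function
NUM_VFS=4           # placeholder number of virtual functions to enable

echo "# How many VFs the device supports at most:"
echo "cat /sys/bus/pci/devices/$PF/sriov_totalvfs"
echo "# Enable $NUM_VFS virtual functions on the PF:"
echo "echo $NUM_VFS > /sys/bus/pci/devices/$PF/sriov_numvfs"
echo "# Each VF then shows up as its own PCI device, assignable via vfio-pci."
```

The appeal of SR-IOV is precisely that each VF appears as an ordinary PCI device, so the existing pass-through machinery works unchanged while the hardware itself handles the sharing.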

As the leader of the GPU industry, Nvidia has been roughly 1-2 years ahead of its rivals in the R&D and productization of GPU virtualization, with AMD catching up behind it. Intel, at that time, was basically an also-ran.

2014 : vGPU sharded virtualization is born

A year later, in 2014, with the publication of a USENIX ATC paper, "A Full GPU Virtualization Solution with Mediated Pass-Through", a previously obscure GPU virtualization technique officially entered everyone's field of view: GPU sharded virtualization (let's call it that, because the name "mediated pass-through" doesn't tell people what it actually is).

The paper was published by two Principal Engineers at Intel OTC, and it represents Intel's technical accumulation in the field of GPU virtualization (productization, however, never took off; it is a painful story).

It should be said that Nvidia, as the industry leader, played a crucial role in promoting sharded virtualization in the community. In fact, VFIO's mdev framework was introduced by Nvidia for the GRID vGPU product line. The mdev concept was pioneered by Nvidia and merged into Linux kernel 4.10. Even the players of closed-source ecosystems had begun to embrace open source.
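
As a rough illustration of what the mdev framework looks like from the host side, here is a minimal sketch of the upstream vfio-mdev sysfs interface that vendor drivers such as NVIDIA GRID vGPU and Intel GVT-g plug into. The parent device address, vGPU type name, and UUID are placeholders, and the commands are printed as a dry run rather than executed:

```shell
# Dry-run sketch of creating a mediated device (vGPU) via the mdev sysfs ABI.
# PARENT, MDEV_TYPE, and UUID are placeholders for illustration only.
PARENT="0000:00:02.0"                           # placeholder parent GPU
MDEV_TYPE="i915-GVTg_V5_4"                      # placeholder vGPU type name
UUID="a297db4a-f4c2-11e6-90f6-d3b88d6c9525"     # placeholder instance UUID

echo "# The vendor driver advertises the vGPU types it can instantiate:"
echo "ls /sys/bus/pci/devices/$PARENT/mdev_supported_types"
echo "# Writing a UUID to 'create' instantiates one vGPU of that type:"
echo "echo $UUID > /sys/bus/pci/devices/$PARENT/mdev_supported_types/$MDEV_TYPE/create"
echo "# QEMU then attaches it like any other VFIO device:"
echo "qemu-system-x86_64 ... -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID"
```

Unlike SR-IOV, the slicing here is done in software by the vendor driver mediating guest access, which is why the same VFIO plumbing can serve both hardware-partitioned and software-partitioned GPUs.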
 

There was no news from AMD in 2014; presumably it continued developing the world's first SR-IOV-based GPU solution.

2015 : Differentiation

Intel, in cooperation with Citrix, successively released XenClient and XenServer products based on GVT-d and on the sharded virtualization technology GVT-g. These products represented the benchmark of the Xen community's GPU virtualization industry at the time. Why the Xen community? Because GVT-g had not yet released a KVM version.

Intel also began to promote GVT-g technology at major conferences at home and abroad, naturally hoping that the technology could be commercialized and find a good market. For example, at that year's Intel Developer Forum (IDF), a GVT-g-based multimedia video processing cloud solution was unveiled for the first time. The session drew a large audience, well over 100 people, and plenty of interest: using the integrated GPU, which effectively comes for free, for audio and video processing is more cost-effective than doing the same work on an E5 server's CPUs alone. Unfortunately, none of the products landed in the end. The reason lies in the positioning of Intel's integrated GPU; the fatal flaws and pain points of the Intel GVT-g solution will be discussed later.

AMD, meanwhile, continued to develop the world's first SR-IOV GPU.

While everyone else was still playing with technology, Nvidia had already begun laying out the industry. In the same year, various GRID-based solutions on AWS and VMware were released, such as the very cool Game Streaming.

In fact, GRID is a broad umbrella: it stands for a large stack of Nvidia GPU virtualization products. Among them, GRID vGPU is the mdev-based sharded virtualization solution.

2016, 2017 : Returns

In 2016, AMD brought out the world's first GPU SR-IOV graphics card, the FirePro S7150x2. This product, aimed at graphics rendering applications, would go on to become a must-have offering for major public cloud vendors: for cost-effective virtualized graphics rendering it was the only game in town.

Intel continued to vigorously promote GVT-g at major forums, and for the first time it pulled ahead of industry leader Nvidia technically, being the first to realize vGPU live migration. It is fair to say that Intel OTC's virtualization team had pushed GVT-g to the limit of what it could achieve on its own; on the road to productization, however, things only got harder.

By this time Nvidia, riding the AI wave, had kept perfecting GRID and its sharded virtualization, leaving its opponents far behind, and it also began to show its presence in the open source community. On the second day of the 2016 KVM Forum, Nvidia architect Neo gave a grand introduction to GRID vGPU technology. As it happened, I, representing GVT-g, gave a talk on GPU live migration at the same venue.

Let's get a feel for the scene at the time, comparing the GRID vGPU audience with the GVT-g audience:

[Photos: the GRID vGPU session audience vs. the GVT-g session audience]

It has to be said that in the early years Intel, the representative of integrated GPUs, and Nvidia, the representative of discrete graphics cards, cooperated deeply on GPU R&D. Intel later partnered with AMD to develop combined CPU+GPU chips, and the recent Intel-AMD cooperation is aimed at countering Nvidia's squeeze in the GPU space.
 

The above three are both rivals and friends.

2018 : New frontiers

Nvidia continues to hold first place in the industry and controls the vast majority of the market share. No wonder: it had the foresight to lay out early, and it is harvesting early.

AMD has follow-up releases too, such as the MI25, which takes aim at old rival Nvidia in deep learning.

With the popularization of GPU virtualization, its application scenarios are no longer limited to the cloud computing market; various emerging industries have also begun to apply GPU virtualization technology. The most direct is the In-Vehicle Infotainment system, or IVI. So the three old friends and old rivals all began to compete in the IVI and autonomous driving fields. This also brought a turning point for the landing of Intel's GVT-g technology: Intel took the lead in releasing the IoT-oriented ACRN hypervisor and set off again with GVT-g sharded virtualization.

