Conversation with Yan Guihai | Building a high-speed rail network for data circulation: can the DPU make computing power soar?

Recently, Yan Guihai, founder and CEO of Zhongke Yushu, was invited onto the Huatai Securities podcast "Taidu VOICE," where he held an in-depth, hard-core technical dialogue with Liu Cheng, Investment Director at Huatai Innovation, on one of the three elements of artificial intelligence: computing power.

In the program, Mr. Yan explained the technical principles and application scenarios of the data processing unit (DPU) in plain language. On the difficulty and industrial significance of improving computing power, he described in detail how the DPU, as a key component of the computing power base, improves efficiency by connecting computing nodes and building a pooled computing resource. He also shared his entrepreneurial journey as a scientist, emphasizing that closely integrating technological innovation with commercial application is essential to advancing technology.

The following is a transcript of the conversation:


01

CPU and GPU alone are not enough. DPU forms a "high-speed rail system" that connects dots into a network.

Liu Cheng, Huatai Innovation: Mr. Yan, could you explain in simple but thorough terms: if computing power is the crucial base on which ChatGPT rests, what role does the DPU play in it?

Yan Guihai, Zhongke Yushu: Computing power mainly comes from chips, from networks, and from the various applications that generate and process data. The underlying foundation of computing power is therefore the data center: server clusters equipped with different network equipment that connect every piece of infrastructure capable of computing, storing, and transmitting data into an organic whole. This is what we call the computing power base.

In the computing power base, we have many different types of processing units (PUs): most commonly the central processing unit (CPU) and the graphics processing unit (GPU), on top of which run operating systems and the applications we use every day. However, today's large models and complex AI algorithms require a huge number of computing nodes to be connected into one enormous computing power pool, and CPUs and GPUs alone are not enough. So who connects them? This is where the data processing unit (DPU) plays a critical role: it links all the computing nodes together into a pooled computing resource.

If each processing unit (PU) is a city, then the DPU is the city's high-speed rail system. We can now travel between Beijing and Nanjing in a single day, something once unimaginable, because we have an efficient transportation system. By the same token, computing infrastructure needs greater efficiency between nodes, connecting isolated points of computing power into a network. The DPU plays the same role there that the high-speed rail system plays today.



02

Energy efficiency ratio is an important evaluation dimension of computing power

Liu Cheng, Huatai Innovation: Looking at current frontiers of information technology such as cloud computing, the "Eastern Data, Western Computing" initiative, and computing power networks, can you explain the importance of computing power to various industries?

Yan Guihai, Zhongke Yushu: The most direct metaphor is to think of computing power as electricity. If we had no mobile phones or computers today, work would simply stop, because our work is built on large amounts of data. And today's computing power is not only about processing data on demand: often, even when no explicit instruction has been issued, a huge system behind the scenes is classifying the data and mining its value. Every app on our phones actively pushes targeted messages based on the scene you are in right now. The computing power consumed by these background data-processing services may exceed that consumed by the tasks you explicitly run. All of that background processing is computing power at work.

There are actually many evaluation criteria for computing power, one of the important ones is energy efficiency ratio. For scenarios that require data centers to support business operations, using computing power that is more economical and energy-efficient will definitely have more advantages than using computing power with higher energy consumption.

From the perspective of computing power classification, scenarios such as weather prediction, earthquake simulation, and wind tunnel or crash simulation can be called supercomputing applications: they rely on massive amounts of computation with strict efficiency requirements, but place relatively low demands on external networks. The currently popular ChatGPT-style models are intelligent computing applications, which clearly require large-scale data centers for model training and inference. Beyond these, there are special computing power requirements in the big data field, such as a system supporting one million people grabbing red envelopes at the same time. This kind of computing power differs from the previous two: the amount of computation per user is not large, but the system must serve a huge number of users simultaneously, so its concurrency requirements are very high.


03

Market demand and implementation scenarios set the stage for our industrialization

Liu Cheng, Huatai Innovation: I would like to go back to the original intention behind your venture and talk about the industry. Before founding Zhongke Yushu, you were a scientist, and you spotted some common problems in the industry. Did you want to solve them through an entry point like the DPU? Could you talk about the current progress of the DPU in light of that original intention?

Yan Guihai, Zhongke Yushu: When we started working on the DPU, the first thing we paid attention to was demand. While studying computing systems, we noticed that more and more businesses were running on infrastructure such as traditional data centers at rising cost. Once a data center is equipped with the various layers of cloud infrastructure, its CPUs show 20%–30% utilization even when nominally idle, which proves that at least 20%–30% of the entire system's computing power is consumed just supporting that infrastructure. This is the so-called data center "tax."

What is more serious is that this is not only a resource-consumption problem; it directly degrades performance. For example, we found that in cloud computing, the communication latency between different virtual machines is much higher than between physical machines. This added latency is caused by extensive network virtualization. The DPU emerged precisely to solve this performance problem.

On the demand side we see particularly rigid requirements. Securities trading and risk control systems have extremely strict latency requirements, because latency control is key to the liquidity and operating efficiency of the entire trading market. Reducing latency from the millisecond level to the microsecond level means a difference of three orders of magnitude. That cannot be achieved by streamlining upper-layer software alone; it requires support from the hardware links and the network protocol stack. Traditional computing systems struggle to meet such requirements directly, so we believed the problem could be solved with a component sitting close to the network: the data processing unit (DPU).
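As a quick sanity check on the orders-of-magnitude claim, assuming a drop from 1 millisecond to 1 microsecond:

```python
# Latency expressed in nanoseconds to keep the arithmetic in integers.
ms_in_ns = 1_000_000   # 1 millisecond
us_in_ns = 1_000       # 1 microsecond

speedup = ms_in_ns // us_in_ns
print(speedup)  # 1000, i.e. three orders of magnitude
```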

Technology maturity is what ensures a product can move from the innovation stage to a mature commodity. When we started developing the DPU around 2018, the necessary conditions were basically in place. The only missing piece was market education: the DPU did not exist before, so we needed the user community to understand and recognize its importance, and not harbor too many doubts about the maturity of such a new product. For the market and customers to have confidence in the DPU, we had to provide real cases. Only then could our DPU make it from the R&D stage into the market.



04

Letting the CPU do the DPU's work

is like asking the company's R&D staff to do administrative work

Liu Cheng, Huatai Innovation: You just mentioned the data center "tax." Can it be quantified?

Yan Guihai, Zhongke Yushu: Around 2016, Google's research team measured server utilization on Google Cloud and found that the data center tax across the fleet was roughly 25%–30%. In other words, this overhead alone can cost 20%–30% of performance.

We have run similar experiments ourselves. Since network data must be processed, packets have to be captured from the network and placed in local memory for local applications to use. This requires the CPU to run an unpacking program, the network protocol stack, and running that stack consumes computing power. How much it consumes depends on the packet rate: at high packet rates the CPU may need more processor cores, while with fewer packets less computing power is needed.

If a 25G data link is fully utilized, roughly four or five Xeon processor cores are needed to handle it. On a high-performance 8-core desktop, running a full-bandwidth network application could therefore tie up about half of the cores just for network processing. That is an enormous overhead.
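The scale of this overhead can be sketched with a back-of-envelope calculation. The packet size, per-packet cost, and clock frequency below are illustrative assumptions, not figures from the interview:

```python
# Rough estimate of CPU cores consumed by software packet processing
# on a saturated 25 Gbps link. All constants are assumptions.
LINK_GBPS = 25
AVG_PACKET_BYTES = 512      # assumed average packet size
CYCLES_PER_PACKET = 2_000   # assumed network-stack cost per packet
CORE_GHZ = 2.5              # assumed per-core clock frequency

packets_per_sec = LINK_GBPS * 1e9 / 8 / AVG_PACKET_BYTES
cores_needed = packets_per_sec * CYCLES_PER_PACKET / (CORE_GHZ * 1e9)

print(f"~{packets_per_sec / 1e6:.1f} Mpps needs ~{cores_needed:.1f} cores")
```

Under these assumptions the estimate lands around four to five cores, consistent with the figure above; offloading the protocol stack to a DPU returns those cores to application work.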

Liu Cheng, Huatai Innovation: So for the CPU, cloud and virtualization are a burden, and the fix is to offload that burden to the DPU.

Yan Guihai, Zhongke Yushu: That is one way to understand it. Our view is also that cloud and virtualization are not the "culprits" behind the data center tax; they are a cost that must be paid. If you want 100 machines to collaborate, they will not coordinate themselves. Whenever an organization wants to work efficiently, it must bear a certain overhead, namely management cost. That management cost is necessary and unavoidable; the question is only who you ask to bear it. If you make the CPU handle it, it looks like pure overhead. But if you separate those functions from the CPU and hand them to components better suited to the tasks, the overhead drops dramatically.

It is just like a company, which always needs HR and administrative departments. If the company's R&D staff spent every day recruiting, efficiency would be very low; a dedicated human resources department doing that work is far more efficient.


05

Through the "combination of software and hardware,"

achieving "low latency" that approaches the limit

Liu Cheng of Huatai Innovation: As far as I know, in addition to hardware products, Zhongke Yushu also has software products, such as HADOS software development platform and NDPP ultra-low latency computing development platform. Why does a chip company invest so much energy in software?

Yan Guihai, Zhongke Yushu: There are many types of chips, each with different characteristics, and a system-level chip like the DPU is especially dependent on software. Unlike terminal-device chips such as Wi-Fi or Bluetooth chips, DPUs, GPUs, and CPUs are far more complex. Evaluating such a chip through port and signal testing alone is not enough, because what matters most is enabling others to use it effectively. To keep the so-called "last mile" of the connection smooth, we believe the DPU's underlying software system must be carefully developed.

Building both software and hardware teams has always been Zhongke Yushu's philosophy. We pursue not only chip-level optimization of clock frequency, latency, area, and power consumption, but also seamless integration with existing libraries and middleware. The reason such seamless switching is possible is that we built a very complete software adaptation layer underneath. Achieving this requires substantial investment in software R&D resources.

Liu Cheng, Huatai Innovation: Could you tell us more about which vendors you hope will embed Zhongke Yushu's products into their software or hardware, such as databases, operating systems, and clouds?

Yan Guihai, Zhongke Yushu: This touches on the product ecosystem. The major categories you just mentioned can be summarized as terminal software, which represents some of the major players in the overall application ecosystem, operating systems for example. When we build a DPU, it must be adapted to and compatible with current operating systems, including every type of CPU and GPU used on the computing platforms beneath them. The DPU has to be made compatible with each in turn, so that users on the operating system can use it transparently, without noticing it at all. For a DPU, that is the ideal state.

In addition, there are basic application systems such as databases. Traditionally, improving database performance required strong hardware-tuning capability. Looking back at how databases and operating systems evolved, they developed relatively independently, which means database users and the database development community themselves also possess strong hardware-tuning skills. Here, we hope to expose many of the DPU's capabilities, such as high-performance networking, through the tuning interfaces of this basic system software. For example, in a distributed database, a table can be placed on a remote node and accessed through the remote DMA (RDMA) mechanism supported by the DPU to improve performance.

This is therefore another case where functionality needs to be exposed to basic software vendors at a lower level: basic software needs lower-level interfaces that give it room for performance tuning. We want to bring all of that together in one system.
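The remote-table idea can be illustrated with a toy model. This is a hypothetical sketch, not Zhongke Yushu's API: the point is that a one-sided, RDMA-style read served by the DPU never interrupts the remote host's CPU, unlike a conventional request/response path.

```python
# Hypothetical sketch contrasting a two-sided request/response table
# fetch with a one-sided, RDMA-style read in which the DPU serves a
# registered memory region without involving the remote CPU.

class RemoteNode:
    """Toy model of a node whose DPU exposes registered memory."""
    def __init__(self):
        self.memory = {}       # registered region: table name -> rows
        self.cpu_wakeups = 0   # counts remote-CPU involvement

    def rpc_read(self, table):
        # Two-sided path: the remote CPU must service the request.
        self.cpu_wakeups += 1
        return self.memory[table]

    def rdma_read(self, table):
        # One-sided path: the DPU reads registered memory directly,
        # so the remote CPU is never interrupted.
        return self.memory[table]

node = RemoteNode()
node.memory["orders"] = [("o1", 42), ("o2", 7)]

rows_rpc = node.rpc_read("orders")
rows_rdma = node.rdma_read("orders")

print("same rows:", rows_rpc == rows_rdma)      # True
print("remote CPU wakeups:", node.cpu_wakeups)  # 1 (only the RPC path)
```

In a real deployment the one-sided path is what lets a distributed database read a remote table at near-hardware latency while the remote node's cores keep running queries.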

Liu Cheng, Huatai Innovation: At the software and hardware levels, what products does Zhongke Yushu offer for different scenarios and different users?

Yan Guihai, Zhongke Yushu: The NDPP ultra-low-latency computing development platform is a very typical case for us. The "N" stands for nano, as in nanosecond: we hope the product's final latency can approach the nanosecond scale. As an ultra-low-latency computing development platform, it mainly targets scenarios that are extremely sensitive to latency, meaning network-side applications can build their core logic on top of it. On this platform we provide many low-latency physical links, which is equivalent to building a circuit-switched system for our customers: instead of sending telegrams, you can place a phone call directly, which is much faster than before.



06

Scientific and technological innovation leads new trends in economic development

Yan Guihai, Zhongke Yushu: As an investor in hard technology, what is your main driving force? What is your core investment logic?

Liu Cheng, Huatai Innovation: Hard technology is a track that has not received enough attention but is very important; the main line of future investment will be increasingly driven by it. China's innovation has in part entered the deep-water zone, and the shift in investment themes is closely tied to the overall background and needs of China's economic development. Twenty years ago, China mainly ran traditional economic models such as manufacturing and contract processing, and hard-technology investment was not a prominent theme, because under that model profits returned faster, investing was easier, and output was higher. But as economic development reaches a higher stage, China is transforming into an innovation-, knowledge-, and technology-driven economy, which is the inevitable result of an economy's development.

I think simply chasing what is hot or cold is wrong, and the same applies to technical directions. Without going through alternating cycles of hot and cold and repeated setbacks, it is hard for technological development to reach consensus, and hard to produce technologies that truly solve market problems and meet real needs. These cycles may repeat countless times, and in the end only companies that genuinely solve customers' problems and create value emerge. They are tempered through the alternation of hot and cold, just as impurities are removed from iron during steelmaking so that steel can be made. Repeated cycles of hot and cold drive the industry forward.


Origin blog.csdn.net/yusur/article/details/131250139