Building a Computing Power Distribution Network under the New Moore's Law

Moore's Law was first proposed in 1965 by Gordon Moore, one of the founders of Intel. He observed that the number of transistors that can fit on an integrated circuit doubles approximately every 18 to 24 months. Nearly sixty years later, in the face of a rapidly changing society and fast-advancing digital needs, Moore's Law has been given a new definition. LiveVideoStackCon 2022 Beijing invited Wangxin Technology CEO Li Hao to share how to build a computing power distribution network under the new Moore's Law.

Text/Li Hao

Edit/LiveVideoStack

The topic of my speech today is "Building a Computing Power Distribution Network under the New Moore's Law"


1. The singularity of computing power and its impact on audio and video content


Moore's Law in the narrow sense is about chips, wafers, and transistor density; today it is invoked for everything. Sam Altman tweeted a controversial claim: the amount of intelligence in the universe doubles every 18 months. My personal reading of that "intelligence in the universe" is AI computing power. The chart on the right is an illustration from a 2005 book by the futurist Ray Kurzweil, and its timing fits surprisingly well: around 2020 corresponds to the IQ of a mouse. GPT-4 now has on the order of 1 trillion connections, while humans have about 170 trillion neuronal connections. One trillion is rodent level, with the squirrel as a typical representative. Extrapolating by Moore's Law, GPT-4-class models could reach human level in 20 years; extrapolating from the pace of the jump from GPT-3.5 to GPT-4, it would take only 2 years. No one knows what social changes will follow once 170 trillion is reached. We are at a moment of great historical change.


Large models follow a scaling law: model parameters grow exponentially while model performance grows only linearly, so performance is a logarithmic function of scale. Under this view, models were believed to be still far from human-level behavioral intelligence.

But once parameter counts exceed roughly 10 billion, researchers observed a phase-transition curve of rapid growth: emergent abilities appear. I think this is OpenAI's greatest contribution to mankind, because it proves the feasibility of iterating along the road of ever-larger models: the larger the model, the faster its performance improves.
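As a rough illustration of the first regime described above, a Kaplan-style power law predicts loss falling smoothly as parameters grow exponentially. This is a minimal sketch only; the constants `n_c` and `alpha` are illustrative placeholders, not fitted values for any particular model.

```python
# Illustrative sketch of a power-law scaling curve: loss falls as a
# power law in parameter count N. The constants below are placeholders
# for illustration, not measurements of any real model.

def loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (n_c / n_params) ** alpha

# Exponential growth in parameters yields only gradual gains in loss,
# which is why performance looks linear on a log-x plot:
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"N = {n:.0e}: loss = {loss(n):.3f}")
```

Emergence is precisely the point where observed capability departs upward from a smooth curve like this one.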


Let me briefly introduce the cloud computing industry. When cloud computing first took off around 2010, our team discussed its prospects and concluded that it would pass through three stages:

The first stage is the resource stage: you manage the machines, databases, network, and storage yourself to build your own services.

The second stage is the serverless stage, where resources are scheduled purely through code; it is more developer-oriented. Once the network and services mature, the industry enters the third, user-oriented stage.

However, the shape of that third stage seemed complex and hard to use, and our discussions reached no conclusion. With the arrival of large models, everything becomes tractable: natural language is the best interface. When a model understands natural language well, the logic of computing power scheduling becomes accessible to ordinary users. This will be a huge breakthrough for the entire cloud computing industry.


As for the audio and video industry, whether for intelligent dubbing or AI-generated presentation videos, you can now go from concept and copywriting through footage and intelligent dubbing to a finished video in half a day. Work that once required a whole team for a week can be completed by a non-professional in half a day, and productivity improves enormously.

Traditional audio and video content is generated at the terminal, processed in the cloud, and then distributed to viewers; the pipeline in between is simple and clear. With AGI, the total amount of content to be processed increases. Suppose many people are watching the TV series "Hurricane": the copy distributed to me may be adapted to my preferences along the way, so the growth in computing power demand is inevitable. More data will be generated at the edge and in the cloud, which changes where content is produced and distributed from.


The chart on the left is IDC's 2020 forecast, but I think its figures are conservative: by 2025 the vast majority of data, well over 80%, will be stored in the core and at the edge. New opportunities will certainly arise. As for personal data nodes, artificial intelligence will accelerate the digitalization of society, and personal data will become people's most valuable private asset.

2. Edge cloud becomes a new data source and triggers structural changes


I believe the edge cloud will become a new data source and trigger a new round of architectural change. As demand for data computing grows, computation becomes hard to complete on the terminal and must move to the edge and the cloud. Because the edge is fast and cheap, it is a natural complement to the cloud. Together, cloud and edge form a generalized cloud computing network that will carry most future data generation and computation. Users will need only a small sample to generate personalized data, and some local tools will inevitably move to the cloud; companies including Adobe have already started down this path.


Opportunities and challenges always coexist. Technically, a traditional network can be divided into small, uniform units, like express parcels, with little differentiation. Once computing power is superimposed, the differences become very large: GPT-4's parameter count may be between 600 billion and 1 trillion, requiring more than a terabyte of video memory to load, while some small models fit in a few gigabytes. Some workloads cannot be divided atomically, which makes distribution complex, and partitioning must be adjusted to requirements. The iResearch Consulting report is quoted here to expand on the three major elements of the computing power network.

The computing power network has three major elements:

(1) Computing: the core resource of the computing power network.

(2) Perception: The perception of computing power requirements in specific scenarios and the perception of computing power resources.

(3) Connection: Gather distributed, heterogeneous, multi-level, and idle computing power.

The above three elements give the computing power network both functional and service attributes, so that it can efficiently activate the computing power resources of the whole society and empower industrial applications. Architecturally, the computing power network can be divided, from bottom to top, into a basic resource layer, a computing network scheduling layer, and a computing network operation layer, with computing network operations and maintenance and computing network security running through the whole stack, forming a "three horizontal, two vertical" structure. Ultimately, the computing power network will empower industrial applications in the form of products or capabilities.
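To make the earlier memory figures concrete: the VRAM floor for simply holding a model's weights is roughly parameter count times bytes per parameter. This is a back-of-the-envelope sketch assuming fp16/bf16 weights (2 bytes per parameter); real serving also needs room for activations and KV cache, and the model sizes below are illustrative.

```python
# Rough VRAM needed just to hold model weights in fp16/bf16.
# Real deployments add activations, KV cache, and runtime overhead
# on top of this floor.

def weight_vram_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """GiB of memory required to store n_params weights."""
    return n_params * bytes_per_param / 2**30

for name, n in [("7B small model", 7e9), ("1T-parameter model", 1e12)]:
    print(f"{name}: about {weight_vram_gib(n):.0f} GiB for weights alone")
```

This is why a small model fits in a few gigabytes on a single edge GPU while a trillion-parameter model needs over a terabyte of memory and cannot be placed atomically.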


In the future, a large share of content production will happen at the edge. With some source data superimposed, content such as digital humans and special effects can be generated at the edge. Delivered images with strict latency requirements can only be generated at the edge.


Second, there will be more real-time interaction, and many local interactions will become interactions with the cloud. Today the most demanding cloud interaction is the RTC video-call scenario; the next potential scenario is cloud gaming. Cloud gaming tolerates only about half the latency of a video call, no more than 100 ms, and virtual reality requires latency within 10 ms. As these scenarios evolve, the requirements on the stability of network-distributed computing power will become ever more stringent, and edge-side distribution must deliver ultra-low latency.
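The latency classes above can be expressed as a simple budget check. This is a minimal sketch: the thresholds mirror the figures quoted in the talk, and the per-stage numbers in the usage lines are hypothetical.

```python
# End-to-end latency budgets for the interaction classes discussed
# above. Thresholds follow the talk (RTC video ~200 ms, cloud gaming
# <100 ms, VR ~10 ms); the stage timings used below are hypothetical.

BUDGETS_MS = {"rtc_video": 200, "cloud_gaming": 100, "vr": 10}

def meets_budget(scenario: str, capture_ms: float, encode_ms: float,
                 network_ms: float, decode_ms: float) -> bool:
    """True if the summed pipeline stages fit the scenario's budget."""
    total = capture_ms + encode_ms + network_ms + decode_ms
    return total <= BUDGETS_MS[scenario]

# A nearby edge node (5 ms network) easily fits a cloud gaming budget:
print(meets_budget("cloud_gaming", 5, 10, 5, 8))
# A distant central cloud (60 ms network) can never fit a VR budget:
print(meets_budget("vr", 2, 3, 60, 2))
```

The point of the edge network is that the `network_ms` term shrinks toward single-digit milliseconds, which is the only way the tighter budgets become reachable at all.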


Finally, business logic will also grow. Once the data source changes, all data logic must be rewritten. The biggest difference from cloud computing is that edge resources cannot be mobilized directly as raw resources; that would be far too complicated. The edge side must therefore take services as its core and rebuild business logic on a serverless foundation.

3. Evolution of Wangxin audio and video service architecture

Wangxin has made many attempts to address the problems above, and we have launched products built on this future computing power to serve audio and video customers well. Our value therefore centers on providing customers with lower-latency, better, and cheaper computing power, and with more convenient operating services.

Let me briefly introduce Wangxin Technology. Wangxin was among the earliest cloud computing companies in China, and it is the world's largest and most deeply distributed edge network operator. Wangxin's concept is that edge cloud computing must be a platform model: whether resources are self-built, jointly built, or contributed by partners, the platform must efficiently integrate multi-level fragmented resources, modularize the technology, standardize external interfaces, and serve industry customers well. At present, Wangxin mainly serves leading companies in the audio and video industry and is also deployed in scenarios such as AI and ultra-low latency. Wangxin's edge nodes now number more than 5 million, reaching over 600 million users in China through its own SDKs.

First, let's look at the cloud gaming experience we built on idle edge hosts. The overall picture quality and latency fully meet the needs of the game.


Compared with traditional vendors, the biggest difference in Wangxin's cloud gaming architecture is its use of edge nodes with much higher coverage density. Building cloud gaming on the central cloud makes costs uncontrollable and latency very high; an experience under 70 ms is impossible. With the edge network we have built, we can find the node closest to each user to optimize the rendering and streaming experience.

Wangxin has introduced several technical innovations. The first is unique to an edge network: with such a large number of edge nodes, it can clearly perceive the state of the network, which is difficult for many cloud computing vendors. Second, all end-to-end protocols are built by Wangxin itself. The RTC industry today centers on two-terminal interaction, but cloud gaming is single-terminal, ultra-low-latency interaction, with high bit rate, high frame rate, and low latency as its three basic characteristics. At the network core, the data plane and the control plane are separated on top of the QUIC protocol, and techniques such as congestion control and RS FEC are introduced to handle high bit rate, high frame rate, and ultra-low latency.
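The FEC idea mentioned above can be illustrated with a toy XOR parity scheme. This is not Wangxin's implementation: real RS FEC uses Reed-Solomon codes that can recover multiple lost packets per group, while this sketch recovers only a single loss, but it shows why FEC avoids a retransmission round trip under packet loss.

```python
# Toy FEC for low-latency streaming: one XOR parity packet per group
# lets the receiver rebuild any single lost packet without waiting a
# round trip for retransmission. Real RS FEC (Reed-Solomon) tolerates
# multiple losses; this only demonstrates the principle.
from functools import reduce

def make_parity(packets: list[bytes]) -> bytes:
    """XOR all packets together (assumes equal-length packets)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def recover(received: dict[int, bytes], parity: bytes,
            group_size: int) -> dict[int, bytes]:
    """Rebuild a single missing packet in the group from the parity."""
    missing = [i for i in range(group_size) if i not in received]
    if len(missing) == 1:
        # XOR of all surviving packets plus the parity equals the lost one.
        received[missing[0]] = make_parity(list(received.values()) + [parity])
    return received

group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = make_parity(group)
lost = {i: p for i, p in enumerate(group) if i != 2}  # packet 2 dropped
print(recover(lost, parity, 4)[2])
```

The trade-off is bandwidth for latency: the sender pays for the extra parity packet up front so the receiver never has to ask for a resend, which is exactly what a sub-100 ms cloud gaming budget demands.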

With these innovations, available bandwidth recovers quickly: under heavy packet loss, bandwidth can return to its ideal value within 500 ms. Weak-network tests show that OT QUIC has the best packet-loss performance and better latency than Indigo.

Now consider the AIGC text-to-image scenario. The edge mainly hosts small amounts of computing power, and this type of small model is well suited to running there: its single-task, low-interaction pattern matches how edge resources work.


The overall AIGC IaaS structure has not changed much. Wangxin can already build complete virtualized GPU containers across its edge network. Some containers are large and need to be sliced; others are small and do not. Ultimately, image-generation services are used to mobilize shared edge nodes, which greatly reduces cost.

To help edge cloud computing respond better to future scenarios, Wangxin Technology has proposed a three-step development strategy:

The first step is cost reduction and efficiency improvement. Based on scenarios that already generate economic benefits and on customers' actual needs, we work with customers to expand the network's scale, raise its quality, reduce costs, improve efficiency, and drive the company's healthy growth.

The second step is functional iteration: gradually enrich business scenarios, advance the refinement and promotion of cloud gaming, AIGC, and other businesses, keep investing in R&D and in covering computing power needs, and continue to improve the computing power layout and capabilities.

The third step is building an ecosystem: promote the combination of the edge network and computing power, connect industrial scenarios and turn them into products, use the Internet of Vehicles and vehicle-road collaboration as entry points to meet consumer needs, and build an open network platform that attracts developers and partners to jointly build the ecosystem.

We are confident we can realize this three-step plan within the next 8 to 10 years, and this year's rapid progress in AGI may greatly shorten that timeline.

This is my sharing today, thank you all.



Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/130164343