Looking at the Macro Computing System of the Next Ten Years from the Development of Computing Power Networks

This article is reprinted from: Looking at the Macro Computing System of the Next Ten Years from the Development of Computing Power Networks
https://aijishu.com/a/1060000000402097



The three major telecom operators are actively promoting the implementation of "computing power network" technologies, while Internet companies promote a similar concept called the "distributed cloud".

My understanding is that the technical implementations of the two concepts are basically the same. The difference is one of perspective: the computing power network starts from the basic computing environment and focuses on the integration of computing resources, while the distributed cloud starts from business services and focuses on the form in which computing is provided and used.

This article takes that as its starting point and discusses, from a macro perspective, the underlying computing system of the computing power network.



1. The concept of computing network and distributed cloud

The strategic technology trends released by Gartner in 2021 list the distributed cloud as an important strategic technology trend in cloud computing.
Gartner's definition of distributed cloud: distributing public cloud services to different physical locations (i.e., the edge), while ownership, operation, governance, updates, and evolution of the services remain the responsibility of the original public cloud provider.
It addresses customers' need for cloud computing resources close to the physical location where data and business activities occur.

The distributed cloud integrates public cloud, private cloud, and edge cloud. The core idea is to extend the full-stack service capabilities of the public cloud to the places closest to where users need them.
The distributed cloud is essentially still a cloud, and the cloud is responsible for allocating computing resources.
Although a network is needed in the middle, the network mainly plays the role of a pipe.

From the operators' point of view, the computing power network is an upgraded version of cloud-network collaboration and the distributed cloud.
It means that, as computing capabilities become increasingly ubiquitous, basic resources such as computing and storage are effectively allocated among cloud, edge, and terminal through the network, improving business service quality and user experience.
The network in the computing power network is critical: it is the only way for users to reach computing resources, and it is also the entrance through which users initiate business requests and through which computing power is allocated.

From the perspective of user business, the goals of the distributed cloud and the computing power network are the same: cloud, network, and edge move from collaboration to integration.
The computing power network is the solution proposed by network owners to meet this need;
the distributed cloud is the solution proposed by cloud computing vendors to meet the same need.

Judging from the trend, the two approaches are both cooperative and competitive. As technology and business continue to develop, the two will gradually converge.


2. Looking at the computing power network from the perspective of computing form


1. Computer resource classification



In the traditional CPU-centric computer architecture, computer resources are mainly divided into three categories: CPU, memory, and peripherals.
In heterogeneous and hyper-heterogeneous computing systems, computer hardware resources can be divided into four categories:

  • CPU: From a control perspective, the CPU, as the central processing unit, is the core of the entire system;
    from a computing perspective, the CPU, like the other accelerators, is just one of the processors used for computation.
  • Memory: In heterogeneous or hyper-heterogeneous computing systems, memory means the same as in the classic architecture;
    the difference is that memory now has more visitors and more frequent accesses, placing higher demands on bandwidth and other performance metrics.
  • I/O devices: basically the same meaning as in the classic architecture.
  • Other acceleration processors: such as GPUs, AI DSAs, network DSAs, and various ASIC accelerators.
    From the CPU's perspective, these accelerators are "peripherals" equivalent to I/O devices;
    from a computing perspective, they are computing processors on a par with the CPU.

2. IaaS service classification

IaaS services are mainly divided into four categories: computing, network, storage, and security. A detailed breakdown follows:

  • Computing: Whether bare metal, virtual machine, or container, the hardware platform of a cloud host or container is composed of the computer's four major resource components (see the sketch after this list):
      • CPU processors: whether for general-purpose (CPU) computing or heterogeneous computing, the CPU is an indispensable resource component.
      • Acceleration processors: heterogeneous computing requires accelerator resource components such as GPUs and AI accelerators.
      • Memory: the storage resource used to hold the temporary data of computation.
      • Network and storage I/O: indispensable components for computing; in the IaaS system, network and storage usually also exist as independent services.
    According to the needs of the business scenario, the computing hardware platform combines these resources in different specifications and proportions.
    Depending on the needs, there are many ways to pool all of these resources and expand the platform's computing resources locally and/or remotely.
  • Network: In a narrow sense, the network is just a NIC that provides computing with network access channels.
    Broadly defined network services include two categories: network forwarding, such as VPC, EIP, various gateways, and LB;
    and network communication, such as high-performance networks and deterministic networks.
  • Storage: From the computing perspective, external storage holds the inputs and outputs of computation; even if the computer shuts down, the data in external storage persists.
    From the cloud server's perspective, however, local external storage is temporary: when the cloud server's resources are destroyed, the locally stored data is destroyed with them.
    To persist data for the long term, remote distributed storage must be used.
    Local temporary storage as well as distributed block storage, object storage, archive storage, and so on all exist as services that support the computing services.
  • Security: secure computing, such as trusted computing; secure networking, such as firewalls; secure storage, such as data encryption and decryption.
    Security is a huge topic that is present everywhere, so it is not expanded on here.
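
As a concrete illustration of the point above that a computing platform is "a combination of different specifications and proportions of these resources", here is a minimal Python sketch. The `InstanceFlavor` type and all numbers are hypothetical, invented for illustration; they do not correspond to any real cloud provider's API.

```python
from dataclasses import dataclass, field

@dataclass
class InstanceFlavor:
    """Hypothetical description of a compute platform as a combination
    of the four resource components discussed above."""
    vcpus: int                                         # general-purpose CPU cores
    memory_gb: int                                     # temporary storage for computation
    accelerators: dict = field(default_factory=dict)   # e.g. {"gpu": 4}
    network_gbps: int = 10                             # network I/O bandwidth
    local_disk_gb: int = 0                             # ephemeral local storage

# Different business scenarios combine the same resource types
# in different specifications and proportions:
general_purpose = InstanceFlavor(vcpus=8, memory_gb=32)
ai_training = InstanceFlavor(vcpus=16, memory_gb=128,
                             accelerators={"gpu": 4}, network_gbps=100)
```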

3. Two types of computing power networks

Let’s briefly introduce the concept of Serverless.
Red Hat's definition of serverless: "Serverless is a cloud-native development model that allows developers to build and run applications without having to manage servers.
There are still servers in serverless, but they are abstracted away from application development.
A cloud provider handles the routine work of provisioning, maintaining, and scaling the server infrastructure.
Developers simply package their code into containers for deployment.
Once deployed, serverless applications respond to demand and automatically scale up and down as needed.

Serverless offerings from public cloud providers are usually metered on demand through an event-driven execution model.
As a result, when a serverless function is sitting idle, it doesn't cost anything."

Simply put, a server-based service requires the user to create a specific service instance; an instance can belong to only one user, and a user can have one or more instances.
A serverless service, by contrast, requires no instance creation: the user consumes the service directly, and many users share the same service "instance" (not literally all users; the service software deployed in different data centers may be different deployments).
Users do not need to care about the various underlying resources the service requires; the service automatically scales up and down with business usage.
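
To make the server-based/serverless contrast concrete, here is a toy Python sketch of the two usage models. `Server` and `serverless_invoke` are invented names standing in for a provider's machinery; this is a sketch of the model, not a real cloud API.

```python
class Server:
    """Server-based model: the user explicitly creates a dedicated instance
    that belongs to them alone, then runs work on it."""
    def __init__(self, flavor: str):
        self.flavor = flavor               # resources reserved for this user

    def run(self, task):
        return task()

_workers = []                              # shared pool managed by the platform

def serverless_invoke(handler, event):
    """Serverless model: no instance to create; the platform scales shared
    workers behind the scenes, and idle time costs the user nothing."""
    if not _workers:                       # scale up on demand
        _workers.append("worker-0")
    return handler(event)

# Server-based: create an instance first, then use it.
vm = Server(flavor="c6.large")
print(vm.run(lambda: "ran on my own VM"))

# Serverless: just invoke; many users share the same service "instance".
print(serverless_invoke(lambda e: f"handled order {e['order_id']}",
                        {"order_id": 42}))
```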

Therefore, the implementation forms of the computing power network can be roughly divided into two types: server-based and serverless.


Type 1: the server-based type

The server-based form is closer to the concept of the computing power network.
Through the network and other means, various resources within and across data centers are pooled, then composed into hardware computing platforms, exposed as cloud bare metal, cloud virtual machines, cloud containers, and so on, for users' business to run on.




According to users' needs, computing platforms of different specifications and forms can be composed at any location across cloud, network, edge, and terminal, providing users with optimal computing power services and realizing ubiquitous computing power.


Type 2: the serverless type



Business software in the classic C/S or B/S architecture, where everything is a (micro)service, can be simply understood as distributed software composed of a client and multiple microservices.

Serverless is closer to the concept of the distributed cloud.
An early classic case similar to the distributed cloud is the CDN. When a user visits a website that has joined a CDN service, the domain name resolution request is eventually handed to the global load-balancing DNS for processing.
Through a set of predefined policies, the global load-balancing DNS gives the user the address of the node currently closest to them, so that the user gets the fastest possible service.
A CDN only serves static content, whereas the distributed cloud needs to place distributed services at nodes such as the edge.
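
The "nearest node" policy described above can be sketched in a few lines. The node table and latency numbers below are made up for illustration; a real global load-balancing DNS would use measured or geo-derived data and richer policies.

```python
# Hypothetical edge nodes and pre-measured latencies (ms) per user region.
EDGE_NODES = {
    "beijing":  "10.0.1.1",
    "shanghai": "10.0.2.1",
    "shenzhen": "10.0.3.1",
}

LATENCY = {
    ("north", "beijing"): 5,  ("north", "shanghai"): 25, ("north", "shenzhen"): 40,
    ("east",  "beijing"): 25, ("east",  "shanghai"): 5,  ("east",  "shenzhen"): 30,
}

def resolve(user_region: str) -> str:
    """Return the address of the lowest-latency node for this user,
    falling back to a large penalty for unmeasured pairs."""
    best = min(EDGE_NODES, key=lambda node: LATENCY.get((user_region, node), 999))
    return EDGE_NODES[best]

print(resolve("east"))   # -> 10.0.2.1 (shanghai is closest to an "east" user)
```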




Under the distributed cloud system, users do not need to care about the underlying hosts and containers; they only need to attend to their own business logic.
Normally the client runs locally on the terminal (though some systems run entirely on the server side, client included), and users do not need to care about the specific running location.

Cloud service providers can choose the optimal running environment based on the bandwidth, latency, performance, cost, and other requirements of each microservice: local on the terminal, at the edge, in the network, or in the cloud.

Moreover, these services can dynamically adjust their running locations as the environment changes.
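
As a hedged sketch of such a placement decision, the snippet below filters candidate locations by a service's latency budget and capacity need, then picks the cheapest feasible one. All locations, numbers, and the scoring rule are invented for illustration.

```python
CANDIDATES = {
    # location: (round-trip latency in ms, cost per hour, available capacity)
    "terminal": (1,  0.00, 1),
    "edge":     (10, 0.05, 8),
    "cloud":    (50, 0.02, 64),
}

def place(latency_budget_ms: float, needed_capacity: int) -> str:
    """Pick the cheapest location that satisfies the service's requirements.
    Re-running this as conditions change models dynamic relocation."""
    feasible = {
        loc: cost
        for loc, (lat, cost, cap) in CANDIDATES.items()
        if lat <= latency_budget_ms and cap >= needed_capacity
    }
    return min(feasible, key=feasible.get)

print(place(latency_budget_ms=20, needed_capacity=4))   # -> "edge"
```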


3. Characteristics of the macro computing system of the next ten years


1. Unknown needs

First, system scenarios keep changing rapidly: upper-layer software scenarios emerge one after another, with a new hot spot roughly every two years, while existing hot spots continue to evolve quickly. Moreover, in a macro system, computing resources are prepared in advance:
when the resources are purchased and deployed, it is not known which user they will be allocated to, nor what tasks that user will run on them. In addition, resource allocation and task execution change dynamically all the time.

Traditional chip and system design requires first understanding the scenario and then designing the chip and the system around the scenario's requirements.
The challenge of the future is that the system's scenario requirements are uncertain: not only do chip companies not understand them, the customers themselves do not "understand" them either.

Therefore, complex computing systems have to be designed "without a fixed target to aim at".


2. All-inclusive

Whether it is a cloud computing data center, a cloud-network-edge-terminal Internet of Everything system, or a metaverse system integrating the virtual and the real, there is only "one" macro computing system.
Yet the needs of vast numbers of users are diverse and always changing rapidly, and new users with new needs keep arriving.
Therefore, the system needs all-inclusive capabilities: it must be able to support all manner of known and unknown requirements.


3. Professional and efficient

Normally, "professionals do professional things." The implication is that experts can only do things in their own field and can hardly do things in other fields.
Meanwhile, generalists can do a little bit of everything, but are less than efficient in every area.

But a macro complex computing system must not only be able to do almost everything, it must also do everything professionally and efficiently, achieving both generality and specialization.


4. Super concurrency

There are hundreds of millions of users and trillions of user tasks, but there is only "one" system.

The computing needs of this enormous user base must be responded to in a timely manner, and users' tasks must be processed quickly.

Therefore, at any given moment, the system is concurrently processing hundreds of millions of user tasks of every kind.


5. Everywhere

The system covers a very wide area and makes computing power ubiquitous, so that computing resources are readily available.
That is, any task of any user, anywhere, at any time, can be given computing power and related resource support.
Moreover, it should serve users in the most suitable form and manner, give them a better experience, and create greater value for them.


6. Rapid evolution

Upper-layer software applications emerge one after another, and system requirements change rapidly.
Moreover, within the same field, different users have different needs;
meanwhile, even a single user's business needs iterate rapidly.

From a macro perspective, users and the tasks they need to run are constantly changing.

A complex, integrated system must evolve continuously and rapidly to keep up with the changing needs of upper-layer business.


4. Looking at the computing power network from an architectural perspective


1. Diversity of computing resources

As the CPU hits its performance bottleneck, performance and computing power must be continuously improved through accelerated processors of various forms, such as GPUs, FPGAs, and DSAs.
Computing resources are therefore not just CPUs but a combination of processors of multiple architectures and types:

  • CPU: including CPUs of various architectures such as x86, ARM, and RISC-V; each CPU may also carry accelerated co-processors such as Vector, Matrix, and Tensor units.
  • GPU: as a general parallel computing platform, the GPU is the most widely used accelerated computing processor.
    Moreover, in addition to general-purpose computing via CUDA, current GPUs also integrate Tensor Cores for more efficient acceleration, further improving the GPU's capabilities.
  • FPGA: through hardware programming, computing engines of various forms and architectures can be realized.
  • DSA: there are many computing domains, and in each domain many companies offer DSAs; even the same company's DSAs in the same domain may differ across generations.
  • ASIC: the ASIC is entirely oriented to specific scenarios; different scenarios in different fields have ASIC engines of all forms and architectures.

So many processor types and architectures create the diversity of computing resources in the computing power network.



Performance and flexibility are in tension: for a single processor engine, gaining performance means losing flexibility, and gaining flexibility means losing performance.
Yet the macro computing system supporting the computing power network must be both "all-inclusive" and "professional and efficient". What can be done?


Through the cooperation of the various processor types, CPU, GPU, DSA, and so on, the engines operate as a team.
Each processor engine plays its own role and exploits its own performance/flexibility advantage, achieving a macro-level balance of performance and flexibility while keeping each individual computation efficient and high-performance.
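
A minimal sketch of this "team operation": route each task category to the most specialized engine currently available, falling back to more flexible but less efficient ones. Engine and task names are illustrative only.

```python
# Engines ordered from most specialized/efficient to most flexible.
ENGINE_PREFERENCE = {
    "video_transcode": ["asic", "gpu", "cpu"],
    "dnn_inference":   ["ai_dsa", "gpu", "cpu"],
    "packet_filter":   ["network_dsa", "fpga", "cpu"],
    "control_logic":   ["cpu"],          # flexible work stays on the CPU
}

def dispatch(task_type: str, online_engines: set) -> str:
    """Return the best available engine for this task type."""
    for engine in ENGINE_PREFERENCE.get(task_type, ["cpu"]):
        if engine in online_engines:
            return engine
    raise RuntimeError(f"no engine available for {task_type}")

print(dispatch("dnn_inference", {"cpu", "gpu"}))       # -> "gpu"
print(dispatch("dnn_inference", {"cpu", "ai_dsa"}))    # -> "ai_dsa" (more specialized)
```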


2. Integration of computing resources

The diversity of computing resources is, in effect, fragmentation of computing resources, and that is not a good thing.


2.1 Pooling of computing resources

If each processor core is an island of computing resources, pooling them is meaningless.
The value of the computing power network lies in small streams converging into a sea; this is the foundation of the computing power network.
In this way, the macro computing resources of different cloud/edge data centers and different terminal devices are gathered together to form one unified, large pool of computing power.

The network itself mainly plays the role of connection and bus. Network devices also carry some computing and storage resources, which can be classified under the computing or storage resource types.




Although pooling can connect the same kind of computing resources on different servers and devices into one resource pool, the diversity of computing resources means that resources of different types and architectures still cannot be merged.
So there is not one pool of computing resources but many, many pools.
For example, x86, ARM, and RISC-V CPU resources cannot be merged into one pool; GPUs from different vendors cannot be merged into one pool; even storage or network I/O devices may fail to merge into one pool because their interfaces differ; and the various DSAs/FPGAs/ASICs cannot be merged at all.
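
The fragmentation can be made concrete with a small sketch: pooling aggregates resources only under a matching (type, architecture) key, so each architecture ends up in its own pool. The inventory below is invented for illustration.

```python
from collections import defaultdict

# (resource_type, architecture) -> total pooled units
pools = defaultdict(int)

inventory = [
    ("cpu", "x86", 128), ("cpu", "arm", 64), ("cpu", "riscv", 16),
    ("gpu", "vendor-a", 8), ("gpu", "vendor-b", 8),
]

for rtype, arch, units in inventory:
    pools[(rtype, arch)] += units    # x86 and ARM never merge into one pool

print(len(pools), "separate pools:", dict(pools))   # 5 pools, not 2
```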

When there are hundreds of resource pools of different types and architectures, the value of resource pooling is in fact greatly weakened.


2.2 Aggregation of computing resources

The computing power requirements of AI models such as ChatGPT double every two months. With demand growing this fast, the performance of the whole computing cluster can only be improved by scaling out.
However, as the cluster grows, its overhead becomes increasingly unbearable: east-west network traffic inside the cluster comes to account for more than 90% of the total, while actual external traffic falls below 10%.
This matches Amdahl's law: limited by the serial part of the system, as more and more parallel computing nodes are added, improving system performance by adding parallelism gradually hits a bottleneck.
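
The quantitative intuition is worth spelling out. Amdahl's law says that with a serial fraction s, the speedup on n nodes is S(n) = 1 / (s + (1 - s) / n), which saturates at 1/s no matter how many nodes are added. Treating the growing communication overhead as part of the serial fraction (a simplifying assumption) shows why scale-out eventually stops paying off:

```python
def speedup(serial_fraction: float, nodes: int) -> float:
    """Amdahl's law: S(n) = 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

# With even 5% serial/communication work, speedup saturates near 1/0.05 = 20x.
for n in (10, 100, 1000, 10000):
    print(f"{n:>6} nodes -> {speedup(0.05, n):5.1f}x")
```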

Therefore, when scale-out can no longer improve system performance, the only way to improve it is to scale up,
that is, to improve the performance of a single computing node.
The computing architecture of a single computing node therefore needs to transition gradually from today's heterogeneous computing to a hyper-heterogeneous architecture that fuses multiple heterogeneous engines.


2.3 Software needs to move across hardware



In traditional scenarios, software is usually attached to, and bound with, specific hardware. The platform can be standardized through an abstraction layer such as a HAL, on top of which the operating system and application software are deployed.

As systems grow more complex, software entities such as virtual machines and containers need to migrate across different hardware, gradually decoupling software from hardware.

Generally speaking, virtualization can shield the software from the hardware architecture, so the software does not need to pay much attention to hardware architectures and interfaces.
However, as virtualization becomes fully offloaded into hardware, the hardware architecture and interfaces are exposed directly to the upper-layer virtual machines or containers.
This places far stricter requirements on hardware architectures and interfaces.


2.4 Open architecture and ecosystem, letting architectures converge

When a system involves only a single processor type, CPU, GPU, AI-DSA, or the like, a company can build its own proprietary architecture, and if its products succeed it can monopolize the entire ecosystem. Successful cases include Intel's x86 and NVIDIA's CUDA.

In the homogeneous and heterogeneous eras, this approach could succeed; but in the hyper-heterogeneous era, with its many processor architectures, it is almost unworkable,
because no single company can build, and be the best at, every computing architecture.
Moreover, the "let a hundred flowers bloom" approach further fragments the whole computing ecosystem, which runs counter to the trend of computing resource pooling and cloud-network-edge-terminal integration.

In the hyper-heterogeneous era, the only way to succeed is for everyone to follow common architectural specifications, forming an open architecture and ecosystem and letting computing architectures gradually converge, so that the advantages of computing resource pooling can be brought into full play and computing power can truly be everywhere.


2023-12-15 (Friday)

Origin: blog.csdn.net/lovechris00/article/details/135023597