One cloud, multiple cores: the next engineering challenge for intelligent transformation

In 2023, industrial digitalization and intelligent transformation entered the stage of large-scale engineering implementation. According to the "China Digital Economy Development Research Report (2023)" by the China Academy of Information and Communications Technology, the digital economy accounts for 41.5% of China's GDP, roughly the same share as the secondary industry. As industries become more digital and intelligent, computing power services are increasingly the foundation of the digital economy. The China Academy of Information and Communications Technology estimates that every 1 yuan invested in computing power drives 3 to 4 yuan of GDP growth.

The so-called computing power service is a segment of the computing power industry that is based on diverse computing power, connected by the computing power network, and aimed at providing effective computing power. The China Academy of Information and Communications Technology points out that computing power services are currently supplied mainly as cloud services. At the same time, task-based supply of other kinds of computing power, such as supercomputing, intelligent computing, and idle computing power in society, is being actively explored. As the operating system of the digital world, cloud computing is coordinating supercomputing, intelligent computing, and general-purpose computing, and is becoming the main interface through which computing power is delivered.

Chips are the foundation of computing power. To cope with the coexistence of many chip types in today's computing power build-out, the "one cloud, multiple cores" concept has gradually gained widespread attention. At the 2023 China Computing Power Conference, Zhang Dong, chief scientist of Inspur Yunhai, one of China's leading private cloud vendors, emphasized that "one cloud, multiple cores" will become one of the core capabilities of cloud computing platforms. It is not only the integration of chips and cloud, but also the synergy of platform and ecosystem. In his view, "one cloud, multiple cores" will address the engineering challenges of intelligent transformation and help government and enterprise users achieve sustainable intelligence with diversified computing power.

Computing power becomes the new intelligent infrastructure

The explosion of large models in 2023 has pushed computing infrastructure to the forefront of new infrastructure. According to the "AI and Compute" analysis released by OpenAI, the computing power demanded by AI training has doubled every 3.4 months since 2012, an overall increase of more than 300,000 times. According to OpenAI, the total computing power consumed by ChatGPT is approximately 3,640 PF-days, which is said to be equivalent to three times the total computing power of a present-day megacity.
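The figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers in the text (3.4-month doubling, 300,000-fold growth, 3,640 PF-days) and assumes the usual reading of one PF-day as 10^15 floating-point operations per second sustained for 24 hours. The variable names are illustrative.

```python
import math

# Figures quoted in the text.
DOUBLING_MONTHS = 3.4       # doubling period of AI training compute
GROWTH_FACTOR = 300_000     # overall growth in AI training compute
PF_DAYS = 3640              # quoted total compute consumption

# How many doublings does a 300,000-fold increase require,
# and how long does that take at one doubling every 3.4 months?
doublings = math.log2(GROWTH_FACTOR)
months_needed = doublings * DOUBLING_MONTHS
years_needed = months_needed / 12

# Assumption: 1 PF-day = 1e15 FLOP/s sustained for 86,400 seconds.
FLOP_PER_PF_DAY = 1e15 * 86_400
total_flop = PF_DAYS * FLOP_PER_PF_DAY

print(f"doublings needed: {doublings:.1f}")
print(f"time at 3.4 months per doubling: {months_needed:.0f} months (~{years_needed:.1f} years)")
print(f"3,640 PF-days in floating-point operations: {total_flop:.3e}")
```

Under these assumptions, the two growth figures are consistent with each other over a span of roughly five years, and 3,640 PF-days corresponds to about 3.1 × 10^23 floating-point operations.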

According to the "2022-2023 China Artificial Intelligence Computing Power Development Assessment Report" jointly released by IDC and Inspur Information, China's intelligent computing power will continue to grow rapidly. IDC expects it to reach 1,271.4 EFLOPS by 2026, a compound annual growth rate of 52.3%, compared with 18.5% for general-purpose computing power over the same period. Within computing power investment, urban investment in intelligent computing power has become an important support for the development of regional digital economies. In 2022, the five industries with the highest AI application penetration in China were the Internet, finance, government, telecommunications, and manufacturing, and AI penetration across industries rose significantly.
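To make the gap between the two growth rates concrete, the sketch below compounds each rate and back-calculates the starting scale implied by the 2026 target. The five-year window is an assumption for illustration, since the text does not state the base year. The function names are hypothetical.

```python
# Figures quoted in the text.
TARGET_EFLOPS_2026 = 1271.4   # forecast intelligent computing power in 2026
AI_CAGR = 0.523               # compound annual growth rate, intelligent computing
GENERAL_CAGR = 0.185          # compound annual growth rate, general-purpose computing

# Assumption: the growth rates apply over a five-year window ending in 2026.
YEARS = 5


def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years


def implied_base(target: float, cagr: float, years: int) -> float:
    """Starting value that reaches `target` after `years` of compounding."""
    return target / growth_multiple(cagr, years)


ai_multiple = growth_multiple(AI_CAGR, YEARS)
general_multiple = growth_multiple(GENERAL_CAGR, YEARS)
base_eflops = implied_base(TARGET_EFLOPS_2026, AI_CAGR, YEARS)

print(f"intelligent computing grows {ai_multiple:.1f}x over {YEARS} years")
print(f"general-purpose computing grows {general_multiple:.1f}x over {YEARS} years")
print(f"implied starting scale: {base_eflops:.0f} EFLOPS")
```

Under the five-year assumption, intelligent computing power grows by roughly eight times while general-purpose computing power little more than doubles.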

(Zhang Dong, chief scientist of Inspur Yunhai)

Zhang Dong, chief scientist of Inspur Yunhai, emphasized that the future is a competition in intelligence: we must move from informatization to intelligence or fall completely behind the times, and the intelligent computing center is the new infrastructure of the future. Such infrastructure matters not only as a commercial service but also as public-welfare infrastructure for cities. Besides meeting task-specific intelligent computing needs such as large model training, it can open computing resources to society, which helps cultivate AI talent and an ecosystem on a broad scale.

At the 2023 China Computing Power Conference, Inspur Information demonstrated its intelligent computing center, which it describes as industry-leading. It is a prefabricated, modular intelligent computing center that integrates computing, storage, networking, and computing power scheduling. It covers different types of computing nodes and is compatible with mainstream domestic and foreign CPUs and heterogeneous acceleration chips. It can support diverse applications such as autonomous driving, biopharmaceuticals, AIGC, and smart manufacturing, and a complete data center of this kind can be delivered within two weeks. It has already been deployed in Jinan, Nanjing, Suzhou, and other regions, offering new ideas and new paths for building regional intelligent infrastructure.

Beyond computing power: Standardized cloud operating system

Just as with the PC, where the key to achieving "a PC on every desk in the world" lay in compatibility with the different software and hardware ecosystems of different countries, the main external interface for different kinds of computing power today is the cloud service. Improving the compatibility of cloud operating systems is therefore the key to making computing power services inclusive, ubiquitous, and standardized. With the development of AI, computing is moving from a CPU-centered system to one in which GPUs, DPUs, XPUs, and other acceleration chips coexist. How to make the cloud operating system compatible with a variety of chips and instruction sets, and adapt it to the various kinds of upper-level software, becomes the next challenge.

Zhang Dong, chief scientist of Inspur Yunhai, emphasized that "one cloud, multiple cores" must solve the multi-cloud management problems caused by the coexistence of different types of chips and truly form a single cloud. "One cloud, multiple cores" will become a key link in the IT industry chain. It will accommodate and manage the various underlying chips and operating systems, be compatible with all kinds of virtual machines, containers, databases, middleware, cloud-native applications, and other software, and become one of the core capabilities of future cloud computing platforms.

Business application software and SaaS services have to run on a variety of hardware and software environments, including chips, operating systems, and databases. They must be developed and tested in each environment, then verified and iterated in real business use. In today's multi-core era, as countries and manufacturers continue to develop their own chips, the range of available processors keeps growing, and the adaptation demands on cloud operating systems keep rising. However, each chip manufacturer has its own standards and wants to promote its own ecosystem. As a result, server chips in cloud data centers perform unevenly, user experience is hard to keep consistent, and application results vary widely, which creates islands of computing power.

Therefore, the ultimate goal of "one cloud, multiple cores" is to let user services switch between processors of different architectures at low cost, or freely. In other words, applications must be completely decoupled from the chip architecture so that they can be switched equivalently between processors of different architectures. First, this requires a unified measure of the computing power of different chips. For example, how many GPUs from one manufacturer are equivalent to the computing power of another manufacturer's GPUs is a question that needs industry consensus. Second, it requires collaboration across the upstream and downstream of the industry chain, including hardware, cloud, and applications, so that cross-architecture switching is transparent at the application level. Third, development tools are still not completely architecture-independent. Applications written in Python or Java are still tied to the architecture to a greater or lesser degree, so application development needs to become architecture-independent, with architecture-related calls pushed down to the cloud operating system level. Fourth, data must be separated from applications, so that the data layer is completely isolated and architecture-independent.
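The first requirement, a unified measure of computing power, can be illustrated with a minimal sketch. It assumes that every chip has a score on some shared, agreed benchmark. No such industry-wide score exists in the text, so the `Chip` type, the scores, and the chip names below are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """A chip with a score on a shared benchmark (hypothetical units)."""
    name: str
    score: float


def equivalent_count(source: Chip, source_count: int, target: Chip) -> int:
    """Number of `target` chips needed to match `source_count` `source` chips.

    Rounds up, because a fraction of a chip cannot be deployed.
    """
    if target.score <= 0:
        raise ValueError("target score must be positive")
    total_capacity = source.score * source_count
    return math.ceil(total_capacity / target.score)


# Hypothetical scores, for illustration only.
chip_a = Chip("vendor-A-gpu", 100.0)
chip_b = Chip("vendor-B-gpu", 60.0)

print(equivalent_count(chip_a, 8, chip_b))   # 8 x 100 = 800 units -> 14 chips of B
```

A single scalar score is a strong simplification. Real workloads differ in precision, memory capacity, and interconnect needs, which is part of why the text calls for an industry consensus rather than a single vendor's number.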

On the whole, "one cloud, multiple cores" sounds simple but is a huge engineering challenge. Used well, it can minimize the risk of choosing a technical route and greatly improve business stability and the flexibility of business transformation. Truly realizing it, however, requires the whole industry and ecosystem to share the conviction and determination to push it forward across many links, including standards, architecture, evaluation, testing, and development. Only then can it break down the computing power islands of different architectures and achieve interconnection between them, rather than simply managing separate resource pools for each chip architecture.

Computing power integration: three steps to "one cloud, multiple cores"

Making a cloud operating system compatible with different chips, chip architectures, and application software is a huge, ecosystem-wide project. In the history of enterprise IT, whether with VMware's virtualization software or Oracle's database, the hidden core competitiveness of enterprise software has been broad compatibility. But compatibility efforts such as VMware's and Oracle's were each led by a single vendor and took years of time and investment to realize. More importantly, once the market leadership of VMware and Oracle was recognized, the rest of the ecosystem actively provided compatibility with their software.

A cloud operating system with a short development history cannot achieve broad compatibility in a short time. Inspur Information is one of the active advocates of "one cloud, multiple cores". As a third-party vendor independent of chips, clouds, and ecosystems, it has proposed a development concept of "application-oriented, system-centric", "layered decoupling, open standards", and "iterative innovation, continuous evolution". In particular, it has pragmatically proposed a three-stage strategy for reaching the ultimate goal of "one cloud, multiple cores".

The "three-stage" strategy works as follows. In the first stage, heterogeneous nodes are managed in a unified pool, and applications move across architectures through offline migration and manual switching. This is "hybrid deployment, unified management, unified view", which solves the problem of achieving "one cloud, multiple cores" in the first place. In the second stage, layered decoupling is achieved at the resource layer, platform layer, and application layer, and vendors work together to enable smooth switching and elastic scaling of applications. This is "business migration, layered decoupling, architecture upgrade", which makes "one cloud, multiple cores" easy to use. In the third stage, the upstream and downstream of the industry chain coordinate to create shared standards, build a common ecosystem, and deliver vertically integrated solutions. This is "software-defined, computing power standard, full-stack multi-core", which solves the problem of optimizing "one cloud, multiple cores".
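The first stage described above, a unified pool with a unified view and manual switching between architectures, can be sketched as a toy model. It is not Inspur's implementation. The `Node` and `UnifiedPool` types, the node names, and the capacities are hypothetical, and placement is reduced to counting free vCPUs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Node:
    """A server in the pool (hypothetical model)."""
    name: str
    arch: str          # for example "x86_64" or "aarch64"
    free_vcpus: int


class UnifiedPool:
    """One management plane over nodes of several architectures."""

    def __init__(self, nodes: Iterable[Node]) -> None:
        self.nodes = list(nodes)

    def view(self) -> Dict[str, int]:
        """Unified view: free vCPUs per architecture."""
        totals: Dict[str, int] = defaultdict(int)
        for node in self.nodes:
            totals[node.arch] += node.free_vcpus
        return dict(totals)

    def place(self, vcpus: int, arch: str) -> str:
        """Place a workload on the first node of `arch` with enough room.

        In stage one the operator picks the architecture by hand, so a
        cross-architecture move means stopping the workload and calling
        this again with a different `arch` (offline migration).
        """
        for node in self.nodes:
            if node.arch == arch and node.free_vcpus >= vcpus:
                node.free_vcpus -= vcpus
                return node.name
        raise RuntimeError(f"no {arch} node has {vcpus} free vCPUs")


pool = UnifiedPool([
    Node("n1", "x86_64", 16),
    Node("n2", "aarch64", 32),
    Node("n3", "aarch64", 8),
])
print(pool.view())
print(pool.place(24, "aarch64"))
print(pool.view())
```

The later stages would remove the explicit `arch` argument: the platform, not the operator, would decide where a decoupled application runs.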

Zhang Dong said that "one cloud, multiple cores" is still in its first stage, which many vendors have achieved to varying degrees. The next step is to tackle the second stage: layered decoupling at the resource layer, platform layer, and application layer. To this end, Inspur Information recently launched the Fusion Architecture 3.0 prototype, which achieves complete decoupling of hardware resources at the server level. Fusion Architecture 3.0 completely decouples and pools core IT resources such as computing, storage, memory, and heterogeneous acceleration resources. It supports collaborative computing across multiple general-purpose processor platforms and heterogeneous acceleration units such as GPUs, FPGAs, and DPUs, and schedules resources collaboratively and dynamically through software definition.

The Fusion Architecture 3.0 prototype breaks with the earlier "CPU-centered" design concept. It takes a whole-system view and is system-centered. Through hardware decoupling, it turns heterogeneous computing, memory, storage, and other resources into independently scalable resource pools, so users can expand each resource freely according to application needs. For example, training large models requires more video memory, but the video memory on a GPU card is limited. Under the Fusion Architecture 3.0 design, all the memory and video memory in the system can be pooled, which greatly expands the memory available for large model training and also reduces the number of GPUs required.
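The effect of pooling memory on GPU count can be shown with simple arithmetic. The sketch below is a back-of-the-envelope model with hypothetical capacities. It counts only capacity and ignores the bandwidth and latency cost of reaching memory outside the GPU, which a real system would have to account for.

```python
import math


def gpus_needed(model_gb: float, gpu_mem_gb: float, pooled_host_gb: float = 0.0) -> int:
    """GPUs required to hold `model_gb` of training state.

    Without pooling, everything must fit in video memory. With pooling,
    `pooled_host_gb` of system memory is counted as usable capacity and
    only the remainder has to fit in video memory. At least one GPU is
    always needed to do the computation.
    """
    if gpu_mem_gb <= 0:
        raise ValueError("gpu_mem_gb must be positive")
    remainder = max(model_gb - pooled_host_gb, 0.0)
    return max(1, math.ceil(remainder / gpu_mem_gb))


# Hypothetical sizes, for illustration only.
MODEL_GB = 1000.0      # memory footprint of the training job
GPU_MEM_GB = 80.0      # video memory per GPU card
HOST_POOL_GB = 512.0   # system memory made available through pooling

print(gpus_needed(MODEL_GB, GPU_MEM_GB))                 # video memory only
print(gpus_needed(MODEL_GB, GPU_MEM_GB, HOST_POOL_GB))   # with pooled memory
```

With these made-up numbers, pooling roughly halves the number of cards needed purely for capacity. How much of that saving survives in practice depends on how fast the pooled memory can be accessed.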

InCloud OS is in its second phase of improvements for "one cloud, multiple cores", with a focus on decoupling at the platform layer and application layer. According to Gartner, InCloud Sphere, the server virtualization system at the core of InCloud OS, has ranked first among domestic brands by market share in China for two consecutive years and currently ranks among the top four by global market share. InCloud Sphere can pool computing resources across heterogeneous chips such as x86 and ARM. The latest version provides unified management for processors of four different architectures at the same time, which further reduces the difficulty of maintaining infrastructure built on different architectures.

To create a cloud platform reference benchmark centered on "one cloud, multiple cores", Yunhai OS recently completed the industry's first SPEC Cloud benchmark test for the "one cloud, multiple cores" scenario. In a test on a cluster that mixed three types of processor nodes, indicators such as relative scalability and average instance provisioning time all reached world-leading levels, verifying the efficiency, performance, and scalability of Yunhai OS when business applications run across processor architectures. At the same time, Inspur Information took an active part in the "One Cloud Multi-core Technical Capability Standard System" led by the China Academy of Information and Communications Technology. Yunhai OS passed the One Cloud Multi-core IaaS platform capability assessment with excellent results and obtained the highest certification, "Advanced Level".

Inspur Information's persistent pursuit of "one cloud, multiple cores" comes from actual customer needs. Yunhai OS was the first in the industry to support "one cloud, multiple cores" and has rich experience in industrial deployments. Since 2018, Inspur Information has relied on Yunhai OS to help hundreds of customers in government, finance, energy, transportation, and other industries build "one cloud, multiple cores" industry clouds. For example, Yunhai OS helped one province build China's largest provincial government cloud platform, with the largest scale and the most diverse chip types. It covers nearly 2,000 servers across three processor architectures and fully integrates basic software and hardware, cloud platforms, security systems, operation and maintenance management systems, application systems, and more.

Overall: "One cloud, multiple cores" is how computing services and cloud operating systems can cope with the turbulence of the global chip landscape and the uncertainty of the supply chain. It is also the only way forward for cloud operating systems based on open source technology once they reach a certain stage of maturity. Compared with traditional server virtualization software, the cloud operating system faces a more complex multi-core environment and must deal with several mature and still-developing chip technology routes at the same time. This places higher demands on the product maturity of the cloud operating system and pushes cloud operating system vendors toward original, independent innovation. In the long term, "one cloud, multiple cores" will also help ensure the sustainable development of intelligence in China and establish core competitiveness in the global competition over intelligence. (Text/Ningchuan)


Origin blog.csdn.net/achuan2015/article/details/132459539