China enters a golden age of scientific research; general-purpose supercomputing cloud services fill the computing power gap

 

 

"Independent innovation" is one of the most popular keywords in the 14th Five-Year Plan. In the 14th Five-Year Plan, it was also proposed to maintain the core position of innovation in the overall situation of my country's modernization drive, and to take scientific and technological self-reliance as a strategic support for national development. In particular, at the Fifth Plenary Session of the 19th Central Committee, it was considered and approved to achieve a major breakthrough in key core technologies and to enter the forefront of an innovative country as a national vision for 2035. In strengthening independent innovation, scientific research R&D funding is an important guarantee. my country has achieved the world's second largest R&D funding in 2018. In 2019, national R&D funding increased by 12.5% ​​year-on-year, accounting for 2.23% of GDP.

Judging both by the emphasis on independent innovation and independent research in the 14th Five-Year Plan and by the growth in national R&D funding over the years, China has entered a "golden age" of scientific research. Supercomputers have always played a key role in scientific research. In the global TOP500 supercomputer list released in June 2020, 45% of the systems were from China, and China's independently developed Tianhe-3 supercomputer is targeting E-level (exascale) performance. However, although these supercomputers are national strategic resources and have played an important role in major national research applications, they are overkill for the large volume of small and medium-scale jobs found in general-purpose and commercial computing.

 

 

"Cutting-edge supercomputing is'high-precision', serving applications with 10,000 cores or more, and it is difficult to enter the home of ordinary people," said Wu Di, general manager of Beijing Super Cloud Computing Center when talking about the changes in supercomputing applications. The mission of Beijing Super Cloud Computing Center is to better serve the general supercomputing market, that is, the needs of computing users below 10,000 cores, including scientific research, education, engineering design, etc., to improve scientific research efficiency, reduce research and development costs, and achieve cutting-edge supercomputing Complementarity will be formed so that large, medium and small users from all walks of life can use supercomputing and promote the transformation and upgrading of China's digital economy."

In the 2020 China High Performance Computer Performance TOP100 list released in November 2020, the A partition of the Beijing Super Cloud Computing Center ranked third overall and first in general-purpose CPU computing power. The center focuses on the general supercomputing market. In particular, through close cooperation with well-known IT hardware suppliers such as Dell Technologies, it has shortened the three-to-five-year construction period of a traditional supercomputing center to three weeks, meeting the demand of a large number of users for small and medium-scale supercomputing and filling the gap in society's computing power.

 

Fill the gap in general computing power

At present, the whole of society is increasing its investment in independent innovation and R&D. According to statistics, among industrial enterprises above designated size, nine industries, including electrical machinery and equipment manufacturing, automobile manufacturing, pharmaceutical manufacturing, and chemical raw materials and chemical products manufacturing, each spent more than 50 billion yuan on R&D in 2019. Six provinces and municipalities (Beijing, Guangdong, Jiangsu, Shandong, Zhejiang and Shanghai) each invested more than 100 billion yuan in R&D in 2019.

However, a large share of the R&D funding of major enterprises and cities goes into hardware infrastructure such as servers, which makes the use of that funding inefficient. According to Wu Di, general manager of the Beijing Super Cloud Computing Center, traditional supercomputing centers are generally led and funded by the government and, because of their special status, are often built with little regard for cost or return. As a result, billions are often invested, only for the center to face a lack of market applications. Large supercomputing centers also take a long time to build, so by the time one is completed its equipment often needs a second round of upgrades before it can continue to provide services.

Most users of cutting-edge supercomputing services are industry experts and scholars. Small and medium-sized users in the general supercomputing market, including those in scientific research, education and small and medium-sized enterprises, also have substantial computing power requirements, along with demands for cost-effectiveness, flexible resource use and service quality. A supercomputing center with market-oriented, commercial service capabilities is needed to fill this gap in general computing power, and that is the core value of the Beijing Super Cloud Computing Center.

The Beijing Super Cloud Computing Center was established in November 2011 as a joint project of the Chinese Academy of Sciences and the Beijing Municipal Government. It was built by the Computer Network Information Center of the Chinese Academy of Sciences and is operated by Beijing Beilong Super Cloud Computing Co., Ltd. Its overall goal is to establish a foothold in Beijing, serve the whole country, and build an information infrastructure and public service platform that leads domestically and is world-class. It provides on-demand super cloud computing services for industry application areas such as scientific computing, industrial simulation, meteorology and oceanography, new energy, biomedicine and artificial intelligence.

At present, the Beijing Super Cloud Computing Center has a total of 270,000 cores and serves more than 30,000 users. Based on each user's computing needs, applications and business scenarios, it provides a high-quality VIP computing service that is supplied on demand, requires no queuing, and saves users time and effort. Wu Di emphasized that the center was the first in China to launch supercomputing cloud services and that, especially during the epidemic, it helped many universities and research institutes complete their research tasks on time.

Unlike traditional supercomputing centers, the Beijing Super Cloud Computing Center expands by adding a variety of computing resources according to user needs. Its core A partition, built in cooperation with Dell Technologies, uses second-generation AMD EPYC ("Xiaolong") processors to meet the day-to-day computing needs of a very large number of research and enterprise users, and it can be expanded as needed. Wu Di emphasized that the center is equipped with multiple partitions, including the latest hardware models, so that it can better adapt to the needs of different users and improve research efficiency.

 

The technical layout behind inclusive computing power

The Beijing Super Cloud Computing Center is driven by user demand and provides computing resources that are "supplied on demand and dynamically expanded", together with high-quality cloud computing services. As of October 2020, its general-purpose supercomputing power exceeded 10 PFlops, ranking third in the 2020 China TOP100 high-performance computer performance list and first in general-purpose CPU computing power. The center released its super cloud computing service platform, China Science and Technology Cloud·Super Computing Cloud, in 2018, launched the super cloud computing construction model in 2019, and reached 10 PFlops of computing power in 2020.

 

(Partition A of Beijing Super Cloud Computing Center)

To meet the needs of different types of users, the Beijing Super Cloud Computing Center provides different computing resources. The largest is currently the A partition; there are also the M and T partitions, which are scheduled per core, and the IO partition with enhanced storage performance. Partitions 17 and 19 are distributed across Beijing, Liaoning, Jiangxi, Ningxia, Hubei and Shenzhen, a choice driven not only by cost but also by proximity to users.

Although building data centers in Beijing, Shanghai, Guangzhou and similar cities is difficult and costly, these are also the main markets for general-purpose computing power in China, and users there need to be served from nearby. Deploying data centers in the central and western regions, meanwhile, optimizes the cost structure while serving users in those and surrounding areas. For example, a national supercomputing center in the Beijing, Shanghai or Guangzhou region may spend 30 to 40 million yuan or more a year on electricity, and deploying some data centers in the west can cut that electricity bill by half or more. The Beijing Super Cloud Computing Center operates in a market-oriented way. Its ultimate goal is to reduce costs and pass the savings on to customers, so that they can use affordable, inclusive computing power and ultimately apply it to product development and scientific research, forming a virtuous circle.

The choice of CPU model follows the same principle: it is built around customer needs. A cutting-edge supercomputer is designed for balance in every respect, because its workloads make heavy demands on all resources, and computing, memory access, communication and I/O must all be outstanding. The applications of most small and medium-sized users are more varied, and a given application does not need a strong CPU, network and memory access all at once. For example, computational physics applications are compute-intensive and depend on CPU scalability and the network, so a moderate CPU frequency and a high-speed network are enough. Structural strength analysis, by contrast, is memory-intensive: finite element analysis needs fat nodes with large memory, so large-memory nodes have to be provided in the corresponding partition.
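
The matching of workloads to partitions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `JobProfile` fields, the `pick_partition` function, the "large-memory" label and the 8 GB-per-core threshold are all hypothetical and are not taken from the center's actual scheduler. Only the partition names A, M, T and IO come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical description of what a job needs."""
    cores: int                # cores requested by the job
    mem_gb_per_core: float    # memory needed per core, in GB
    io_heavy: bool = False    # True when storage throughput dominates


# Assumed threshold, chosen only for illustration.
LARGE_MEMORY_GB_PER_CORE = 8.0


def pick_partition(job: JobProfile) -> str:
    """Return an illustrative partition label for a job profile."""
    if job.io_heavy:
        return "IO"              # partition with enhanced storage performance
    if job.mem_gb_per_core >= LARGE_MEMORY_GB_PER_CORE:
        return "large-memory"    # fat nodes, e.g. for finite element analysis
    if job.cores == 1:
        return "M/T"             # per-core scheduling for many small tasks
    return "A"                   # general multi-core compute partition


if __name__ == "__main__":
    for job in (JobProfile(64, 2.0), JobProfile(1, 1.0), JobProfile(32, 16.0)):
        print(job, "->", pick_partition(job))
```

The order of the checks is a design choice in this sketch: the scarcest resource (storage throughput, then memory) decides first, and the general-purpose partition is the fallback.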

 

(Topology of Partition A of Beijing Super Cloud Computing Center)

Setting up different computing resources for different user needs, and meeting the needs of specific users, is what led to the A partition of the Beijing Super Cloud Computing Center. The A partition uses Dell EMC PowerEdge servers based on EPYC Rome processors, with two CPUs and 64 cores per node, which is well suited to users running first-principles calculations in computational physics and aerodynamic analysis. Several nodes can be combined for a small or medium-scale computing task. Other users do not need 64 cores for a single task but do need to run a very large number of tasks. For example, a Monte Carlo simulation in astronomy needs only a single core per calculation, but completing a batch of such tasks takes hundreds or thousands of cores. Massive single-core workloads like this require customized clusters, which led to the M and T partitions.
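
To make the "many independent single-core tasks" pattern concrete, here is a minimal Python sketch. It is not an astronomy code: it uses the textbook Monte Carlo estimate of pi as a stand-in, and the names `estimate_pi` and `run_batch` are invented for this example. Each task depends only on its own seed, so tasks never communicate and can be scheduled one per core.

```python
import random
from functools import partial
from statistics import mean


def estimate_pi(seed: int, samples: int) -> float:
    """One single-core task: estimate pi from `samples` random points."""
    rng = random.Random(seed)  # private generator, so tasks are independent
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_batch(seeds, samples, mapper=map):
    """Run one task per seed; `mapper` decides where the tasks execute."""
    task = partial(estimate_pi, samples=samples)
    return list(mapper(task, seeds))


if __name__ == "__main__":
    estimates = run_batch(range(100), samples=10_000)
    print(f"{len(estimates)} tasks, mean estimate {mean(estimates):.4f}")
```

The default `mapper=map` runs the tasks one after another. To spread them over local cores, the `map` method of a `concurrent.futures` executor can be passed instead; on a cluster, each seed would typically become its own single-core job.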

Guo Yu, CTO of the Beijing Super Cloud Computing Center, said that the A partition currently has 3,000 nodes and 6,000 CPUs, a size chosen to balance scale, efficiency and cost. The center will continue to add new partitions to meet the needs of different users. The M, T and A2 partitions have been completed, and the A3 partition is being built rapidly. The ultimate goal is to guarantee enough computing resources that users never have to queue and always have resources available. Users also should not need to be aware of the back-end resources: once a job is submitted, they need not consider whether it runs in North China or East China. To that end, the center provides a full set of automated operation and maintenance systems, including automatic job migration and automatic resource matching, so that switching between resources is transparent to the user.
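
The idea that a user submits a job without choosing a region can be illustrated with a small Python sketch. The article does not describe how the center's resource matching works, so the policy below (send the job to whichever region currently has the most free cores) is purely an assumption, as are the `place_job` function and the region names.

```python
from typing import Dict, Optional


def place_job(cores_needed: int, free_cores: Dict[str, int]) -> Optional[str]:
    """Reserve cores in the region with the most free capacity.

    Returns the chosen region, or None when no region can fit the job.
    `free_cores` is updated in place to reflect the reservation.
    """
    candidates = [r for r, free in free_cores.items() if free >= cores_needed]
    if not candidates:
        return None  # a real system would queue or migrate other work here
    region = max(candidates, key=lambda r: free_cores[r])
    free_cores[region] -= cores_needed
    return region


if __name__ == "__main__":
    # Hypothetical capacity figures, for illustration only.
    capacity = {"north-china": 128, "east-china": 512}
    for cores in (256, 200, 100):
        print(cores, "cores ->", place_job(cores, capacity))
```

The caller only states how many cores a job needs; the region is an output of the function rather than an input, which is the essence of the "unaware" experience described above.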

The Beijing Super Cloud Computing Center uses a supercomputing cluster architecture that provides the computing resources of more than 5,000 physical servers with more than 270,000 CPU cores in total. It includes PB-scale, high-capacity parallel file systems and a dedicated, full wire-speed, non-blocking computing network, which improve computing speed and scalability. It is also equipped with complete, efficient and professional basic software, including operating systems and parallel compilation and development environments, and it supports multiple build environments and applications for CPUs and accelerator cards, including compilers, debuggers, MPI parallel development environments and math libraries. The center provides multi-level task queue management and scheduling, and sets different priority levels according to user needs and application scenarios to ensure that key services run normally.
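
Priority-based queue management of the kind mentioned above is commonly built on a heap. The Python sketch below shows the general technique only; the `JobQueue` class, the job names and the convention that a lower number means higher priority are assumptions made for this example, not details of the center's scheduler.

```python
import heapq
import itertools


class JobQueue:
    """Minimal priority queue: lower number = more urgent (assumed convention)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal-priority jobs stay first-in, first-out.
        self._counter = itertools.count()

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self):
        """Pop the most urgent job name, or return None when the queue is empty."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


if __name__ == "__main__":
    queue = JobQueue()
    queue.submit("batch-a", priority=5)
    queue.submit("key-service", priority=0)
    queue.submit("batch-b", priority=5)
    while (job := queue.next_job()) is not None:
        print(job)
```

In this sketch the key service is dispatched first even though it was submitted second, and the two batch jobs keep their submission order.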

 

TOP 3 is just the beginning

Partition A of the Beijing Super Cloud Computing Center ranked third in the 2020 China high-performance computer performance TOP100 and first in general-purpose CPU computing power. This is just a new beginning for the center and for China's general supercomputing industry. China's supercomputing has now gradually formed three market segments: cutting-edge supercomputing with more than ten thousand cores, general-purpose supercomputing with around one thousand cores, and industry supercomputing ranging from a single core to a thousand cores. General supercomputing is represented by the Beijing Super Cloud Computing Center, and industry supercomputing by public cloud supercomputing services.

 

 

Wu Di emphasized that the Beijing Super Cloud Computing Center never set out to win rankings. The ultimate goal of a market-oriented supercomputing center is to improve efficiency and reduce costs for industry. Enterprises and science and technology institutions are gradually shifting from building their own systems to purchasing super cloud computing services. This saves costs, gives them access to more computing power, and removes the need to maintain the system, leaving more time and energy for research. The center offers strong computing power, abundant software resources, a reliable support team and customized industry solutions. It also provides on-demand supercomputing resources, reduces queuing for computing tasks, adapts to the needs of multidisciplinary applications, lowers users' resource costs, and delivers professional, complete solutions for large-scale, complex technical and commercial applications. In addition, it provides a dedicated one-to-one WeChat group, 7×24 online service and a 5-minute rapid response mechanism.

For a 100-node medium-sized supercomputing data center, the usual construction period, from project application and approval to completion, is three to five years. The Beijing Super Cloud Computing Center first shortened this to one quarter and then to three weeks. Traditional cutting-edge supercomputing uses machines developed for specialized applications, while general-purpose and industry supercomputing are built from standardized products on the market to achieve a flexible, fast and agile cloud service model. Compared with the supercomputing services offered by public cloud providers, however, general supercomputing places higher demands on hardware.

Guo Yu explained that the workload of general public cloud vendors fluctuates widely between peaks and troughs, with an average machine load of about 30% to 40%, whereas general supercomputing machines must run 7×24 with CPU load reaching 100%. This places high demands on the stability and reliability of supercomputing servers, which must run around the clock for long periods. If a machine fails, or its CPU or memory develops a problem, the loss of research results is immeasurable. After a series of tests, the Beijing Super Cloud Computing Center chose to work with Dell Technologies because Dell EMC equipment performs in a balanced way in every respect, including stability and how well it exploits the strengths of AMD chips. More importantly, Dell Technologies' supply chain is mature and can meet the center's need for spare parts at any time.

 

​(Dell PowerEdge C6525)

The Beijing Super Cloud Computing Center wants faster construction cycles, and Dell Technologies' strong supply chain also helps reduce the center's operating costs. In fact, the center has long maintained a close working relationship with Dell Technologies: in addition to Partition A, the other partitions were also built in cooperation with Dell Technologies. In particular, thanks to its good supply chain relationships upstream and downstream in the server industry, Dell Technologies can supply the latest prototype products as soon as a chip vendor such as AMD releases a new generation of chips, which lets the center use the latest technology and products. Beyond servers, the center also makes wide use of Dell Technologies' networking, storage and other products. Guo Yu added that Dell Technologies pays close attention to data center maintenance and provides dedicated management tools such as iDRAC, which help considerably in reducing maintenance costs and improving efficiency.

Ling Weicai, director of Dell Technologies' High Performance Computing Laboratory and a high performance computing solution architect, explained that Dell Technologies' iDRAC software lets administrators remotely access, manage and boot servers and perform other low-level operations. Each server has its own iDRAC IP address. Another software product, OME (OpenManage Enterprise), provides a web console from which the administrator can reach every server directly instead of logging in to each one individually. Both products have free versions available to customers.

By working with suppliers such as Dell Technologies, which have a mature hardware supply chain and a complete range of products and technologies, the Beijing Super Cloud Computing Center can bring the computing power of general supercomputing to more small and medium-sized enterprises and scientific research institutions.

As a driving force in the rapid development of China's wind power industry, Goldwind needs supercomputing in three areas. First, wind turbine blade design, including twist and angle. Second, wind turbine siting, which is derived from wind energy resources and year-round local meteorological observations. Third, power generation forecasts, which are reported to the national grid once a wind farm is in operation. All of these require simulation. After gaining an in-depth understanding of Goldwind's needs, the Beijing Super Cloud Computing Center provided a computing platform that meets them. Goldwind moved onto the A partition, developed a business system on the platform, and even opened that system to other wind power partners and vendors in its industry chain ecosystem.

In summary: China is shifting from processing and exporting to high value-added industries, which requires a great deal of computing power; this gives general-purpose supercomputing its "favorable timing". The boom in "new infrastructure" has driven up demand for computing, which can be described as its "favorable position". And the penetration of cloud-based services into supercomputing has only just begun. The Beijing Super Cloud Computing Center has gathered a group of supercomputing professionals with more than ten years in the business, laying a solid foundation of "the right people" for general supercomputing as represented by the center. With the country increasing its investment in research funding and requiring that funding be used more efficiently, general supercomputing cloud services are bound to enter a golden age of development during the 14th Five-Year Plan period and to promote the across-the-board advance and flourishing of China's independent innovation. (Text/Ningchuan)

Origin blog.csdn.net/achuan2015/article/details/112654401