Breaking through the storage bottleneck: getting through the "last mile" of high-performance computing

What is the core of high-performance computing?

Computing, of course. At least, that is what most people would say.


Looking at the TOP500 list, which is refreshed twice a year, you will find that today's fastest supercomputer already delivers 442 PFlop/s (442,010 TFlop/s), about 4.42×10^17 floating-point operations per second, and the field is striding toward the exascale goal.
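As a quick back-of-envelope check (only the 442,010 TFlop/s figure comes from the list above; the rest is simple unit conversion), that peak number sits a little under half an exaflop:

```python
# Back-of-envelope conversion of the TOP500 peak figure quoted above.
rmax_tflops = 442_010             # TFlop/s, from the TOP500 list
rmax_flops = rmax_tflops * 1e12   # plain flop/s

print(f"{rmax_flops:.3e} flop/s")           # ~4.42e17 flop/s
print(f"{rmax_flops / 1e15:.0f} PFlop/s")   # ~442 PFlop/s
print(f"{rmax_flops / 1e18:.3f} EFlop/s")   # ~0.442 EFlop/s, still short of exascale (1 EFlop/s)
```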

As heterogeneous computing is adopted on a large scale by more and more applications, from traditional HPC to emerging artificial intelligence, chips of many architectures, including GPU, FPGA and ARM, have emerged, leaving the entire computing market in a state where "a hundred flowers bloom".

The same is true at the network level. Contrary to the common impression that HPC runs exclusively on InfiniBand, the TOP500 data show that Ethernet still holds a very large share, and 100G networking is already the "standard configuration" for Ethernet. On the InfiniBand side, which emphasizes transmission efficiency and low latency, the 200G HDR standard has become mainstream, and Mellanox even announced 400G NDR products at the recent SC20 conference, covering the full range of network components, cables included.


Seen from this angle, high-performance computing is evolving at a remarkable pace. Advances in compute and in the network have made data processing and transmission far more efficient, yet amid this rapid development we seem to have forgotten another essential piece: storage.

How important is storage for high-performance computing?

In the past, discussions of high-performance computing always revolved around compute speed, because computing power was the obvious shortcoming at the time. Today, heterogeneous computing has raised computing efficiency exponentially, and high-speed networks move those results quickly enough for the data itself to deliver greater value...

But it is at this point that we discover storage has become the bottleneck in many applications.


Take biological genetic engineering, one of the traditional applications of high-performance computing. Since the 1990s, television and books have been saturated with the "Human Genome Project", "cloning", "biochips" and the like; Dr. Chen Zhangliang, then a vice president of Peking University, even coined the famous line that "the 21st century is the century of biology". In recent years in particular, results such as the discovery of the PD-1 signaling pathway (bringing humanity a big step closer to defeating cancer), induced pluripotent stem cell (iPSC) technology, and the CRISPR gene-editing technology (raising hopes of a cure for type 1 diabetes) have proved the importance of bioengineering.

Contrary to most people's image of a biological laboratory, with the help of high technology the test tubes and beakers familiar from film and television have gradually given way to high-performance computers, which have become an important part of the modern lab.

Take BGI, a well-known Chinese genomics company, as an example. This company, a major contributor to genomics worldwide, runs hundreds of sequencers, and the resulting data amounts to 300TB-1PB every month. Merely storing this data is a headache, to say nothing of the subsequent analysis and use of it; the storage resources involved can fairly be called "astronomical".
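A rough, purely illustrative calculation, taking only the 300TB-1PB-per-month range from the text and assuming everything else (30-day months, a year of retention, triple replication), shows why merely storing this output is already hard:

```python
# Rough sizing from the monthly output range quoted above (300 TB - 1 PB per month).
# All derived numbers are back-of-envelope estimates, not BGI's actual configuration.
TB = 1
PB = 1000 * TB

for monthly in (300 * TB, 1 * PB):
    daily = monthly / 30              # average daily ingest
    yearly = monthly * 12             # raw capacity per year, before protection
    with_3x_replica = yearly * 3      # e.g. triple replication for durability
    print(f"monthly={monthly:>5} TB  daily≈{daily:5.1f} TB  "
          f"yearly={yearly / 1000:4.1f} PB  with 3x replicas≈{with_3x_replica / 1000:5.1f} PB")
```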


Another area that leans heavily on storage is video editing and processing. Riding the wave of short-video applications, many people now shoot their own videos on platforms such as Douyin and Kuaishou, which has also exposed more people to just how much storage video processing consumes: 4K and even 8K ultra-high-resolution video gives us a visual feast, but it also puts real pressure on the back-end storage devices.

According to reports, 8K video was used to cover events at the 2016 Rio Olympics; 20 minutes of uncompressed ultra-high-definition footage occupied a full 4TB of storage space.
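To see what that implies for the storage back end, a quick bit of arithmetic (only the "20 minutes, 4TB" figures come from the report above) gives the sustained rate the system has to absorb:

```python
# Implied sustained throughput for the uncompressed 8K example above:
# 4 TB of footage recorded in 20 minutes.
size_bytes = 4 * 10**12        # 4 TB (decimal)
duration_s = 20 * 60           # 20 minutes

rate_GBps = size_bytes / duration_s / 10**9
rate_Gbps = rate_GBps * 8
print(f"≈{rate_GBps:.1f} GB/s sustained (≈{rate_Gbps:.0f} Gbit/s)")
# ≈3.3 GB/s, i.e. roughly 27 Gbit/s that the storage path must sustain without dropping frames
```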

Capacity alone can be solved by adding storage devices; the harder problem is storage efficiency. A 4K/8K video workflow needs not only enough space but also equipment with high scalability and the performance to sustain the required read and write rates, and that is what makes 4K/8K storage genuinely difficult.

The same is true in film. From "The Return of the Great Sage" to "The Wandering Earth", and on to "The Sacrifice" released after last year's National Day holiday, visually striking special effects show that rendering has begun to reshape Chinese filmmaking, bringing major changes to the production cycle, production cost, and convenience of producing film and television works.

Special-effects rendering is an important scenario for commercial high-performance computing. The time spent making an effects-heavy film goes not only into shooting but also into waiting for the effects, and most of that effects time is spent on rendering. Before rendering, large-capacity storage is needed to hold the data; during rendering, heavy data processing is needed to produce the final output, which calls for high-performance storage. In a sense, computing power, storage, and cloud rendering are becoming key components of the Chinese film industry and have helped drive its rapid development.

Different business scenarios impose different storage requirements. Both scientific research and emerging intelligent video bring new demands and challenges to storage devices, and they call for breakthroughs in the storage products themselves.

It is precisely because the importance of storage in high-performance computing keeps growing that, since November 2017, the SC and ISC conferences have published a second ranking alongside the famous TOP500: the IO500 list, which is gradually becoming the performance "weathervane" of the storage industry.

What should the storage platform of the future look like?

If TOP500 is a ranking of computing performance, then IO500 is a ranking of storage system performance.

For a long time, benchmarking storage system performance has been a complex task. Parallel I/O is affected not only by CPU and network latency but also by the underlying storage technology and software, and the results published by different vendors often differ widely because of differences in test methods, tools, parameters and even the order of test steps.

This is where IO500 comes in. As an internationally recognized evaluation standard, IO500 defines a comprehensive benchmark suite that tests and compares high-performance storage systems in a standardized way, giving users a common basis for evaluation.

Specifically, the standard IO500 benchmark uses IOR, MDTEST, and the standard POSIX interface to evaluate performance for optimized sequential I/O, random I/O, and metadata operations.

IO500 comprises two classes of benchmark, bandwidth and metadata; the final score is the geometric mean of the two sub-scores.
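As a sketch of how such a final score is assembled (the sub-test names and all numbers below are made up for illustration; only the "geometric mean of bandwidth and metadata" rule comes from the text), the combination looks roughly like this:

```python
from math import prod

def geomean(values):
    """Geometric mean, used here to combine sub-test results into one score."""
    return prod(values) ** (1.0 / len(values))

# Hypothetical sub-test results (illustrative numbers only).
bandwidth_GiBps = [35.0, 42.0, 4.1, 5.3]        # e.g. easy/hard write and read phases
metadata_kIOPS = [310.0, 280.0, 95.0, 120.0]    # e.g. create/stat/delete/find phases

bw_score = geomean(bandwidth_GiBps)   # GiB/s
md_score = geomean(metadata_kIOPS)    # kIOP/s
final_score = geomean([bw_score, md_score])

print(f"BW={bw_score:.2f} GiB/s  MD={md_score:.2f} kIOP/s  score≈{final_score:.2f}")
```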

The results are published in two categories: the overall list and the 10-node list. The 10-node list is closer to the scale that real parallel applications actually run at, better reflects the I/O performance a storage system delivers to applications, and therefore carries more reference value.

We know that the vast majority of today's high-performance computing applications are written against the POSIX interface. By using standard POSIX, IO500 therefore comes closest to reflecting real application demands on storage for high bandwidth, high throughput, and low latency.
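As a minimal illustration of what "written against POSIX" means in practice, here is a generic sketch of the open/write/fsync/read file-descriptor calls that benchmarks in this style exercise; the file name and sizes are arbitrary, and this is not IO500 code:

```python
import os

path = "testfile.dat"          # hypothetical scratch file on the parallel file system
block = b"\0" * (1 << 20)      # 1 MiB block, a typical large-transfer size

# POSIX-style I/O: explicit open/write/fsync/read on a file descriptor.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
for i in range(16):                        # write 16 MiB sequentially
    os.pwrite(fd, block, i * len(block))   # positioned write at an explicit offset
os.fsync(fd)                               # force the data to stable storage

data = os.pread(fd, len(block), 0)         # positioned read of the first block
os.close(fd)
os.remove(path)
print(f"wrote 16 MiB, read back {len(data)} bytes")
```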

In other words, IO500 reflects differences in storage system performance under conditions close to real applications. Real HPC workloads are complex and demand storage that simultaneously offers high bandwidth, high OPS, protocol convergence and other capabilities, so a system that can even enter the IO500, let alone lead it, must be a leader in the storage field.

So although IO500 has not been around for long, in terms of participants' enthusiasm and the intensity of competition it is in no way inferior to TOP500. Leading systems on the list decouple the storage system from the computing system and, building on RUST-based innovations such as highly scalable concurrent access, large-granularity data caching and bypass access, streaming data-to-disk access, and zero-copy high-speed RPC processing, squeeze the maximum performance out of NVMe SSDs.

Looking at today's storage market, the coexistence of multiple storage protocols has long troubled users during upgrades and iteration. Compared with traditional file and block storage, the now-popular object storage is better suited to unstructured data and intelligent applications; even in high-performance computing, where the POSIX protocol dominates, object storage has carved out a place thanks to its operational convenience.

For example, the video effects and rendering industry uses object storage to hold massive volumes of video material, which makes later real-time querying and retrieval of the data convenient. For users, then, whoever can deliver coexistence of multiple storage protocols, effective management, and convenient data handling will take the lead in the next phase of storage development.
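From the application side, multi-protocol coexistence looks roughly like the sketch below: the same piece of material written through an S3-style object interface and read back through a POSIX mount. The endpoint, bucket, and mount path are all hypothetical, and the sketch assumes a platform that really does expose both views of one namespace:

```python
import boto3

# Hypothetical S3-compatible endpoint and bucket exposed by the storage platform.
s3 = boto3.client("s3", endpoint_url="http://storage.example.local:9000")
with open("frame_0001.exr", "rb") as src:
    s3.put_object(Bucket="render-assets", Key="scene01/frame_0001.exr", Body=src)

# The same data read back through a POSIX mount of the same namespace
# (assumes the platform maps bucket/key to a directory/file path).
with open("/mnt/unified/render-assets/scene01/frame_0001.exr", "rb") as f:
    frame = f.read()
print(f"read {len(frame)} bytes via the file interface")
```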

Beyond multi-protocol coexistence, multi-protocol interoperability is also a concern for the whole industry. Commonly used distributed and parallel file systems include Lustre, GPFS, Gluster and Isilon OneFS; common object stores include Ceph and Cleversafe; and on top of these sit data-access protocols such as SMB, CIFS and NFS. NAS, for instance, is a typical case of multi-protocol interoperability.

The genetic testing mentioned earlier can now reach intelligent testing standards with the help of big data and artificial intelligence, and the whole industry is trending toward personalization. But this also raises data interoperability issues, such as how data on a distributed architecture can work with traditional sequencers or workstations.

The same applies to today's hottest topic, intelligent driving. Not long ago, Tesla CEO Elon Musk said in an interview that L5 autonomous driving would be realized in 2022, meaning the vehicle completely replaces the driver: the true "autonomous driving" we see in film and television.

Autonomous driving is graded into levels L1 through L5. At L5, the car effectively turns into a cabin: the computer can control the vehicle intelligently under any conditions, although the owner can still take over.

But this requires solving the problem of data coordination. For a long time, autonomous driving has depended on sensor-equipped data-collection vehicles gathering massive amounts of road-test data, and on repeated AI training and simulation over the processed data, so that the car can intelligently recognize and handle all kinds of road conditions and obstacles and thereby drive itself.

This spans data collection and import, local data preprocessing, AI model training and, finally, HPC-based simulation, with the simulation results guiding the in-vehicle AI system to make intelligent judgments and driving upgrades and iteration of the software.

[Figure: autonomous driving R&D workflow]

The access protocols used at different stages of autonomous-driving development also differ greatly; to make the pipeline efficient, the storage layer needs to support interoperability among these protocols and cut down on data copying. Take today's mainstream L2-L3 autonomous driving: each car already generates 2-64TB of data per day, and as fleet mileage accumulates the total reaches PB or even EB scale.
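A quick illustrative scaling exercise (only the 2-64TB-per-car-per-day range comes from the text; fleet sizes and collection days are assumptions) shows how fast this climbs toward PB and EB scale:

```python
# How per-car daily volumes (2-64 TB, from the text) scale with fleet size.
# Fleet sizes and the 300 collection days per year are illustrative assumptions.
TB_PER_PB = 1000
PB_PER_EB = 1000

for cars in (10, 100, 1000):
    for per_car_tb in (2, 64):
        per_year_tb = cars * per_car_tb * 300
        per_year_pb = per_year_tb / TB_PER_PB
        print(f"{cars:>5} cars x {per_car_tb:>2} TB/day -> "
              f"{per_year_pb:8.1f} PB/year ({per_year_pb / PB_PER_EB:.3f} EB/year)")
```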

At L5, the volume of data to transmit and compute becomes truly "astronomical", requiring an integrated computing approach that spans on-board processors and back-end data centers and relies on the Internet of Vehicles and 5G networks for fast transmission and response.

In other words, in autonomous-driving scenarios the on-board system performs large-scale data computation and processing, which demands high bandwidth as well as high OPS and low latency, and data must be exchanged and coordinated across systems.

This in turn requires future high-performance computing systems not only to meet high-bandwidth and high-OPS requirements, but also to avoid the redundant data copies caused by using different architectures at different stages of the process. Multi-protocol interoperability therefore becomes the inevitable choice.

Seen this way, improving the storage performance of high-performance computing means breaking down the old boundaries: the platform must accept users' data in all its forms and let data flow across multiple protocols, so that storage resources are managed uniformly and hardware lock-in is broken. Data can then be spread evenly across the resource pool, simplifying the design of durable protection; different business systems can share storage; and consolidation reduces storage costs, avoids building multiple systems over and over, and fully taps the latent value of existing data.

For a long time, the road to exascale has faced many challenges: the memory and storage access wall, the communication wall, reliability, and the energy wall have become the four major problems. The "access wall" in particular means that compute, storage and I/O speeds must be matched and balanced so that the architecture as a whole stays in equilibrium.
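One way to make "matched and balanced" concrete is the classic machine-balance ratio: bytes of data movement available per floating-point operation. The sketch below uses purely illustrative numbers, not measurements of any real system:

```python
# Machine balance: bytes of storage bandwidth available per floating-point operation.
# Illustrative numbers only; real systems and workloads vary widely.
peak_flops = 0.44e18     # ~0.44 EFlop/s of compute, the order of the figure quoted earlier
storage_bw = 2e12        # assume 2 TB/s of aggregate storage bandwidth

balance = storage_bw / peak_flops
print(f"{balance:.2e} bytes of storage bandwidth per flop")
# If an application needed, say, 0.01 byte of I/O per flop, the compute side would sit
# idle waiting on storage unless bandwidth (or data locality) improved accordingly.
```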

Future high-performance computing storage therefore needs to deliver multi-protocol compatibility, interoperability across protocol architectures, and intelligent management.

In other words, high-performance computing has reached a new stage in which computing power grows rapidly and networks deliver high bandwidth and low latency; storage must keep pace with massive scalability and intelligent data management.

For traditional high-performance computing, scientific applications such as genomics, high-energy physics and fluid dynamics, as well as video processing, all consume massive amounts of data; expanding storage capacity and making better use of that data will do more to advance high-performance computing.

It's time to choose a new storage platform for the exascale computing era. (Reprinted from DT Era, with thanks.)
