How to get started with high-performance computing?

Without a professional direction at the start, it is hard to build your own core competitiveness through self-study alone. At present, the ecosystem in China is still immature and learning materials are relatively scarce. If you want to learn systematically, do not miss the first systematic course in China that specializes in high-performance computing.

We are the first in China to train and recommend high-performance computing talent. The instructors come from first-class domestic research institutes, and the courses and practice system were developed by instructors from the former Tianhe team. Our partner organizations include domestic supercomputing centers, chip companies, and Internet companies. Feel free to send a private message if you are interested.



We are sharing our 2022 courses here free of charge; anyone who needs them is welcome to watch.

A full set of parallel computing/supercomputing courses, from entry level to advanced, that costs nearly 20,000 yuan to purchase is shared for free together with the full set of materials, with lectures by a former Tianhe team instructor. Hardcore all the way; you will regret missing it. (video on bilibili)

Now, to answer the question.

In a broad sense, high-performance computing is not limited to using high-performance computers to accelerate computation. Everything from the underlying hardware architecture, operating system, and middleware, through parallel programming models, up to the applications on top falls within its scope.

As I understand it, people working in high-performance computing fall roughly into two categories: application research and development, and system operation and maintenance.

A high-performance computing research and development position requires the following knowledge:

1: Basic Linux operation: common commands, vim, the gcc compiler, and so on.

2: Proficiency in a programming language such as C, C++, Fortran, or Python, preferably C.

3: Compiling and installing common software, and using build tools such as gmake/make/cmake.

4: For multi-core CPU application development, parallel programming models such as MPI and OpenMP.

In-depth optimization has to be tailored to the target architecture, so you also need computer architecture knowledge such as instruction sets, pipelines, registers, caches, and SIMD.

5: For heterogeneous programs, heterogeneous programming models such as CUDA, OpenCL, or OpenACC, along with the architecture of the accelerator hardware, such as GPUs, DSPs, or Intel MIC.

6: Domain background knowledge and algorithms for the relevant field. Proficiency in algorithms will take you further in high-performance computing application research and development.
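
To make the parallel programming models in item 4 a little more concrete, here is a minimal sketch of the block-decomposition-and-reduce pattern that MPI programs and OpenMP loops commonly follow. It is written in Python with standard-library threads purely for readability; the function names (`block_range`, `partial_sum_of_squares`, `parallel_sum_of_squares`) are made up for this example. Because of CPython's global interpreter lock, it shows the structure of the computation rather than a real speed-up. In production code each "rank" would be an MPI process or an OpenMP thread in C, C++, or Fortran.

```python
from concurrent.futures import ThreadPoolExecutor


def block_range(n_items, n_workers, rank):
    """Return the half-open [start, stop) slice owned by `rank`.

    The first (n_items % n_workers) workers get one extra item, so block
    sizes differ by at most one.
    """
    base, extra = divmod(n_items, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum_of_squares(data, n_workers, rank):
    """Work done by one worker: sum squares over its own block only."""
    start, stop = block_range(len(data), n_workers, rank)
    return sum(x * x for x in data[start:stop])


def parallel_sum_of_squares(data, n_workers=4):
    """Hand one block to each worker, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(
            lambda rank: partial_sum_of_squares(data, n_workers, rank),
            range(n_workers),
        )
        return sum(partials)  # the "reduce" step
```

The key idea is that every worker derives its own slice from nothing but its rank and the problem size, which is the same way an MPI rank usually decides which part of the data it owns.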

Operating and maintaining high-performance computing systems covers a broader range of topics:

The scope runs from ordinary servers to clusters to cloud computing. Although the barrier to entry for operation and maintenance is low, the material to learn is more varied. Taking cluster operation and maintenance as an example, you need to learn:

1: Basic Linux system administration, such as installing different versions of the system, user management, network configuration, and permission and security control.

2: Cluster user management and coordinated directory management, such as configuring and using LDAP.

3: Network operation and maintenance, including configuring and managing common switches as well as common high-speed networks such as InfiniBand (IB) and Omni-Path (OPA).

4: Operation and maintenance of parallel file systems, such as configuring and managing Lustre or BeeGFS.

5: Operation and maintenance of the job scheduling system. Slurm and PBS are the common choices, and learning to configure and manage Slurm is recommended.

6: Software deployment and installation with common build tools such as gmake/make/cmake.

7: Rapid environment deployment, for example with conda, and container technology such as Docker or Singularity.
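
From the user's side, the job scheduling system in item 5 is mostly encountered through batch scripts. The sketch below uses Python to assemble a small Slurm batch script as text. The `#SBATCH` options and `srun` are standard Slurm, but the helper name `render_sbatch_script`, the job name, the resource numbers, and `./my_app` are placeholders chosen for illustration; real values depend on how the cluster is configured.

```python
def render_sbatch_script(job_name, nodes, tasks_per_node, walltime, command):
    """Build the text of a Slurm batch script for an MPI-style job."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",  # wall-clock limit, HH:MM:SS
        "",
        # srun launches one copy of the command per allocated task.
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 2 nodes, 4 tasks each, 10-minute limit.
script = render_sbatch_script("demo", 2, 4, "00:10:00", "./my_app")
print(script)
```

Writing the returned text to a file such as `job.sh` and submitting it with `sbatch job.sh` would queue the job.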

GPU parallel computing falls mainly into two areas: graphics computing and general-purpose computing.

Graphics computing mainly covers image rendering, such as film and game rendering; a GPU used this way is usually called a graphics card. To work as an application engineer in this area, you need computer graphics knowledge, such as graphics hardware, graphics standards, graphical interaction techniques, raster graphics generation algorithms, and image textures. You also need to know how to use and optimize common graphics APIs and shading languages such as OpenGL, Vulkan, and GLSL.

General-purpose computing is mainly used to accelerate research and applications in fields such as theoretical physics, chemical materials, metal processing, bioinformatics, aerospace, defense, deep learning, and artificial intelligence. This kind of work requires the relevant domain background, basic algorithms, and familiarity with common GPU development environments such as CUDA, ROCm, and OpenCL, as well as with GPU hardware architecture.

Graphics computing and general-purpose computing are now converging, so it is advisable to pay equal attention to both, although that means there is more to learn.

Ape Code Technology ( http://www.ydma.com ), a domestic online education and Internet technology company with a sense of responsibility and mission, is the first enterprise to cultivate supercomputing and high-performance computing talent in response to the supercomputing Internet initiative proposed by the Ministry of Science and Technology in April 2023.

Ape Code Technology ( http://www.ydma.com ) took the lead in developing systematic, hands-on high-performance computing courses covering CPU parallel program optimization, GPU parallel program optimization, and supercomputing operation and maintenance, providing practical and competitive skills training for programmers and college students.

The high-performance computing course of Ape Code Technology ( http://www.ydma.com ) has the following four advantages:

1. A mentor group of well-known domestic supercomputing experts provides close guidance; guidance from a renowned teacher is worth more than half a year of exploring on your own.

2. Real project practice on the Tianhe supercomputing practice platform, worth over 1 billion yuan.

3. Task-based, interactive practical training.

4. A 6-hour hands-on parallel programming marathon assessment.

At present, the globally recognized learning method is task-oriented learning. Through deliberate practice and a result-oriented system that integrates learning, training, and testing, Ape Code Technology ( http://www.ydma.com ) helps students succeed in their learning and get the job done.

 


Origin blog.csdn.net/YDM6211/article/details/131434255