Shanghai Artificial Intelligence Laboratory AI Training and Computing Center | Global Recruitment


The AI Training and Computing Center of the Shanghai Artificial Intelligence Laboratory (Shanghai AI Laboratory) is committed to defining and leading the construction of a new generation of AI training and computing systems. It is building a broadly inclusive AI computing and compilation system and developing highly scalable, widely open, energy-efficient, and highly adaptable systems that serve as cornerstone infrastructure supporting artificial intelligence technology.

Looking toward the future development of general intelligence and interdisciplinary research, the center will create a new generation of open AI training systems, define a new programming paradigm for training systems, and make breakthroughs in multi-dimensional parallelism, adaptive training, and efficient distributed collaboration in multi-device environments, in order to train super-large models efficiently, explore the energy-efficiency limits of model training, and meet the needs of integrating general intelligence with super-large-scale scientific computing.

To date, the center has launched the DeepLink open AI computing system; delivered LandMark, including training of a 100-billion-parameter NeRF model and real-time rendering of a 100-square-kilometer scene at 1K resolution; achieved efficient overlap of computation and communication for Shusheng Puyu (InternLM); and won an Outstanding Paper Award at ASPLOS 2023.


The center will continue to explore open AI computing systems, pursue breakthroughs in compilation technology for multiple computing backends, promote the co-evolution of computing capability and algorithm development, and enable efficient adaptation and training support across diverse computing units in the cloud and at the edge, to meet the new computing demands of next-generation intelligent computing infrastructure.

We are now recruiting talents for the following positions:

  • Young Scientist of AI Training and Computing Center

  • High-performance Heterogeneous Computing Young Scientist

  • Young Researcher of Large Model Training Technology

  • R&D Engineer of Large Model Training System

  • Deep Learning Training Framework Algorithm Engineer

  • Deep Learning Compilation Research Engineer

  • C++ Development Engineer

  • Multimodal Large Model Young Researcher

  • Multimodal Large Model Engineer

  • Natural Language Processing Large Model Young Researcher

  • Natural Language Processing Large Model Engineer


Young Scientist of AI Training and Computing Center

Job Description:

1. Lead the team in optimizing computation and communication in distributed training, making full use of available computing power and using large numbers of accelerator cards to explore performance boundaries;
2. Optimize how accelerator-chip memory and communication are used, supporting efficient training of various large models;
3. Collaborate with universities on large-scale systems research;
4. Raise the academic and industrial profile of the distributed training direction.

Job Requirements:

1. Solid computer science foundation, familiarity with C/C++, and the ability to architect system software;
2. Familiarity with computer architecture and basic parallel computing technology, with more than 5 years of experience in large-scale distributed training;
3. Performance-tuning experience on at least one GPU architecture from NVIDIA, AMD, or Intel;
4. Familiarity with at least one distributed communication library such as MPI or NCCL, and experience performance-tuning 100-node computing tasks;
5. Familiarity with the principles of model training and optimizers, and understanding of basic distributed training methods; some knowledge of training acceleration methods such as mixed-precision training and data parallelism is preferred.

High-performance Heterogeneous Computing Young Scientist

Job Description:

1. Lead the team in building an AI training and computing software ecosystem centered on compilation, cooperating with AI accelerator chip manufacturers to establish software/hardware interface standards;
2. Design an open AI compilation software stack architecture that connects to various AI accelerator chips;
3. Track trends in cutting-edge AI computing and compilation technology, and plan the laboratory's compilation research directions;
4. Continue to build the academic and industrial influence of the laboratory's compilation and computing work.


Job Requirements:
1. More than 5 years of work experience; this requirement may be relaxed for outstanding PhD graduates;
2. Solid computer science foundation, familiarity with C/C++, and the ability to architect system software;
3. Familiarity with computer architecture and basic parallel computing technology;
4. Performance-tuning experience on at least one GPU architecture (NVIDIA, AMD, Intel, etc.);
5. Familiarity with at least one compute programming primitive such as CUDA, OpenCL, Vulkan, Metal, or OpenGL compute shaders;
6. Experience porting and tuning matrix algebra, signal processing, computer vision, image processing, or 3D graphics algorithms across different processors is preferred;
7. Experience developing and maintaining open source software, or contributing code to well-known open source projects, is preferred.

Young Researcher of Large Model Training Technology

Job Description:
1. Track the latest progress in generative AI research;
2. Reproduce classic generative AI work;
3. Discover new optimization opportunities in generative AI training and inference.


Job Requirements:
1. A doctorate in computer science or artificial intelligence;
2. Experience with frameworks and system architecture;
3. Senior-level engineering experience;
4. Research achievements in AI system software are preferred.

R&D Engineer of Large Model Training System

Job Description:

1. Participate in the design and implementation of highly available, scalable distributed machine learning systems, supporting efficient training and inference of large models and achieving technological breakthroughs;
2. Optimize distributed systems for large-model training scenarios;
3. Continuously improve the platform's utilization efficiency and ease of use; explore the industry's leading large-model technologies and design and implement them in the training system.

Job Requirements:

1. A master's degree or above in a computer-related major;
2. Solid programming foundation; familiarity with multi-threaded programming, network communication, memory management, and design patterns; development experience with large-scale C++/Python systems; and the ability and experience to optimize distributed system performance problems;
3. Enthusiasm for AI systems, a strong interest in cutting-edge technological breakthroughs, and a drive for technical excellence and innovation.

Candidates who meet one or more of the following conditions are preferred:

a. Experience programming and performance-tuning CUDA, NCCL, or RDMA communication;
b. Familiarity with the source code of mainstream frameworks such as PyTorch or Ray.

Deep Learning Training Framework Algorithm Engineer

Job Description:

1. Responsible for quantitative analysis, porting, implementation, end-to-end performance optimization, and precision tuning of the laboratory's AI workloads and algorithm models on Cambricon AI chips (MLU);
2. On the MLU software and hardware platform, develop training, inference, and machine vision algorithm libraries; analyze and optimize potential performance bottlenecks in operators to improve product competitiveness in the industry; work includes, but is not limited to, operator requirements analysis, design, development, unit testing, optimization, integration, and version maintenance.

Job Requirements:

1. A degree in computer science, electronic engineering, mathematics, communications, automation, or a related field;
2. Proficiency with Linux and with C/C++/Python/shell; good programming habits and familiarity with the software development process;
3. A strong interest in at least one area among computer vision, speech recognition, search/advertising/recommendation, natural language processing, or AI HPC;
4. Familiarity with and hands-on use of at least one mainstream deep learning framework (TensorFlow, PyTorch, TVM, etc.); good teamwork, a strong sense of responsibility, and initiative in completing related work;
5. Any of the following experience is preferred:
a. A developer-level understanding of at least one mainstream deep learning framework (TensorFlow, PyTorch, etc.), with insight and experience in framework design or tuning;
b. Insight and experience with low-level high-performance distributed solutions such as MPI/CCL, as well as high-level distributed strategies such as data, model, and pipeline parallelism;
c. Familiarity with computer architecture, and experience with parallel computing, heterogeneous computing, and performance optimization on GPU, TPU, x86, ARM, or DSP;
d. Experience developing and performance-tuning high-performance libraries commonly used in industry (such as TensorRT, OpenBLAS, MKL, cuDNN, etc.).

Deep Learning Compilation Research Engineer 

Job Description:
1. Participate in integrating domestic chips into deep learning frameworks;
2. Build compiler abstractions for framework integration to improve integration efficiency;
3. Apply a variety of techniques to speed up model training;
4. Participate in the research and application of other relevant cutting-edge technologies.


Job Requirements:
1. Good programming habits, proficiency in Python/C++, and strong debugging skills;
2. Deep understanding of and practical experience with deep learning frameworks or compilers (including but not limited to PyTorch, TensorFlow, JAX, XLA, MLIR, TVM, etc.);
3. Strong self-motivation and a strong interest in advanced technology;
4. Strong teamwork and communication skills;
5. Familiarity with the new features of PyTorch 2.0, experience in large model training optimization, or AI for Science experience is preferred.

C++ Development Engineer

Job Description:
1. Participate in framework R&D, implement new deep learning methods, and keep abreast of new technologies in related fields;
2. Participate in the development and optimization of the Parrots deep learning framework, optimizing its core architecture for industrial-scale big-data applications;
3. Adapt the framework to the latest deep learning algorithms and architectures, adjusting, improving, and optimizing it;
4. Optimize the framework's training speed, including computation, communication, and their scheduling, to improve model training efficiency;
5. Extend and improve the framework's functionality and computing capability, and improve its tooling.

Job Requirements:
1. At least one of the following must be met:
a. More than 2 years of C++ or Python development experience on Linux and proficiency in template programming; open source contribution experience is preferred;
b. Familiarity with computer architecture, basic parallel computing technologies, and the fundamentals of GPU parallel computing, with more than 2 years of GPU programming experience;
2. Solid computer science foundation and programming ability, skilled use of common algorithms and data structures, and good programming habits and code style;
3. Good documentation habits; write technical documents and progress reports promptly as required;
4. Familiarity with the source code of mainstream deep learning frameworks such as PyTorch and TensorFlow is preferred.

Multimodal Large Model Young Researcher

Job Description:

1. Directly participate in the R&D of multimodal large models, including their design, training, and tuning;
2. Conduct research on cutting-edge multimodal algorithms, including but not limited to multimodal 2D/3D perception and image/text generation;
3. Conduct basic theoretical research on large model training, including but not limited to model design, training strategies, optimization algorithms, and model compression.

Job Requirements:
1. A doctorate in a computer science or artificial intelligence related major, with multiple papers published in top journals;
2. In-depth understanding of at least one direction within natural language processing or computer vision;
3. Proficiency in Python and PyTorch, with strong engineering ability;
4. Experience in multimodal algorithm research and large model pre-training is preferred;
5. Well-known academic work, open source projects, or international competition results are preferred;
6. Familiarity with large model training frameworks such as DeepSpeed, ColossalAI, or Megatron is preferred.

Multimodal Large Model Engineer

Job Description:

1. Participate in the training and tuning of multimodal large models, stabilizing large model training and improving training efficiency;
2. Support deployment of large models in a variety of applications, applying them to real-world scenarios;
3. Participate in the development of large model evaluation.

Job Requirements:

1. A bachelor's degree or above in a computer science or artificial intelligence related major; top-conference papers are a plus;
2. Familiarity with at least one research direction within natural language processing or computer vision;
3. Proficiency in Python and PyTorch, with strong engineering ability;
4. Familiarity with CUDA development and performance tuning is a plus;
5. Familiarity with large model training frameworks such as DeepSpeed, ColossalAI, or Megatron, and experience in large model pre-training, is preferred;
6. Well-known open source projects or international competition results are preferred.

Natural Language Processing Large Model Young Researcher

Job Description:

1. Participate in large model research, including the training and tuning of super-large-scale models;
2. Conduct research on large model mechanisms and optimization, including exploring large model capabilities and extending their boundaries;
3. Be responsible for research surrounding large models, such as ethics and safety, inference acceleration, and prompt optimization.


Job Requirements:
1. A doctorate in a computer science or artificial intelligence related major, with multiple papers published in top journals;
2. In-depth understanding of at least one research direction in natural language processing or computer vision;
3. Proficiency in Python and PyTorch, with strong engineering ability;
4. Familiarity with large model training frameworks such as DeepSpeed, ColossalAI, or Megatron is preferred;
5. Well-known academic work, open source projects, or international competition results are preferred.

Natural Language Processing Large Model Engineer

Job Description:

1. Participate in the training and tuning of large models, stabilizing training and improving training efficiency;
2. Be responsible for work related to the InternLM algorithm library, implementing and maintaining it;
3. Support deployment of large models in various applications, closing the gap between large models and real-world scenarios.


Job Requirements:
1. A bachelor's degree or above in a computer science or artificial intelligence related major; top-conference papers are a plus;
2. Familiarity with common natural language processing models;
3. Proficiency in Python and PyTorch, with strong engineering ability;
4. Familiarity with CUDA development and performance tuning is a plus;
5. Familiarity with large model training frameworks such as DeepSpeed, ColossalAI, or Megatron is preferred;
6. Well-known academic work, open source projects, or international competition results are preferred.

This round of recruitment is open simultaneously to experienced (social recruitment) hires, campus recruits, and interns. Anyone interested in this field is welcome to apply.

How to Apply

Method 1:
Send your resume to [email protected]. Name both the email subject and the resume file as: name - position applied for - (campus recruitment/social recruitment/internship).


Method 2:
Log in to the official website of Shanghai Artificial Intelligence Laboratory (www.shlab.org.cn), click "Join Us" in the navigation bar, search for the corresponding position, and submit your application.

Method 3:

Scan the QR code below to apply, or click "Read the original text" at the end of the article.

[QR code for resume submission]

Shanghai Artificial Intelligence Laboratory (Shanghai AI Laboratory)

Shanghai AI Laboratory is a new type of scientific research institution in China's artificial intelligence field. It conducts strategic, original, and forward-looking research, pursues breakthroughs in important basic theories and key core technologies of artificial intelligence, and is building a large-scale, comprehensive research base that integrates breakthrough research, industry leadership, and platform capabilities, supporting the leapfrog development of China's artificial intelligence industry. It aims to become a world-class artificial intelligence laboratory and a world-renowned source of original AI theories and technologies.

