Feichen's first book giveaway of 2024: "Practical AI Large Models"



Editor's Choice

Written by You Yang, a leading expert in the field of artificial intelligence, and highly recommended by Kai-Fu Lee, Zhou Hongyi, and Shuicheng Yan, the book shot to No. 1 on JD.com's "Computer and Internet" book rankings as soon as it was launched.

"Practical AI Large Model" introduces many contents from basic concepts to practical skills in detail, and comprehensively interprets AI large models step by step, from the shallower to the deeper. The book is equipped with QR code videos, allowing readers to be immersed in the book and quickly and deeply master various experiences and techniques. This book also comes with a wealth of additional resources: open source tools and libraries, datasets and model case studies and practical applications, online communication communities, and more. Readers can make comprehensive use of these resources to gain a richer learning experience and accelerate their own learning and growth.

About "Practical AI Large Model"

Next, I would like to recommend a book in the field of artificial intelligence (AI); the details are below. In addition, leave the comment "I want to get started with practical AI large models" at the end of this article, and three lucky readers will be selected to receive a paper copy of "Practical AI Large Models". Deadline: 2024.01.07.

"Practical AI Large Models" is a practical manual designed to bridge the gap between theory and practice in the field of artificial intelligence (AI) (especially AI large models). The book introduces the basic knowledge and key technologies of AI large models, such as Transformer, BERT, ALBERT, T5, GPT series, InstructGPT, ChatGPT, GPT 4, PaLM and visual models, etc., and explains in detail the technical principles and practical implementation of these models. applications and the use of high-performance computing (HPC) technologies such as parallel computing and memory optimization.
At the same time, "Practical AI Large Model" also provides practical cases and details how to use Colossal AI to train various models. Whether you are a beginner in artificial intelligence or an experienced practitioner, you can learn practical knowledge and skills from this book, so as to find a direction that suits you in the rapidly developing field of AI.

Get "Practical AI Large Models" here: https://item.jd.com/14281522.html. I personally think this book is excellent; for AI large model developers in particular, it is a rare find, well worth owning and studying.

About the Author

You Yang holds a master's degree from Tsinghua University and a Ph.D. from the University of California, Berkeley, and is a Presidential Young Professor in the Department of Computer Science at the National University of Singapore. He has set world records for the training speed of ImageNet, BERT, AlphaFold, and ViT, and the related technologies are widely used at technology giants such as Google, Microsoft, Intel, and NVIDIA. In the past three years, he has published more than ten first-author papers at major international conferences and journals such as NIPS, ICLR, SC, IPDPS, and ICS. As first author, he won the Best Paper Award of the International Parallel and Distributed Processing Symposium (IPDPS) (0.8% award rate) and the Best Paper Award of the International Conference on Parallel Processing (ICPP) (0.3% award rate); as corresponding author, he won the Outstanding Paper Award of the AAAI Conference on Artificial Intelligence (0.14% award rate) and the Outstanding Paper Award of the Annual Meeting of the Association for Computational Linguistics (ACL) (0.86% award rate). In total he has published nearly a hundred papers. He was named an outstanding graduate of Tsinghua University and received the Siebel Scholarship, at that time the highest-value scholarship in Tsinghua University's Department of Computer Science; the ACM-IEEE CS George Michael Memorial HPC Fellowship, the only fellowship listed on the official website of the Association for Computing Machinery (ACM) that is awarded to current doctoral students; and the Lotfi A. Zadeh Prize, awarded to outstanding Berkeley graduates. He was nominated by UC Berkeley for the ACM Doctoral Dissertation Award. He has worked at Google, Microsoft, NVIDIA, Intel, and IBM. In 2021, he was named to the Forbes 30 Under 30 list (Asia) and won the IEEE-CS Supercomputing Outstanding Newcomer Award.

Table of contents

Chapter 1 AI large models in deep learning
1.1 The rise of AI large models in the field of artificial intelligence
1.1.1 The development and challenges of AI large models
1.1.2 Why AI large models are difficult to train
1.2 Getting started with deep learning frameworks
1.2.1 Building a neural network
1.2.2 Training a text classifier

Chapter 2 Distributed systems: the birthplace of AI large models
2.1 Deep learning and distributed systems
2.1.1 From distributed computing to distributed AI systems
2.1.2 Key technologies of large-scale distributed training platforms
2.1.3 Colossal AI application practice
2.2 AI large model training methods
2.2.1 Gradient accumulation and gradient clipping
2.2.2 Large-batch optimizers LARS and LAMB
2.2.3 Model precision and mixed-precision training
2.3 Heterogeneous training
2.3.1 Basic principles of heterogeneous training
2.3.2 Implementation strategies for heterogeneous training
2.4 Distributed training in practice
2.4.1 Setting up the Colossal AI environment
2.4.2 Training your first model with Colossal AI
2.4.3 Heterogeneous training of AI large models

Chapter 3 Distributed training: how thousands of machines dance together
3.1 Basic principles of parallel strategies
3.1.1 Data parallelism: the most basic parallel training paradigm
3.1.2 Tensor parallelism: intra-layer model parallelism
3.1.3 Pipeline parallelism: principles and implementation
3.2 Basic principles of advanced parallel strategies
3.2.1 Sequence parallelism: training on ultra-long sequences
3.2.2 Hybrid parallelism: scaling models to hundreds of billions of parameters
3.2.3 Automatic parallelism: automated distributed parallel training
3.3 Distributed training in practice
3.3.1 Practical cases of applying model parallelism strategies
3.3.2 Training practice combining multiple parallel strategies

Chapter 4 The Transformer model: cornerstone of the AI large model era
4.1 Basics of natural language processing
4.1.1 Introduction to natural language tasks
4.1.2 Preprocessing of language input
4.1.3 Sequence-to-sequence models
4.2 The Transformer in detail
4.2.1 Transformer model structure
4.2.2 Attention and the self-attention mechanism
4.2.3 Normalization in the Transformer
4.3 Variants and extensions of the Transformer
4.3.1 Overview of variant models
4.3.2 Encoding and processing of sequence position information in the Transformer
4.3.3 Transformer training

Chapter 5 AI greatly improves the quality of Google search: the BERT model
5.1 The BERT model in detail
5.1.1 Overall architecture and input format of the BERT model
5.1.2 BERT model pre-training tasks
5.1.3 How to apply the BERT model
5.2 ALBERT: efficiently reducing memory usage
5.2.1 Parameter reduction through parameter sharing
5.2.2 The sentence order prediction (SOP) pre-training task
5.3 BERT model training in practice
5.3.1 Building a BERT model
5.3.2 Parallel training of the BERT model

Chapter 6 The T5 model: unifying the natural language processing paradigm
6.1 The T5 model in detail
6.1.1 T5 model architecture and input/output: text to text
6.1.2 T5 model pre-training
6.1.3 Application prospects and future development of the T5 model
6.2 BART: unifying BERT and GPT
6.2.1 From BERT and GPT to BART
6.2.2 BART model pre-training
6.2.3 Applications of the BART model
6.3 UL2: a framework for unifying language learning paradigms
6.3.1 A unified perspective on language model pre-training
6.3.2 A mixture of denoisers combining different pre-training paradigms
6.3.3 Model performance of UL2
6.4 T5 model pre-training methods and key technologies

Chapter 7 GPT series models: the starting point of general artificial intelligence
7.1 The origin of the GPT series models
7.1.1 GPT training methods and key technologies
7.1.2 Evaluation and analysis of GPT model performance
7.2 The GPT-2 model in detail
7.2.1 Core ideas of GPT-2
7.2.2 Model performance of GPT-2
7.3 The GPT-3 model in detail
7.3.1 Similarities and differences among few-shot, one-shot, and zero-shot learning
7.3.2 Training methods and key technologies of GPT-3
7.3.3 Model performance and effect evaluation of GPT-3
7.4 Building and training a GPT-3 model in practice
7.4.1 Building a GPT-3 model
7.4.2 Using heterogeneous training to reduce the resource consumption of GPT-3 training

Chapter 8 The rise of a new generation of artificial intelligence: the ChatGPT model
8.1 WebGPT: interacting with the Internet
8.1.1 WebGPT training methods and key technologies
8.1.2 Evaluation and analysis of WebGPT model performance
8.2 InstructGPT: interacting with humans
8.2.1 Instruction learning
8.2.2 Proximal policy optimization
8.2.3 Summary of reinforcement learning from human feedback (RLHF)
8.3 ChatGPT and GPT-4
8.3.1 Introduction and applications of the ChatGPT model
8.3.2 Features and applications of the GPT-4 model
8.4 Building a conversational system model
8.4.1 Supervised instruction fine-tuning and model training
8.4.2 Inference and deployment strategies for conversational systems

Chapter 9 Natural language models in full bloom: Switch Transformer and PaLM
9.1 Switch Transformer: a trillion-parameter sparse large model
9.1.1 MoE: the sparsely gated mixture-of-experts model
9.1.2 Switch Transformer: a trillion-parameter model based on MoE
9.2 The PaLM model: optimizing language model performance
9.2.1 Structure, principles, and key features of the PaLM model
9.2.2 PaLM training strategy and effect evaluation
9.3 PaLM training in practice

Chapter 10 The ViT model: the Transformer's march into computer vision
10.1 Applications of the Transformer in computer vision
10.1.1 Background of the ViT model in computer vision
10.1.2 Architecture, principles, and key elements of the ViT model
10.1.3 Application scenarios and challenges of large-scale ViT models
10.2 Further development of large vision models: fusing the Transformer with convolution
10.2.1 Improved vision models based on the Transformer
10.2.2 Development and optimization of convolution-based vision models
10.3 Building and training a ViT model in practice
10.3.1 Key steps and methods for building a ViT model
10.3.2 Hands-on practice with multi-dimensional tensor parallelism for ViT

Foreword/Preface

Today, the rapid development and widespread application of artificial intelligence technology have attracted public attention and interest. AI has not only become the core driving force of technological development but has also promoted all-round changes in social life. In particular, deep learning, an important branch of AI, has led and defined a technological revolution by continually raising the bar for what models can express. Large-scale deep learning models (AI large models for short) have made breakthrough progress in fields such as natural language processing, computer vision, and recommendation systems thanks to their powerful representation capabilities and excellent performance, and countless fields have benefited from their widespread application.
However, the research and application of AI large models is a complex and difficult exploration. The challenges they pose in training methods, optimization technologies, computing resources, data quality, security, ethics, and more need to be dealt with and solved one by one. This is the author's original intention and goal in writing this book: to provide a detailed guide and reference for researchers, engineers, scholars, students, and other readers, offering a comprehensive perspective that combines theory and practice so that they can understand and use AI large models, and to lead readers to explore new problems, thereby promoting the continued development of artificial intelligence.
The training of AI large models requires huge computing resources and the support of complex distributed systems. Looking at the development from machine learning to AI large models, only by mastering the basic concepts, classic algorithms, and network architectures of deep learning can we properly understand and apply AI large models. In addition, distributed training and parallel strategies play a key role in AI large model training and can effectively improve training efficiency and model performance. The application of AI large models also spans natural language processing, computer vision, and other fields, opening up a broad application space for all kinds of readers.
To help readers better understand and apply AI large models, this book explains everything from basic concepts to practical techniques in detail. Each chapter focuses on core concepts, key technologies, and practical cases, covering a wide range of material from fundamentals to cutting-edge technologies, including neural networks, Transformer models, BERT models, and the GPT series. The book introduces the principles, training methods, and application scenarios of each model, and discusses the challenges and optimization methods in AI large model training. It also covers key technologies such as distributed systems, parallel strategies, and memory optimization, as well as applications of Transformer models in fields such as computer vision and natural language processing. Overall, the book provides a comprehensive perspective to help readers gain an in-depth understanding of the importance and application prospects of AI large models and distributed training in deep learning.

The content of this book is organized as follows.

Chapter 1 introduces the rise, challenges and training difficulties of large AI models, as well as the development history of neural networks and an introductory guide to deep learning frameworks.
Chapter 2 introduces the key technologies of distributed AI systems and large-scale distributed training platforms, as well as the applications of gradient accumulation, gradient clipping, and large-batch optimizers.
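To give a concrete flavor of one Chapter 2 topic, here is a minimal sketch of gradient accumulation combined with gradient clipping in plain PyTorch. This is my own illustration rather than code from the book; the model, dataset, and hyperparameters are arbitrary placeholders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(512, 10)                        # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(torch.randn(256, 512),
                                  torch.randint(0, 10, (256,))), batch_size=8)
accum_steps = 8                                   # simulates an 8x larger batch

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps   # scale so gradients average
    loss.backward()                               # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        # clip the global gradient norm to stabilize training
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
```
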
Chapter 3 introduces the methods of data parallelism and tensor parallelism to process large-scale data and tensor data in a distributed environment, as well as the hybrid parallel strategy to improve the distributed training effect.
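To illustrate the data parallelism described here, below is a minimal PyTorch DistributedDataParallel (DDP) sketch of my own, not the book's Colossal AI code: each process keeps a full model replica, a DistributedSampler shards the data across ranks, and gradients are averaged during backward.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_demo.py
dist.init_process_group(backend="nccl")           # one process per GPU
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = DDP(nn.Linear(128, 10).cuda(local_rank), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# toy dataset; DistributedSampler gives each rank a disjoint shard
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=32, sampler=DistributedSampler(dataset))

for x, y in loader:
    x, y = x.cuda(local_rank), y.cuda(local_rank)
    loss = criterion(model(x), y)
    loss.backward()           # DDP all-reduces gradients across ranks here
    optimizer.step()
    optimizer.zero_grad()
```
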
Chapter 4 introduces the structure of the Transformer model and the implementation of the self-attention mechanism, and discusses common tasks in natural language processing and the application of the Transformer model in text processing.
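The heart of Chapter 4 is the self-attention mechanism. As a quick illustration (mine, not the book's), single-head scaled dot-product attention can be written in a few lines, with `w_q`, `w_k`, and `w_v` standing in for learned projection matrices:

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustration only).
    x: (batch, seq_len, d_model); w_q, w_k, w_v: (d_model, d_k)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)   # each row is a distribution
    return weights @ v                        # weighted sum of value vectors

x = torch.randn(2, 5, 64)
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # -> torch.Size([2, 5, 64])
```
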
Chapter 5 introduces the architecture and pre-training tasks of the BERT model, as well as methods of leveraging parameter sharing and sentence order prediction to optimize model performance and reduce memory usage.
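The ALBERT-style parameter sharing mentioned here is easy to sketch: reuse one encoder layer at every depth, so the parameter count no longer grows with the number of layers. The class below is a hypothetical toy illustration, not the book's implementation:

```python
import torch
import torch.nn as nn

class SharedDepthEncoder(nn.Module):
    """Toy ALBERT-style cross-layer parameter sharing: one encoder layer
    is reused at every depth, so parameters stay constant as depth grows."""
    def __init__(self, d_model=256, n_heads=4, depth=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                                batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):           # same weights at every depth
            x = self.layer(x)
        return x

x = torch.randn(2, 16, 256)
print(SharedDepthEncoder()(x).shape)          # -> torch.Size([2, 16, 256])
```
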
Chapter 6 introduces the architecture, pre-training methods, and key technologies of the T5 model, as well as a unified perspective on pre-training tasks and the mixture-of-denoisers approach that combines different pre-training paradigms.
Chapter 7 introduces the origin, training methods, and key technologies of the GPT series models, as well as the core ideas, model performance, and effect evaluation of the GPT-2 and GPT-3 models.
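Under the hood of the entire GPT series sits one training objective: predict the next token. A toy sketch of the shifted cross-entropy loss (with random stand-in logits rather than a real decoder-only model):

```python
import torch
import torch.nn.functional as F

vocab = 50257                                 # GPT-2's vocabulary size
tokens = torch.randint(0, vocab, (2, 16))     # (batch, seq) token ids
logits = torch.randn(2, 16, vocab)            # stand-in for model output
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),        # predictions at positions 0..14
    tokens[:, 1:].reshape(-1))                # targets shifted left by one
print(loss.item())
```
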
Chapter 8 introduces the WebGPT and InstructGPT models, which can interact with the Internet and with humans respectively, as well as the applications of the ChatGPT model and the features and applications of the GPT-4 model.
Chapter 9 introduces the sparsely gated mixture-of-experts (MoE) model and the MoE-based Switch Transformer model, as well as the structure, training strategy, and effect evaluation of the PaLM model.
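The sparsely gated MoE idea behind Switch Transformer can also be sketched compactly: a router sends each token to its top-1 expert, so per-token compute stays constant while total parameters grow with the number of experts. This is my own simplified illustration; real implementations add capacity limits and a load-balancing loss:

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy Switch-style layer: route each token to exactly one expert."""
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x):                     # x: (n_tokens, d_model)
        probs = torch.softmax(self.router(x), dim=-1)
        gate, idx = probs.max(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                   # tokens routed to expert e
            if mask.any():
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(Top1MoE()(tokens).shape)                # -> torch.Size([10, 64])
```
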
Chapter 10 introduces the application and performance of the ViT model in computer vision, as well as the application prospects of the Transformer in tasks such as image classification, object detection, and image generation.
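The trick that lets a Transformer "see" images in ViT is patch embedding: cut the image into fixed-size patches and project each one into a token. A minimal sketch (mine, not the book's):

```python
import torch
import torch.nn as nn

# 16x16 patches: a stride-16 convolution both cuts the image into
# non-overlapping patches and linearly projects each patch to a token.
patch_embed = nn.Conv2d(in_channels=3, out_channels=768,
                        kernel_size=16, stride=16)
img = torch.randn(1, 3, 224, 224)
tokens = patch_embed(img).flatten(2).transpose(1, 2)
print(tokens.shape)   # -> torch.Size([1, 196, 768]): 196 patch tokens
```
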
Whether it is BERT, GPT, or PaLM, each model is a crystallization of the evolution of artificial intelligence technology, backed by a deep theoretical foundation and practical experience. That is why this book discusses each model separately, to ensure adequate coverage of both the depth and the breadth of each. The book also provides a comprehensive introduction to the technologies required to train these models: from high-performance computing (HPC) to parallel processing, from large-scale optimization methods to memory optimization, each technology has been carefully selected and studied in depth; together they are the cornerstone of AI large model training and the key to building high-performance AI systems.
However, mastering theoretical knowledge is only the starting point for understanding large models. Applying AI in practice requires solving a series of challenges in AI large model training, such as managing computing resources and optimizing training efficiency. This leads to a part of the book that receives particular emphasis: Colossal AI.
Using Colossal AI, the book offers a series of hands-on material, including how to train BERT, GPT-3, PaLM, ViT, and conversational systems step by step. This material not only walks through the concrete steps of model training but also analyzes the key technologies and advantages of Colossal AI in depth, helping readers understand how to use this powerful tool to improve their research and work. Finally, the book includes a series of practical exercises aimed at turning theory into practice, in line with the programming-learning maxim that true knowledge comes from practice: only by getting hands-on can we truly understand and master the principles behind these complex AI large models.
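For a feel of what this hands-on material looks like, here is a minimal training-loop skeleton modeled on the legacy Engine API from older Colossal AI tutorials (`colossalai.launch_from_torch`, `colossalai.initialize`). Newer Colossal AI releases have moved to a Booster API, so treat this purely as an illustrative sketch and follow the book and the official documentation for the version you install.

```python
# Sketch only: legacy Colossal AI Engine-style loop (the API varies by version).
# Launch with: torchrun --nproc_per_node=<num_gpus> demo.py
import colossalai
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

colossalai.launch_from_torch(config={})       # empty config: plain data parallel

model = nn.Linear(128, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(512, 128), torch.randint(0, 10, (512,)))
loader = DataLoader(dataset, batch_size=32)

# In the legacy API, initialize() wraps everything into an Engine that
# applies the parallelism and optimization settings from the config.
engine, loader, _, _ = colossalai.initialize(model, optimizer, criterion, loader)

engine.train()
for x, y in loader:
    x, y = x.cuda(), y.cuda()                 # engine handles model placement
    engine.zero_grad()
    loss = engine.criterion(engine(x), y)
    engine.backward(loss)
    engine.step()
```
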
This book is intended for readers interested in deep learning and artificial intelligence. Whether you are a student, researcher, or practitioner, you can gain valuable knowledge and insights from it. For beginners, the book provides the basic concepts and algorithms of deep learning and AI large models to help establish the necessary knowledge framework; for readers with some experience, it discusses in depth the key technologies and challenges of large models and distributed training, giving an in-depth view of the latest research advances and practical applications.
The book provides a wealth of resources to help readers better understand and apply what they have learned. Its content has been carefully arranged and organized by the author and is systematic and coherent, giving readers a clear knowledge structure and learning path. It also provides a large number of code examples and practical cases, so readers can consolidate the concepts and techniques they have learned through hands-on work. In addition, it offers references for further study to help readers dig deeper into topics of interest, along with a wealth of supplementary resources designed to encourage readers to continue exploring and learning beyond the book itself.

A quick overview of "Practical AI Large Models"

Get "Practical AI Large Models" here: https://item.jd.com/14281522.html.

Conclusion

In today's AI era, deep learning models have become an important engine driving the development of artificial intelligence. However, how to train large models efficiently and deploy them in practical applications has long been a challenge for the industry. "Practical AI Large Models" reveals what is inside AI large models and helps you achieve breakthroughs in the field of deep learning.

By digging into the principles and practice of large-model training, from distributed systems and parallel strategies to memory optimization and mixed-precision training, you can develop and optimize deep learning models more efficiently and provide strong support for practical applications.

In the future development of AI, tools such as Colossal AI are expected to play an important role in the field of deep learning. As an open source system for large-model training, Colossal AI is committed to improving the training efficiency of deep learning models and contributing to the development of artificial intelligence. For those already working in artificial intelligence, or about to join the field, mastering such tools and techniques is key to keeping pace with technological development.

By studying "Practical AI Large Models", you will master the principles, optimization techniques, and practical application cases of AI large models, and you will be able to apply them in real projects to improve the performance of deep learning models. In the wave of artificial intelligence, let us explore the infinite possibilities of AI large models together and lay a solid foundation for the future development of artificial intelligence technology.

Hello, I am Feichen.
Welcome to follow me for front-end learning resources, daily sharing on technological change and the rules of survival, and insider news and insights from the industry.
