Notes on Training a GPT-2 Model with DeepSpeed and Megatron-LM (Part 1) - Code World


News 2023-09-07 01:34:30


Origin: blog.csdn.net/just_sort/article/details/131173500

