Article directory
Current status of large models
Base model selection
Data composition
- Domain data
- Book data
- Website data
- News content
- Instruction fine-tuning data
Mix public and domain data (at a ratio of about 1:5) to avoid catastrophic forgetting, which would otherwise degrade general capabilities.
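A minimal sketch of the mixing step, assuming the ratio means one part public data to five parts domain data; the function name `mix_data` and the subsampling policy are illustrative, not from the original article:

```python
import random

def mix_data(public_data, domain_data, ratio=5, seed=0):
    """Mix public and domain samples at roughly 1:ratio (public:domain),
    then shuffle so general-capability data is seen throughout training.
    The subsampling policy here is a hypothetical choice."""
    rng = random.Random(seed)
    # Keep roughly 1/ratio as much public data as domain data.
    n_public = max(1, len(domain_data) // ratio)
    mixed = rng.sample(public_data, min(n_public, len(public_data))) + list(domain_data)
    rng.shuffle(mixed)
    return mixed

public = [f"public-{i}" for i in range(100)]
domain = [f"domain-{i}" for i in range(50)]
train_set = mix_data(public, domain)  # 10 public + 50 domain samples
```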
Migration method
- When resources are insufficient, continue training from the Chat model.
- When resources are sufficient, train on tens of millions of samples from the Base model; do not train the Chat model on the full dataset.
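The rule of thumb above can be written down as a tiny helper; the function name and thresholds here are hypothetical, chosen only to mirror the notes:

```python
def choose_start_checkpoint(num_samples, have_large_compute):
    """Hypothetical rule of thumb from the notes above:
    small budget -> continue training from the Chat model;
    large budget with tens of millions of samples -> start from the Base model."""
    if have_large_compute and num_samples >= 10_000_000:
        return "base"
    return "chat"

print(choose_start_checkpoint(20_000_000, True))   # large budget -> "base"
print(choose_start_checkpoint(100_000, False))     # small budget -> "chat"
```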
Evaluation
Thoughts
Domain large model training techniques
- ChatGPTBook: github.com/liucongg/ChatGPTBook
Tokenizer
Distributed Deep Learning
Data parallelism
Pipeline parallelism
Tensor parallelism
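The data-parallelism entry above can be illustrated with a toy gradient-averaging loop: each worker computes gradients on its own data shard, the gradients are averaged (the all-reduce step), and a single synchronized update is applied. This is a pure-Python sketch of the idea, not code from any of the frameworks listed below:

```python
def grad_mse(w, xs, ys):
    # dL/dw for L = mean((w*x - y)^2) over one worker's shard
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_step(w, shards, lr=0.05):
    # Each shard's gradient would be computed on a separate device in practice.
    grads = [grad_mse(w, xs, ys) for xs, ys in shards]
    g = sum(grads) / len(grads)  # all-reduce: average gradients across workers
    return w - lr * g            # identical synchronized update on every worker

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                      # generated by y = 2 * x
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]  # split the batch across 2 workers
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
# w converges to the true weight 2.0
```

Pipeline and tensor parallelism instead split the model itself (by layer, or within each layer's weight matrices) rather than the data.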
Distributed deep learning framework: Megatron-LM
Distributed deep learning framework: Colossal-AI
Distributed deep learning framework: DeepSpeed
P-tuning fine-tuning
LF