With larger models and more data, loss decreases and accuracy increases.
1. Large models
1.1 The Model's "Aha" Moment
Example of half-understanding: a model that only partially grasps a task can look no better than one that knows nothing, so the ability seems to emerge suddenly once the model crosses a size threshold.
1.2 Model Size Effects
Chain of thought (CoT)
Prompting the model to reason step by step only improves results once the model is large enough.
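A minimal illustration of the chain-of-thought idea above. The prompts follow the well-known tennis-ball/cafeteria demonstration from the CoT literature; the exact wording here is my own, not from these notes:

```python
# Standard prompt: ask for the answer directly.
standard = "Q: Roger has 5 balls. He buys 2 cans of 3 balls. How many now?\nA:"

# Chain-of-thought prompt: the in-context example spells out its reasoning,
# nudging the model to reason step by step before answering.
cot = (
    "Q: A cafeteria had 23 apples. They used 20 and bought 6 more. How many?\n"
    "A: They started with 23, used 20, leaving 23 - 20 = 3. "
    "Then they bought 6, so 3 + 6 = 9. The answer is 9.\n"
    "Q: Roger has 5 balls. He buys 2 cans of 3 balls. How many now?\nA:"
)
print(len(cot) > len(standard))  # the CoT prompt carries the extra reasoning
```

Only the prompt changes; with a small model the extra reasoning text does little, while a sufficiently large model benefits from it.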
Calibration
Measures whether the model's confidence in an answer matches how often it is actually correct; larger models tend to be better calibrated.
On some tasks a "U-shape" appears: performance first drops as models grow, then rises again.
2. Big data
Learning grammar and learning world knowledge require different amounts of data (world knowledge needs far more).
2.1 Data Preprocessing
De-duplicate repeated training data so the model does not simply memorize it.
2.2 Fixed Compute Budget
Given a fixed compute budget, there is an optimal ratio between data volume and parameter count (the compute-optimal scaling result).
A small model trained on big data beats a big model trained on small data.
[LLaMA] follows the same idea.
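The compute-optimal trade-off above can be sketched with two common rules of thumb from the Chinchilla scaling work: training FLOPs C ≈ 6·N·D (N parameters, D tokens) and roughly 20 tokens per parameter at the optimum. Both constants are approximations I am assuming, not figures from these notes:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Rough compute-optimal split of a training budget:
    C ~ 6 * N * D and D ~ 20 * N  =>  N = sqrt(C / 120), D = 20 * N."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e23-FLOP budget favors a ~29B-parameter model
# trained on far more tokens, rather than a bigger model on fewer.
n, d = chinchilla_optimal(1e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Holding C fixed, shrinking N frees budget for more tokens D, which is exactly why a smaller model with more data can win.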
2.3 Model Tuning
2.3.1 Instruction Tuning
Fine-tune on instruction-formatted examples of the target problems so the model learns to follow natural-language instructions.
2.3.2 Overall Pipeline
pretrain -> finetune -> reinforcement learning
(1) Fine-tuning helps small models more than large ones.
(2) Likewise for reinforcement learning: a small tuned model can beat a much larger untuned one.
3. Beyond "big model, big data"
3.1 kNN-LM
In general: a language model treats next-token prediction as a classification problem.
kNN-LM:
(1) Encode the current context into a target vector and compare it against stored source vectors (one per training context, each paired with its next token).
(2) Compute the distance between the target and each source; the nearest neighbors' next tokens form a retrieval distribution.
The retrieval distribution is then combined with the conventional model's output as a weighted sum (the red box in the paper's figure).
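Steps (1)-(2) and the weighted combination can be sketched as follows. The datastore, vector dimensions, and interpolation weight `lam` are toy assumptions for illustration:

```python
import numpy as np

def knn_lm_probs(query, keys, key_tokens, lm_probs, vocab_size, k=3, lam=0.25):
    """kNN-LM sketch: retrieve the k nearest stored context vectors,
    turn their next tokens into a distribution (softmax over negative
    distances), and interpolate with the ordinary LM distribution."""
    # (2) distances between the target (query) and the source (key) vectors
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # softmax over negative distances -> weights for the neighbors
    w = np.exp(-dists[nearest])
    w /= w.sum()
    # aggregate neighbor weights per vocabulary token
    p_knn = np.zeros(vocab_size)
    for idx, weight in zip(nearest, w):
        p_knn[key_tokens[idx]] += weight
    # weighted combination with the conventional LM distribution
    return lam * p_knn + (1 - lam) * lm_probs

# Toy datastore: 4 stored context vectors and their observed next tokens.
keys = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
key_tokens = [2, 2, 5, 5]              # next token for each stored context
lm = np.full(8, 1 / 8)                 # uniform base LM over an 8-token vocab
p = knn_lm_probs(np.array([0.0, 1.0]), keys, key_tokens, lm, vocab_size=8)
print(p)                               # token 2 is boosted by its neighbors
```

Since the query sits near the contexts whose next token was 2, the retrieved distribution shifts probability mass toward that token before mixing back with the base LM.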
3.1.1 Disadvantages
Inference takes too long: a nearest-neighbor search over the whole datastore is needed at every decoding step.
3.2 RETRO
Avoids making the model memorize facts by querying an external database instead (e.g., the value of π).