What is pre-training used for?
Machine learning: leans heavily on mathematics (see Li Hang, "Statistical Learning Methods")
Deep learning (AI) projects: usually driven by big data (the mainstream case)
Many real projects, however, have no big-data support, only small datasets
Cat-and-dog classification task: given only 100 pictures of cats and dogs, train a model that tells whether a new picture is a cat or a dog. Trained from scratch, this cannot be solved; the accuracy is very low.
Now suppose 100,000 pictures of geese and ducks are available, and someone has already trained a model A on those 100,000 pictures.
It has also been observed that the shallow layers of a CNN learn general features (e.g., horizontal and vertical strokes/edges).
Concretely: a 100-layer CNN, model A, is trained on the 100,000 goose/duck pictures.
Task B: classify cats vs. dogs with only 100 pictures; training a 100-layer CNN from scratch is impossible.
Instead, reuse the first 50 layers of A, then train the remaining layers so that the full 100-layer network completes task B.
![Application of pre-training](https://imgmd.oss-cn-shanghai.aliyuncs.com/BERT_IMG/%E9%A2%84%E8%AE%AD%E7%BB%83%E7%9A%84%E5%BA%94%E7%94%A8.jpg)
- Freeze: the shallow-layer parameters are kept unchanged
- Fine-tuning: the shallow-layer parameters are updated along with training on task B
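The freeze-versus-fine-tune idea can be sketched with a toy stand-in for a network, where each "layer" is just a list of weights and frozen layers are skipped during the update step (a hypothetical illustration; real projects would use PyTorch or TensorFlow):

```python
import copy
import random

# Toy stand-in for a deep network: each "layer" is just a list of weights.
def make_model(n_layers, width=4, seed=0):
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(n_layers)]

def transfer(model_a, n_shallow, n_total, freeze=True):
    """Build model B: copy the first n_shallow layers of A, init the rest fresh."""
    model_b = copy.deepcopy(model_a[:n_shallow]) + make_model(n_total - n_shallow, seed=42)
    frozen = set(range(n_shallow)) if freeze else set()
    return model_b, frozen

def sgd_step(model, frozen, lr=0.1):
    """Pretend every weight has gradient 1.0; skip frozen layers."""
    for i, layer in enumerate(model):
        if i in frozen:
            continue
        for j in range(len(layer)):
            layer[j] -= lr * 1.0

model_a = make_model(100)                      # "model A": 100-layer stand-in
model_b, frozen = transfer(model_a, 50, 100)   # reuse first 50 layers, freeze them
before = copy.deepcopy(model_b)
sgd_step(model_b, frozen)
print(model_b[0] == before[0])    # True: frozen shallow layer unchanged
print(model_b[99] == before[99])  # False: deep layer was updated
```

With `freeze=False` the copied shallow layers would also receive updates, which is exactly the fine-tuning variant.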
What is pre-training
Using an already-trained model A (specifically, its shallow-layer parameters) to complete a task B that has only a small amount of data.
Precondition: task A and task B are very similar.
How to use pre-training
Common toolkits: fairseq (Facebook AI) and the Hugging Face transformers library
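As a hedged sketch of how freezing looks with the transformers library: the snippet below builds a tiny, randomly initialized BERT from a config (so no download is needed; real fine-tuning would instead load weights with `from_pretrained`) and freezes the embeddings plus the first two encoder layers. The specific layer sizes here are arbitrary, chosen only to keep the example small.

```python
from transformers import BertConfig, BertModel

# Tiny random BERT purely for illustration; real code would use
# BertModel.from_pretrained("bert-base-uncased") or similar.
config = BertConfig(hidden_size=32, num_hidden_layers=4,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)

# Freeze the embeddings and the first two (shallow) encoder layers.
for module in [model.embeddings, *model.encoder.layer[:2]]:
    for p in module.parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable}/{total}")
```

An optimizer built over `model.parameters()` then only updates the unfrozen deep layers, mirroring the 50-layer reuse described above.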
Summary
Given a task A and a very similar task B, where task A already has a trained model A: use the shallow-layer parameters of model A to train on task B, obtaining model B.