03 What is pre-training (Transformer prelude)


What is the use of pre-training

Classical machine learning leans heavily toward mathematics (see "Statistical Learning Methods" by Li Hang).

Mainstream deep learning (artificial intelligence) projects rely on big data for support.

Many of our projects, however, have no such big data behind them, only small datasets.

Cat-and-dog classification task: given only 100 pictures of cats and dogs, train a model to tell whether a new picture shows a cat or a dog. With so little data the problem is effectively unsolvable, and the accuracy will be very low.

Meanwhile, there exist 100,000 pictures of geese and ducks, and someone has already used those 100,000 images to train a model A.


It was then observed that the shallow layers of such a model learn general, transferable features (horizontal strokes, vertical strokes, basic edges).

Concretely: suppose model A is a 100-layer CNN trained on the 100,000 goose/duck images.

Task B: classify the 100 cat-and-dog pictures. Training a 100-layer CNN from scratch on 100 images is impossible.

Instead, reuse the first 50 layers of model A and train the remaining layers, so that the full 100 layers complete task B. The reused shallow layers can be handled in two ways (see the sketch after the list):

(Figure: the application of pre-training)

  1. Freezing: the shallow-layer parameters are kept fixed while task B trains.
  2. Fine-tuning: the shallow-layer parameters keep updating along with task-B training.
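
A minimal PyTorch sketch of both strategies. It assumes torchvision's pretrained ResNet-50 as a stand-in for "model A" (the 100-layer goose/duck CNN is hypothetical) and a fresh 2-class head for the cat/dog task B:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load "model A": a CNN already pretrained on a large dataset.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the task-A classification head with a fresh 2-class head for task B.
model.fc = nn.Linear(model.fc.in_features, 2)

# Strategy 1 -- Freezing: shallow parameters do not change;
# only the new head is trained.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# Strategy 2 -- Fine-tuning: shallow parameters also update with task B,
# typically at a much smaller learning rate so the pretrained
# features are not destroyed:
# for param in model.parameters():
#     param.requires_grad = True
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```

Freezing is cheapest and suits very small datasets; fine-tuning usually reaches higher accuracy when task B has a bit more data.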

What is pre-training

Pre-training: using an already-trained model A (specifically, its shallow-layer parameters) to complete a task B that has only a small amount of data.

The prerequisite is that task A and task B are very similar.

How to use pre-training

In practice you rarely train model A yourself; you load a pretrained one from a library such as fairseq or Hugging Face transformers.
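
As an example of the transformers route (a sketch, not the post's own code): load a pretrained BERT checkpoint, so the "model A" step is already done, and attach a randomly initialized 2-class head to fine-tune on a small task-B dataset. The checkpoint name and label count here are illustrative:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# "Model A": a publicly pretrained checkpoint; nothing is trained from scratch.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Reuse the pretrained body and attach a new 2-class head for task B
# (the library warns that the head weights are newly initialized).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A single forward pass on a toy task-B input.
inputs = tokenizer("a small task-B example", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```

Fine-tuning this model with Trainer or a plain PyTorch loop on the small dataset is exactly the "use model A's shallow parameters to train task B" recipe above.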

Summary

Given two very similar tasks A and B, where a model A has already been trained for task A: use the shallow-layer parameters of model A as the starting point for training on task B, and obtain model B.


Origin: blog.csdn.net/linjie_830914/article/details/131614684