[Natural Language Processing] A Discussion of How Exposure Bias Is Resolved

Preface

Exposure bias arises from an inconsistency between how text is generated during training and during inference: the two phases use different inputs. During training, each input word comes from the real sample (the ground truth), but during inference, the input at each step is the model's own output for the previous word.
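
To make the mismatch concrete, here is a minimal runnable sketch in plain Python. The bigram table BIGRAMS and the helper names are invented for illustration; any autoregressive model has the same two loops.

```python
# Toy stand-in for a trained autoregressive model: a fixed bigram table.
BIGRAMS = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}

def predict_next(token):
    """Hypothetical next-word step of the model."""
    return BIGRAMS.get(token, "</s>")

def teacher_forced_steps(ground_truth):
    """Training: the input at step t is the REAL previous word y_{t-1},
    regardless of what the model itself would have predicted."""
    return [(prev, predict_next(prev)) for prev in ground_truth[:-1]]

def free_running(start="<s>", max_len=10):
    """Inference: the input at step t is the model's OWN previous output,
    so an early mistake is fed back in and can compound."""
    seq = [start]
    while seq[-1] != "</s>" and len(seq) < max_len:
        seq.append(predict_next(seq[-1]))
    return seq

# Training sees ground-truth inputs even where the model would have erred;
# inference never does. That gap is exactly the exposure-bias mismatch.
print(teacher_forced_steps(["<s>", "the", "dog", "sat", "</s>"]))
print(free_running())
```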

Solutions

1. Use scheduled sampling. The simple version: for the inputs used in the training phase, select the real (ground-truth) word with probability p, and select the model's output for the previous word with probability 1-p. The probability p decays as the number of training steps increases; the decay can follow an exponential function, an inverse sigmoid function, or a linear function (a minimal sketch follows the figure below).

[Figure: how the probability p decays under the exponential, inverse sigmoid, and linear schedules]
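
A minimal sketch of those three decay schedules and the sampling step itself. The constants are illustrative assumptions, following the shapes given in the scheduled-sampling paper (Bengio et al., 2015), not tuned values.

```python
import math
import random

def teacher_forcing_prob(i, schedule="exponential"):
    """Probability p of feeding the ground-truth word at training step i."""
    if schedule == "exponential":        # p = k^i, with 0 < k < 1
        k = 0.999
        return k ** i
    if schedule == "inverse_sigmoid":    # p = k / (k + exp(i / k)), with k >= 1
        k = 500.0
        return k / (k + math.exp(i / k))
    # linear decay: p = max(eps, k0 - c * i), clipped at a floor eps
    eps, k0, c = 0.05, 1.0, 1e-4
    return max(eps, k0 - c * i)

def next_input(ground_truth_word, model_word, step, schedule="exponential"):
    """With probability p feed the real word; with probability 1-p feed
    the model's own output for the previous word."""
    p = teacher_forcing_prob(step, schedule)
    return ground_truth_word if random.random() < p else model_word
```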

2. Add a constraint term to the loss. (I have seen someone do this, but when I later wanted to check the details I could no longer find the article.)
3. Reinforcement learning + GAN, i.e. the SeqGAN approach. Exposure bias also comes up in "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient": the authors explain the problem (training with maximum likelihood estimation is what causes it), but the paper does not explicitly say how SeqGAN solves it. On closer inspection, the reason is that during training SeqGAN gives each generated word a corresponding reward, rather than raising the probability of the ground-truth word the way maximum likelihood estimation does, so exposure bias does not arise (a schematic contrast is sketched below).

A related case: the training task in BERT is to predict masked (covered) words, which are replaced by a special mark, but downstream tasks contain no such mark. BERT therefore adopts a similar remedy: during training, a word is masked with probability p, replaced by another word with probability q, and left unchanged with probability 1-p-q. This approach is very similar to (and arguably borrows from) scheduled sampling; a sketch follows the SeqGAN contrast below.
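
A schematic (not SeqGAN's actual training code) of the contrast drawn in point 3: MLE pushes up the log-probability of the ground-truth word at every step, while the policy-gradient objective weights the log-probability of each word the model actually sampled by a per-word reward.

```python
def mle_loss(gt_log_probs):
    """Maximum likelihood: raise log p(ground-truth word) at every step,
    which requires conditioning on ground-truth prefixes (teacher forcing)."""
    return -sum(gt_log_probs)

def policy_gradient_loss(sampled_log_probs, rewards):
    """SeqGAN-style objective: the model conditions on its OWN sampled prefix,
    and each sampled word's log-probability is weighted by a per-word reward
    (in SeqGAN, discriminator scores estimated via Monte Carlo rollouts)."""
    return -sum(lp * r for lp, r in zip(sampled_log_probs, rewards))
```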
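
And a sketch of the BERT-style corruption described above. The probability values here are assumptions for illustration; in BERT itself, 15% of words are selected, of which 80% become the mask token, 10% become a random word, and 10% are left unchanged.

```python
import random

def corrupt(words, vocab, p=0.12, q=0.015):
    """Replace each word with [MASK] with probability p, with a random word
    from vocab with probability q, and leave it unchanged otherwise."""
    out = []
    for w in words:
        r = random.random()
        if r < p:
            out.append("[MASK]")               # masked, probability p
        elif r < p + q:
            out.append(random.choice(vocab))   # replaced, probability q
        else:
            out.append(w)                      # kept, probability 1 - p - q
    return out

# Example: corrupt(["the", "cat", "sat"], vocab=["the", "cat", "sat", "dog"])
```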

Origin blog.csdn.net/devil_son1234/article/details/114818969