The difference between using and not using pre-trained word vectors

Pre-trained word vectors

When training data is scarce, pre-trained word vectors embed prior word knowledge, some of it interpretable and some not, into the embedding, and this prior knowledge helps the downstream training task, especially on small data sets. Choosing a set of pre-trained word vectors mainly comes down to two factors: corpus and dimension.

  1. The corpus should match the text type of the training data: English text pairs with an English pre-trained set, Chinese text with a Chinese one, and news text ideally with a set pre-trained on news.
  2. The dimension of the pre-trained word vectors must match the dimension of the custom word vectors (a loading sketch that checks this follows the list).
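
As an illustration, here is a minimal sketch of loading a pre-trained set with gensim and verifying the two factors above. The file name is a hypothetical stand-in for whichever set matches your corpus.

```python
# Minimal sketch: load pre-trained vectors and check corpus fit and dimension.
from gensim.models import KeyedVectors

EMBEDDING_DIM = 100  # must equal the dimension of the custom word vectors

# Hypothetical file name; use a set whose corpus matches your text type.
vectors = KeyedVectors.load_word2vec_format(
    "glove.6B.100d.word2vec.txt", binary=False
)

# Factor 2: the pre-trained dimension must match the model's embedding size.
assert vectors.vector_size == EMBEDDING_DIM

# Rough sanity check of factor 1: nearest neighbors should look domain-appropriate.
print(vectors.most_similar("news", topn=3))
```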

(1) Purpose

Correlation features between words (the contextual structure of the language) learned during pre-training transfer to similar contexts, making up for training data that is too small to learn the general characteristics of the language structure on its own.
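
To make this concrete, the sketch below (PyTorch assumed; the vocabulary and file name are hypothetical) copies pre-trained vectors into an embedding layer, so words seen in pre-training start from their learned contextual features instead of random noise.

```python
# Sketch: build an embedding matrix that carries the pre-trained prior.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import KeyedVectors

EMBEDDING_DIM = 100
vocab = ["the", "news", "report", "<unk>"]  # hypothetical task vocabulary

vectors = KeyedVectors.load_word2vec_format(
    "glove.6B.100d.word2vec.txt", binary=False  # hypothetical file name
)

# Words absent from the pre-trained set fall back to small random values.
matrix = np.random.normal(0.0, 0.1, (len(vocab), EMBEDDING_DIM)).astype("float32")
for i, word in enumerate(vocab):
    if word in vectors:
        matrix[i] = vectors[word]  # inject the learned contextual features

# freeze=False lets the small data set fine-tune the prior knowledge.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)
```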

(2) The difference between using and not using pre-trained word vectors

  • With pre-trained word vectors, semantic information is represented by the relationships already learned between the pre-trained words.
  • With randomly initialized word vectors, the model initially cannot predict a target word from the occurrence of specific context words.
  • In other words, if a context word in the training data also appears in the pre-training set, subsequent neural-network training can produce an accurate target without repeatedly backpropagating to update the incoming weights; otherwise, the weights must be updated again and again for gradient descent to reach the minimum and find the optimal values (see the sketch below). In effect, pre-trained word vectors can simplify gradient descent, i.e., speed up model convergence (personal understanding).
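
A small sketch of the contrast (PyTorch assumed; the random tensor stands in for real pre-trained weights): a pre-trained embedding can even be frozen so its parameters receive no gradient updates at all, while a randomly initialized embedding must keep its gradients on and learn the word relationships from scratch.

```python
# Sketch: frozen pre-trained embedding vs. randomly initialized embedding.
import torch
import torch.nn as nn

VOCAB_SIZE, EMBEDDING_DIM = 4, 100
pretrained_matrix = torch.randn(VOCAB_SIZE, EMBEDDING_DIM)  # stand-in for real vectors

# Pre-trained weights can be frozen: no backpropagation updates needed.
frozen = nn.Embedding.from_pretrained(pretrained_matrix, freeze=True)

# Random initialization must be trained end to end by gradient descent.
random_init = nn.Embedding(VOCAB_SIZE, EMBEDDING_DIM)

print(frozen.weight.requires_grad)       # False: weights stay fixed
print(random_init.weight.requires_grad)  # True: updated every step
```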

Original article: blog.csdn.net/weixin_53952878/article/details/128009314