fastText word vector principles [NLP]

1. Differences between fastText and word2vec

  • Similarities:
  1. Similar model structure: in both, words are represented by embedding vectors, and the hidden layer is obtained by combining those word vectors.
  2. Both use similar optimization methods, such as hierarchical softmax, to speed up training and prediction.
  • Differences:
  1. The model's output layer: word2vec's output layer corresponds to every word in the vocabulary, and it computes the word with the maximum probability; fastText's output layer corresponds to classification labels. In either case, the vectors associated with the output layer are not retained or used after training.
  2. The model's input layer: word2vec's input is the words within a context window; fastText's input is the content of the whole sentence, including both the words and their n-grams.
  • The essential difference between the two lies in how h-softmax is used (a sketch follows this list):
  1. word2vec's goal is to obtain word vectors, and the final word vectors are taken from the input layer; the output layer's h-softmax also generates a series of vectors, but these are ultimately discarded, not used.
  2. fastText makes full use of h-softmax's classification function: it traverses the leaf nodes of the classification tree to find the label (or the top N labels) with the largest probability.
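
To make the first point concrete, here is a minimal sketch using gensim's Word2Vec (gensim is an assumption on my part; the original post names no library, and the toy corpus is placeholder data). With `hs=1` the model trains via hierarchical softmax, yet the vectors we keep are the input-layer embeddings exposed as `model.wv`; the output-tree vectors exist only inside the model:

```python
# A minimal word2vec sketch, assuming gensim >= 4.0 (not named in the post).
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (placeholder data).
sentences = [
    ["fasttext", "is", "a", "fast", "text", "classifier"],
    ["word2vec", "learns", "word", "vectors"],
]

# hs=1 enables hierarchical softmax; negative=0 disables negative sampling,
# so h-softmax alone scores the output layer, as described above.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1,
                 hs=1, negative=0, epochs=50)

# What word2vec is *for*: the input-layer embeddings.
vec = model.wv["word2vec"]   # the vectors we actually keep
print(vec.shape)             # (50,)

# The output layer's h-softmax weights (stored as model.syn1 in gensim
# when hs=1) are only a training device; they play no part in the end
# product, which matches the "generated but discarded" point above.
```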

2. Summary

fastText is a fast classification algorithm: a shallow network that achieves accuracy comparable to deep networks. According to the authors, "using a standard multi-core CPU, we can train word vectors on a corpus of 1 billion words in under 10 minutes, and classify half a million sentences among more than 300,000 categories in less than a minute." It has its own conditions of use, however: it is particularly well suited to classification problems with a large number of categories, and when the number of categories is relatively small it tends to overfit.
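
As a rough illustration of that classification workflow, here is a minimal sketch using the official `fasttext` Python package. The file names, labels, and hyperparameter values below are placeholder assumptions, not values from the original post; it trains a supervised classifier with hierarchical-softmax loss and bigram features, the kind of configuration suited to problems with very many labels:

```python
# A hypothetical end-to-end sketch with the official `fasttext` package.
# "train.txt" / "valid.txt" are placeholder files in fastText's format,
# one example per line: "__label__<category> <text of the sentence>".
import fasttext

model = fasttext.train_supervised(
    input="train.txt",
    loss="hs",      # hierarchical softmax: fast when there are many labels
    wordNgrams=2,   # add word bigrams to the bag-of-words input
    epoch=5,
    lr=0.5,
)

# Evaluate: returns (number of samples, precision@1, recall@1).
n, p_at_1, r_at_1 = model.test("valid.txt")
print(f"P@1 on {n} samples: {p_at_1:.3f}")

# Predict the top-3 labels for a new sentence (the "one or the N" above).
labels, probs = model.predict("example sentence to classify", k=3)
```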


Reprinted from: https://www.cnblogs.com/huangyc/p/9768872.html

Origin: blog.csdn.net/zkq_1986/article/details/93201706