Rasa Course, Rasa Training, Rasa Interview, Subword Embeddings and Spelling of Rasa Practical Series

FastText

FastText is a free, open-source, lightweight library that allows users to learn text representations and train text classifiers.
The word2vec and GloVe word embedding techniques cannot handle words outside the training corpus. They treat each word as the smallest unit and learn one embedding vector per word. Therefore, if a word does not appear in the corpus, word2vec and GloVe have no vector representation for it.

How FastText is better:
fastText uses the same skip-gram and CBOW models as word2vec, but it treats each word as a bag of character n-grams rather than as an indivisible unit. The special symbols "<" and ">" are added to mark the start and end of a token, so the word "India" (lowercased here) becomes "<india>". With n = 3, its n-grams are "<in", "ind", "ndi", "dia" and "ia>". (It is assumed here that the hyperparameters minn and maxn, the smallest and largest n-gram lengths, both have a value of 3.) The vector for the whole word is then the sum of the vector representations of all its character n-grams. The boundary symbols also keep an n-gram inside a word distinct from a standalone word: the trigram "her" taken from "where" is not the same as the word "her", which is represented as "<her>".
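The n-gram splitting described above can be sketched in a few lines of Python. This is only an illustration, not fastText's own code: the function name char_ngrams is hypothetical, and the sketch works on Python characters, whereas the real library has its own handling of encodings and edge cases.

```python
def char_ngrams(word, minn=3, maxn=3):
    """Return the character n-grams of `word` for lengths minn..maxn.

    The word is first wrapped in '<' and '>' to mark its start and end.
    Hypothetical helper for illustration; not part of the fastText API.
    """
    token = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n over the wrapped token.
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


print(char_ngrams("india"))
# ['<in', 'ind', 'ndi', 'dia', 'ia>']
```

Calling char_ngrams("where") yields the trigram "her", while the standalone word "her" is wrapped as "<her>", which is why the two are never confused.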
Because a word vector is built from character n-grams, fastText can generate word embeddings even for words that do not appear in the training corpus.
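A toy sketch of how an out-of-vocabulary word can still get a vector is shown here. Everything in it is an assumption made for illustration: the table of n-gram vectors is filled with random numbers instead of trained values, the sizes DIM and BUCKETS are arbitrary, and the MD5-based bucket function merely stands in for the hash that fastText uses internally. The point is only that any string can be split into n-grams, each n-gram maps to a row in a table, and the rows are summed.

```python
import hashlib
import random

DIM = 4         # toy embedding size (assumption)
BUCKETS = 1000  # toy number of hash buckets (assumption)


def char_ngrams(word, minn=3, maxn=3):
    """Character n-grams of '<word>' for lengths minn..maxn (hypothetical helper)."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]


def bucket(ngram):
    """Map an n-gram to a table row. MD5 is a stand-in, not fastText's hash."""
    digest = hashlib.md5(ngram.encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


# Placeholder for trained n-gram vectors: random values with a fixed seed.
rng = random.Random(0)
table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]


def word_vector(word):
    """Sum the vectors of all character n-grams of `word`."""
    vec = [0.0] * DIM
    for gram in char_ngrams(word):
        row = table[bucket(gram)]
        for k in range(DIM):
            vec[k] += row[k]
    return vec


# "indiana" was never "seen", yet it still receives a vector, and it shares
# the n-grams '<in', 'ind', 'ndi' and 'dia' with "india".
print(word_vector("indiana"))
print(sorted(set(char_ngrams("india")) & set(char_ngrams("indiana"))))
```

In a trained model the shared n-grams are what make the vector of an unseen or misspelled word land near related known words.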

Origin blog.csdn.net/duan_zhihua/article/details/123565143