Natural Language Processing: loading a pre-trained fastText model (Facebook's wiki-news-300d-1M.vec)

fastText differs from Word2vec: instead of predicting the surrounding words, it predicts the surrounding character n-grams. For example, "whisper" generates the following 2-character and 3-character grams:

wh, whi, hi, his, is, isp, sp, spe, pe, per, er

fastText trains a vector representation for each character n-gram, covering whole words, misspelled words, word fragments, and even single characters. As a result, it handles rare words better than the original Word2vec.
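
As a minimal sketch of this idea (simplified: real fastText also wraps each word in '<' and '>' boundary markers and defaults to 3- to 6-character grams), the grams listed above can be generated like this:

def char_ngrams(word, n_min=2, n_max=3):
    # Collect every substring of length n_min..n_max, scanning left to right.
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(word) - n + 1):
            grams.append(word[i:i + n])
    return grams

print(char_ngrams('whisper'))
# ['wh', 'hi', 'is', 'sp', 'pe', 'er', 'whi', 'his', 'isp', 'spe', 'per']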

# # .bin file
# from gensim.models.fasttext import FastText
#
# ft_model = FastText.load_fasttext_format(model_file=MODEL_PATH)
# print(ft_model.wv.most_similar('soccer'))  # query through .wv; calling most_similar on the model directly is deprecated
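
Note that load_fasttext_format was removed in gensim 4.x. On newer gensim (3.8 and later), the equivalent .bin loader is load_facebook_model; a minimal sketch, kept commented out like the block above since this post uses the .vec file:

# from gensim.models.fasttext import load_facebook_model
#
# ft_model = load_facebook_model(MODEL_PATH)  # MODEL_PATH points to the native .bin file
# print(ft_model.wv.most_similar('soccer'))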

# .vec file
from gensim.models import KeyedVectors

FASTTEXTFILE = "xxx\\wiki-news-300d-1M.vec"  # path to the downloaded .vec file
ft_model = KeyedVectors.load_word2vec_format(FASTTEXTFILE)  # .vec files use the word2vec text format
print(ft_model.most_similar('soccer'))  # nearest neighbors of 'soccer' by cosine similarity

Note: the fastText API provided by gensim is essentially the same as the Word2vec API, so the methods shown earlier for Word2vec also apply to the fastText model.
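
For example, the usual Word2vec-style queries run unchanged on the loaded KeyedVectors object (a brief sketch; the analogy words are illustrative, not measured output):

print(ft_model.similarity('soccer', 'football'))  # cosine similarity between two words
print(ft_model.most_similar(positive=['king', 'woman'], negative=['man']))  # analogy query
print(ft_model['soccer'].shape)  # the raw word vector; (300,) for this 300-dimensional model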
