How to use word embeddings?

Author: Zen and the Art of Computer Programming

1. Introduction

Word embeddings are a fundamental technique in natural language processing. Their purpose is to map the words or phrases in a text to a continuous vector space through machine learning, so that words with similar meanings lie close together in that space while semantically different words remain distinguishable. Word embeddings have a wide range of applications, such as recommendation systems, search engines, information retrieval, image recognition, text classification, and sentiment analysis, and they also carry significant social value.

This article elaborates on the core ideas behind word embeddings and walks through a simple hands-on case using existing tools.

2. Basic concepts and terminology

(1) The meaning of word embedding

A word embedding places each word or phrase in a high-dimensional dense vector space, where each element of the vector represents a learned feature of that word or phrase. The closer two points are in this space, the closer their meanings; the farther apart, the greater the difference in meaning. The goal of word embedding is to establish such similarity relationships so that computers can better capture the patterns, structure, and semantics of natural language.
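To make the "closer distance, closer meaning" idea concrete, here is a minimal sketch that measures similarity with cosine similarity. The 4-dimensional vectors below are made up purely for illustration; real embeddings are learned from a corpus and typically have 100 or more dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: near 1.0 means similar direction, near 0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy hand-made embeddings, for illustration only.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.8, 0.9, 0.2, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: close in meaning
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low: far apart in meaning
```

In practice, cosine similarity is preferred over raw Euclidean distance because it ignores vector length and compares only direction, which is usually what carries the semantic signal.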

(2) How word embeddings are generated

Word embedding mainly consists of two steps:

  • Training: based on a corpus, fit a statistical probability model to the word-context co-occurrence matrix, learn a context representation (context embedding) for each word, and iteratively optimize the model parameters with gradient descent to obtain the final word embedding matrix.
  • Use: given a new word or sentence, look up each word's vector in a pre-trained word embedding model (such as GloVe or Word2Vec) and feed the resulting vectors into the downstream task, as shown in the sketch below.
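The following sketch shows both steps using the gensim library, one common way to train and query a Word2Vec model. The tiny corpus and the parameter values (vector_size, window, epochs) are illustrative assumptions, not recommendations; a real model needs a much larger corpus.

```python
from gensim.models import Word2Vec

# Step 1 -- Training: build a Word2Vec model from a tokenized corpus.
# This toy corpus is an assumption for demonstration only.
corpus = [
    ["word", "embeddings", "map", "words", "to", "vectors"],
    ["similar", "words", "get", "similar", "vectors"],
    ["vectors", "live", "in", "a", "dense", "space"],
]
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=50)

# Step 2 -- Use: look up the learned vector for a word and query its neighbors.
vec = model.wv["vectors"]                       # the 50-dimensional embedding
print(vec.shape)                                # (50,)
print(model.wv.most_similar("words", topn=3))   # nearest words in the embedding space
```

With a pre-trained model such as GloVe, step 1 is skipped entirely: the vectors are simply loaded from disk and used for lookup in the same way.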
