Application of Embedding Technology in Recommendation Systems

Embedding is talked about everywhere, so what exactly is Embedding technology?

1. What is Embedding?

Embedding is essentially a method of "representing" an object with a low-dimensional dense numerical vector. The object here can be a word, a product, a movie, and so on.
Saying that an item "can be represented by a vector" means that the Embedding vector expresses certain characteristics of the corresponding object, so the distance between this vector and the vectors of other items reflects how similar those items are. Furthermore, the difference vector between two Embedding vectors can even reflect the relationship between the two objects.

1.1 Examples of word vectors

The popularity of the Embedding method began with research on word vector generation in the NLP field.

[Figure: word-vector analogies from the Word2vec paper]
The figure above is an example from Google's famous Word2vec paper. The Word2vec model maps words into a high-dimensional vector space. On the left of the figure, the vector from king to queen and the vector from man to woman are very close in both direction and magnitude. This shows that arithmetic between word Embedding vectors can reveal the gender relationship between words. For example, the word vector of woman can be obtained by the following operation:
Embedding(woman)=Embedding(man)+[Embedding(queen)-Embedding(king)]

Similarly, in the example on the right, the vector from walking to walked and the vector from swimming to swam are essentially the same, which shows that word vectors also capture the tense relationship between words. This is the magic of Embedding technology.

In the word vector space, even if the vector of a word is completely unknown, it can be inferred from semantic relations plus vector arithmetic alone. In this way, Embedding expresses objects in another space and, at the same time, reveals the latent relationships between them.
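To make the vector arithmetic above concrete, here is a minimal numpy sketch. The 3-dimensional vectors are made up purely for illustration (real Word2vec embeddings are learned from a corpus and have far more dimensions); the sketch infers a vector for woman from the other three words and checks which known word vector it lands closest to.

```python
import numpy as np

# Toy 3-dimensional word vectors (illustrative values only; real Word2vec
# embeddings are typically 100-300 dimensional and learned from a corpus).
embedding = {
    "king":  np.array([0.80, 0.10, 0.70]),
    "queen": np.array([0.78, 0.90, 0.68]),
    "man":   np.array([0.75, 0.12, 0.20]),
    "woman": np.array([0.73, 0.88, 0.18]),
}

# Infer the vector for "woman" from the other three, as in
# Embedding(woman) = Embedding(man) + [Embedding(queen) - Embedding(king)]
inferred = embedding["man"] + (embedding["queen"] - embedding["king"])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The inferred vector should be closest to the true "woman" vector.
for word, vec in embedding.items():
    print(word, round(cosine(inferred, vec), 3))
```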

Similarly, if products are embedded in the e-commerce field, the distance between Embedding(keyboard) and Embedding(mouse) should be relatively small, while the distance between Embedding(keyboard) and Embedding(hat) should be relatively large.
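As a toy illustration of that claim, the following sketch compares Euclidean distances between hypothetical product embeddings; the vectors are made up purely for demonstration.

```python
import numpy as np

# Hypothetical product embeddings (illustrative values only).
keyboard = np.array([0.90, 0.10, 0.80])
mouse    = np.array([0.85, 0.15, 0.75])
hat      = np.array([0.10, 0.90, 0.05])

# The distance between embeddings reflects how related the products are.
print(np.linalg.norm(keyboard - mouse))  # small distance: closely related items
print(np.linalg.norm(keyboard - hat))    # large distance: unrelated items
```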

1.2 The importance of Embedding technology for deep learning recommendation systems

  • Recommendation scenarios make heavy use of one-hot encoding for category and id features, which produces extremely sparse sample feature vectors, and the structure of deep learning models makes them ill-suited to processing such sparse inputs. Therefore, almost all deep learning recommendation models contain an Embedding layer responsible for converting high-dimensional sparse feature vectors into dense low-dimensional feature vectors (see the sketch after this list).
  • The Embedding itself is an extremely important feature vector. Compared with feature vectors produced by traditional methods such as matrix factorization (MF), Embedding vectors have stronger expressive power.
  • Computing the similarity between item and user Embeddings is a commonly used recall technique in recommendation systems.
    Especially after fast nearest-neighbor search techniques such as locality-sensitive hashing (LSH) were applied to recommendation systems, Embedding became even more suitable for quickly "screening" massive numbers of candidate items.
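The sketch below illustrates the first point: an Embedding layer is essentially a lookup table, and multiplying a huge one-hot vector by that table is equivalent to simply selecting one row. The vocabulary size, id, and values here are hypothetical, and the table is randomly initialised instead of learned.

```python
import numpy as np

np.random.seed(0)

NUM_ITEMS = 10_000   # size of the id vocabulary (hypothetical)
EMB_DIM   = 8        # dimension of the dense embedding

# The Embedding layer is essentially a lookup table of shape
# (NUM_ITEMS, EMB_DIM); here it is randomly initialised for illustration.
embedding_table = np.random.randn(NUM_ITEMS, EMB_DIM).astype(np.float32)

item_id = 4217                      # a single id (category) feature
one_hot = np.zeros(NUM_ITEMS, dtype=np.float32)
one_hot[item_id] = 1.0              # extremely sparse 10,000-dimensional vector

# Multiplying the one-hot vector by the table picks out one row...
dense_via_matmul = one_hot @ embedding_table
# ...which is exactly what an Embedding layer does with a direct lookup.
dense_via_lookup = embedding_table[item_id]

assert np.allclose(dense_via_matmul, dense_via_lookup)
print(dense_via_lookup)             # dense 8-dimensional representation
```

In a real model the table is a trainable parameter learned jointly with the rest of the network, for example via PyTorch's nn.Embedding or TensorFlow's tf.keras.layers.Embedding layers.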

2. Word2vec: the classic Embedding method

2.1 What is word2vec

Word2vec is short for "word to vector". As the name suggests, it is a model for generating vector representations of words.

To train the Word2vec model, we need a corpus consisting of a set of sentences. Assume that one sentence of length $T$ contains the words $w_1, w_2, \dots, w_T$, and assume that each word is most closely related to its adjacent words.
CBOW and Skip-gram
Depending on the model assumptions, Word2vec comes in two forms: the CBOW model and the Skip-gram model. The CBOW model assumes that the choice of each word in a sentence is determined by its neighboring words, so the input of the CBOW model is the words surrounding $w_t$ and the predicted output is $w_t$ itself. The Skip-gram model makes the opposite assumption: each word determines its neighboring words, so the input of the Skip-gram model is $w_t$ and the predicted output is the words surrounding $w_t$. In general experience, the Skip-gram model tends to perform better.
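As a rough sketch of how Skip-gram training samples are typically generated (the function name and window size below are illustrative, not from the original text): each word in the sentence serves as the input, and every word within the sliding window around it serves as a predicted output. CBOW would instead group all the context words as the input and predict the center word.

```python
def skipgram_pairs(sentence, window=2):
    """Generate (input=center word, output=context word) training pairs."""
    pairs = []
    for t, center in enumerate(sentence):
        # Every word within `window` positions of w_t becomes a context word.
        for j in range(max(0, t - window), min(len(sentence), t + window + 1)):
            if j != t:
                pairs.append((center, sentence[j]))
    return pairs

sentence = ["the", "quick", "brown", "fox", "jumps"]
for center, context in skipgram_pairs(sentence, window=2):
    print(center, "->", context)
```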

2.2 The training process of Word2vec

2.3 Word2vec's negative sampling training method

3. Item2vec: the extension of Word2vec to recommendation systems

After Word2vec was born, the idea of Embedding quickly spread from natural language processing to almost every field of machine learning, and recommendation systems were no exception. Since Word2vec can embed the words in a word "sequence", there should likewise be a corresponding Embedding method for the products in a user's purchase "sequence" or the movies in a user's viewing "sequence".
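A minimal sketch of this idea: treat each user's purchase sequence as a "sentence" of item ids and feed it to an off-the-shelf Word2vec implementation. The example below assumes gensim 4.x; the item ids and hyperparameters are made up, and a real Item2vec setup may differ in detail (for instance by treating the whole sequence as one context window).

```python
from gensim.models import Word2Vec

# Each "sentence" is one user's purchase (or viewing) sequence of item ids.
user_sequences = [
    ["item_12", "item_7", "item_99", "item_7"],
    ["item_7", "item_99", "item_31"],
    ["item_12", "item_31", "item_5"],
    # ... many more user sequences in practice
]

# Train Skip-gram (sg=1) with negative sampling on the item sequences,
# exactly as Word2vec is trained on word sequences.
model = Word2Vec(
    sentences=user_sequences,
    vector_size=32,   # embedding dimension
    window=5,
    min_count=1,
    sg=1,
    negative=5,
    epochs=10,
)

item_vector = model.wv["item_7"]                      # the learned item embedding
similar_items = model.wv.most_similar("item_7", topn=3)  # nearest-neighbor items
print(similar_items)
```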


Reference: Deep Learning Recommendation System, edited by Wang Zhe


Source: blog.csdn.net/weixin_44127327/article/details/112602399