CHANG Machine Learning Study Notes 14: Unsupervised Learning: Word Embedding

Word Embedding: a dimensionality-reduction method for text.

What we want the machine to do is this: after reading a large amount of text, it represents each word as a vector, where each dimension of the vector captures some aspect of meaning, so that words with similar features, meanings, or semantics end up with similar vectors.

  1. 1-of-N Encoding: the simplest way to represent a word as a vector. If there are only five words, we can use a 5-dimensional vector for each of them, but this method cannot describe any relationship between words (see the small sketch after this list).
  2. Word Class: group words into classes (e.g. animals, plants, ...).
  3. Word Embedding: describe each word with many dimensions, i.e. a continuous vector.
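As a small illustration of why 1-of-N encoding carries no similarity information, here is a minimal sketch in Python; the five-word vocabulary is made up for the example.

```python
import numpy as np

# A hypothetical five-word vocabulary, just to illustrate 1-of-N encoding.
vocab = ["dog", "cat", "bird", "ran", "jumped"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_of_n(word):
    """Return the 1-of-N (one-hot) vector of a word: all zeros except a single 1."""
    v = np.zeros(len(vocab))
    v[word_to_index[word]] = 1.0
    return v

print(one_of_n("dog"))   # [1. 0. 0. 0. 0.]
print(one_of_n("cat"))   # [0. 1. 0. 0. 0.]

# Any two different one-hot vectors have dot product 0, so this encoding
# says nothing about which words are related to each other.
print(one_of_n("dog") @ one_of_n("cat"))  # 0.0
```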

The machine probably does not know what the relationship between Ma Ying-jeou and Tsai Ing-wen is, but in the training text both names are followed by the same kinds of words, so after reading the text the machine gives their vectors something in common.
There are two ways to estimate a word's meaning from its context: one is Count based, the other is Prediction based.

  • Count based
    If two words wi and wj are similar, their vectors V(wi) and V(wj) should be similar; concretely, we want the inner product V(wi)·V(wj) to be positively correlated with their co-occurrence count Nij, so that words that appear together end up close (a sketch of counting co-occurrences follows after this list).
  • Prediction based
    Prediction based works as follows: feed a word (or words) into a neural network, and the network outputs, for every word in the vocabulary, the probability that it is the next word.
    A prediction-based model can also be used, for example, to continue a post or message by suggesting the following words.
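Going back to the count-based idea, the sketch below only builds the co-occurrence counts Nij from a toy corpus (the corpus and window size are made up for illustration); a count-based method such as GloVe would then learn vectors whose inner products track these counts.

```python
from collections import defaultdict

# Toy corpus and window size, made up purely for illustration.
corpus = [
    "dog ran home",
    "cat ran home",
    "dog jumped high",
]
window = 1  # count words that appear within 1 position of each other

# N[(wi, wj)] = how many times wi and wj co-occur within the window
N = defaultdict(int)
for sentence in corpus:
    words = sentence.split()
    for i, wi in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                N[(wi, words[j])] += 1

print(N[("dog", "ran")])   # 1
print(N[("ran", "home")])  # 2
# A count-based method then looks for vectors V(wi), V(wj) whose inner
# product V(wi) . V(wj) is large exactly when N[(wi, wj)] is large.
```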

A prediction-based model can also be used for Language Modeling (estimating the probability that a sentence occurs).
However, a complete sentence will usually never have appeared as-is in the training data. If we want the probability of "wreck a nice beach", we take the probability of "wreck", multiply it by the probability that "wreck" is followed by "a", and so on.
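In symbols (conditioning each word on the one before it, as described above):

$$
P(\text{wreck a nice beach}) \approx P(\text{wreck}) \cdot P(\text{a} \mid \text{wreck}) \cdot P(\text{nice} \mid \text{a}) \cdot P(\text{beach} \mid \text{nice})
$$

and the neural network is what estimates each of these conditional probabilities.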
How this is actually done:
For example, we feed a word (after one-hot encoding) into the network, and the network outputs the probability of every word in the vocabulary being the next word. Once the model is trained, we take the output of the first hidden layer as that word's Word Vector. The layer used for Word Embedding is a single linear layer, because this part may serve as the front of some other DNN, the Word Embedding vectors can be very large, and one hidden layer is already enough to achieve the effect.
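A minimal sketch of such a network, assuming PyTorch; the vocabulary size, embedding size, and class name are made up for the example. The single linear hidden layer is the embedding.

```python
import torch
import torch.nn as nn

vocab_size = 10000   # hypothetical vocabulary size
embed_dim = 100      # hypothetical embedding (hidden-layer) size

class PredictionBasedEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        # Single linear hidden layer: multiplying a one-hot vector by this
        # weight matrix is just a row lookup, i.e. the word vector.
        self.embed = nn.Linear(vocab_size, embed_dim, bias=False)
        # Output layer: a score (logit) for every word being the next word.
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, one_hot_word):
        z = self.embed(one_hot_word)   # hidden layer = word vector
        return self.out(z)             # logits over the next word

model = PredictionBasedEmbedding()

# One-hot encode the word with index 42 and read off its word vector.
x = torch.zeros(vocab_size)
x[42] = 1.0
word_vector = model.embed(x)           # the embedding of word 42
next_word_logits = model(x)
print(word_vector.shape, next_word_logits.shape)  # (100,) and (10000,)
```

Training would apply a softmax / cross-entropy loss between these logits and the actual next word in the corpus.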
Returning to the earlier example: the two names are followed by the same words, so their vectors end up similar in some dimensions.

Parameter sharing
If each input word (a 100,000-dimensional one-hot vector) had its own weights, two input words would need 200,000 dimensions' worth of weights. If instead we set W1 = W2, the number of parameters does not grow (it stays at 100,000 dimensions' worth); the shared W acts much like a filter in a CNN.
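Continuing the PyTorch sketch above, weight sharing simply means applying one matrix W to both of the two previous words; the sizes below are again illustrative.

```python
import torch
import torch.nn as nn

vocab_size = 100000  # the 100,000-dimensional one-hot input from the text
embed_dim = 100      # hypothetical embedding size

# One shared matrix W, applied to both of the two previous words,
# much like a CNN filter is reused at every position.
W = nn.Linear(vocab_size, embed_dim, bias=False)
out = nn.Linear(embed_dim, vocab_size)

def predict_next(x_prev2, x_prev1):
    # Because W is shared, adding a second input word adds no new parameters.
    z = W(x_prev2) + W(x_prev1)
    return out(z)

x1 = torch.zeros(vocab_size); x1[7] = 1.0
x2 = torch.zeros(vocab_size); x2[42] = 1.0
print(predict_next(x1, x2).shape)  # torch.Size([100000])
```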

There are many variations of prediction-based methods. One is the CBOW model: given wi-1 and wi+1, predict wi.
The other is the Skip-gram model: given wi, predict wi-1 and wi+1.
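Both variants are available in the gensim library (the gensim 4.x API is assumed here); the toy corpus below is made up, and the sg flag selects the variant.

```python
from gensim.models import Word2Vec

# Toy corpus, made up for illustration.
sentences = [
    ["the", "dog", "ran", "home"],
    ["the", "cat", "ran", "home"],
    ["the", "dog", "jumped", "high"],
]

# sg=0 trains a CBOW model, sg=1 trains a Skip-gram model.
cbow = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1)

print(cbow.wv["dog"])       # 10-dimensional CBOW word vector
print(skipgram.wv["dog"])   # 10-dimensional Skip-gram word vector
```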
Using word embeddings we can discover relationships between words, such as the relationship between countries and their capitals, or between different tenses of a verb. If we subtract pairs of word vectors that stand in the same relation, the difference vectors end up close to each other (for example in a 2-dimensional projection), so we get relations like V(hotter) - V(hot) ≈ V(bigger) - V(big). For example, to answer "Rome is to Italy as Berlin is to ?" (Germany), we compute V(Berlin) - V(Rome) + V(Italy) and look for the word whose vector is closest to the result; it will most likely be Germany.
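A minimal sketch of this analogy arithmetic, assuming some already-trained table word_vectors that maps each word to a numpy array (not provided here):

```python
import numpy as np

def solve_analogy(word_vectors, a, b, c):
    """Answer "a is to b as c is to ?" by nearest neighbour to V(b) - V(a) + V(c)."""
    target = word_vectors[b] - word_vectors[a] + word_vectors[c]
    best_word, best_sim = None, -np.inf
    for w, v in word_vectors.items():
        if w in (a, b, c):
            continue  # exclude the query words themselves
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word

# With a well-trained embedding table, the call below would be expected to
# return "Germany" (V(Italy) - V(Rome) + V(Berlin) is the same quantity as
# V(Berlin) - V(Rome) + V(Italy)):
# answer = solve_analogy(word_vectors, "Rome", "Italy", "Berlin")
```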
Not only can we embed words, we can also embed whole documents. For example, we first turn a document into a bag-of-words vector (if a word occurs several times in the document, the value of that word's dimension is that count), i.e. a collection of word counts.
But doing only this is often not good enough: two sentences can have exactly the same bag of words and yet, because the word order differs, express different meanings, as in the classic example "white blood cells destroying an infection" versus "an infection destroying white blood cells".
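A quick check of this limitation in Python, using that word-order example:

```python
from collections import Counter

# The word-order example: the same bag of words, very different meanings.
s1 = "white blood cells destroying an infection"
s2 = "an infection destroying white blood cells"

bow1 = Counter(s1.split())  # each word's dimension is its count
bow2 = Counter(s2.split())

print(bow1 == bow2)  # True: bag-of-words cannot tell the two sentences apart
```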
