Understanding the GloVe Model (Global Vectors for Word Representation)

Understanding the GloVe model

Outline

Model objective: represent each word as a vector that captures, as far as possible, the semantic and syntactic information between words.
Input: a corpus
Output: word vectors
Method overview: first build a word co-occurrence matrix from the corpus, then learn the word vectors from that co-occurrence matrix with the GloVe model.
Start → build the co-occurrence matrix → train the word vectors → End
Building the co-occurrence matrix

Let X denote the co-occurrence matrix, with elements X_{i,j}.
X_{i,j} is the number of times that word i and word j appear together in the same window, counted over the entire corpus.
For example, take this corpus:

i love you but you love him i am sad

This tiny corpus has a single sentence and involves seven words: i, love, you, but, him, am, sad.
If we count co-occurrences with a window of length 5 (2 words to the left and 2 words to the right of the center word), the windows have the following contents:

Window  Center word  Window contents
0       i            i love you
1       love         i love you but
2       you          i love you but you
3       but          love you but you love
4       you          you but you love him
5       love         but you love him i
6       him          you love him i am
7       i            love him i am sad
8       am           him i am sad
9       sad          i am sad
Windows 0 and 1 are shorter than 5 because there are fewer than 2 words to the left of the center word; likewise, windows 8 and 9 are shorter than 5 because there are fewer than 2 words to the right.
Take window 5 as an example of how a single window contributes to the co-occurrence matrix:
the center word is love and the context words are but, you, him, i, so we execute:

X_{love,but} += 1

X_{love,you} += 1

X_{love,him} += 1

X_{love,i} += 1
Sliding the window over the entire corpus in this way and accumulating these counts yields the co-occurrence matrix X.
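To make this concrete, here is a small Python sketch (illustrative only, not the original author's code; names such as build_cooccurrence are made up) that builds the co-occurrence matrix for the toy corpus above, using the same +1-per-context-word counting as in the example. The GloVe paper itself additionally decays counts with the distance between words, a refinement omitted here.

    import numpy as np

    def build_cooccurrence(tokens, half_window=2):
        # Build a dense co-occurrence matrix X: for every center word,
        # each word within `half_window` positions to the left or right
        # adds 1 to the count X[center, context].
        vocab = {w: idx for idx, w in enumerate(dict.fromkeys(tokens))}
        X = np.zeros((len(vocab), len(vocab)))
        for center, word in enumerate(tokens):
            lo = max(0, center - half_window)
            hi = min(len(tokens), center + half_window + 1)
            for pos in range(lo, hi):
                if pos != center:
                    X[vocab[word], vocab[tokens[pos]]] += 1
        return X, vocab

    tokens = "i love you but you love him i am sad".split()
    X, vocab = build_cooccurrence(tokens)
    print(X[vocab["love"], vocab["but"]])   # 2.0: "but" lies inside both windows centered on "love"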
Training word vectors with the GloVe model

Model formula

Let's look at the model first. The cost function looks like this:

J = \sum_{i,j}^{N} f(X_{i,j}) (v_i^T v_j + b_i + b_j - \log X_{i,j})^2

Here v_i and v_j are the word vectors of words i and j, b_i and b_j are two scalars (bias terms introduced by the author), f is a weighting function (its exact form is given in a later section), and N is the size of the vocabulary (the co-occurrence matrix has dimension N x N).
As you can see, the GloVe model does not use a neural network.
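For concreteness, a minimal sketch of evaluating this cost in Python could look as follows. This is an illustration only: it assumes a single set of vectors per word, matching the post's notation (the published GloVe model uses separate word and context vectors), and the weighting function f is passed in as a callable since its form is only given later.

    import numpy as np

    def glove_cost(X, V, b, f):
        # J = sum over i, j of f(X[i, j]) * (v_i . v_j + b_i + b_j - log X[i, j])^2
        # X: (N, N) co-occurrence counts, V: (N, d) word vectors, b: (N,) biases,
        # f: the weighting function (its exact form is given later in the post).
        # Pairs with X[i, j] == 0 are skipped, since log(0) is undefined.
        J = 0.0
        N = X.shape[0]
        for i in range(N):
            for j in range(N):
                if X[i, j] > 0:
                    err = V[i] @ V[j] + b[i] + b[j] - np.log(X[i, j])
                    J += f(X[i, j]) * err ** 2
        return J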
Where does the model come from?

So why construct the model this way? First, define a few symbols:
X_i = \sum_{j=1}^{N} X_{i,j}

This is the sum of row i of the co-occurrence matrix, i.e., the total co-occurrence count of word i;
P_{i,k} = X_{i,k} / X_i

This is the conditional probability that word k appears in the context of word i;
ratio_{i,j,k} = P_{i,k} / P_{j,k}

This is the ratio of two such conditional probabilities.
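All three quantities are straightforward to compute from the co-occurrence matrix. A small sketch, assuming X is a dense NumPy array and i, j, k are row/column indices (the function name is made up for illustration):

    def cooccurrence_ratio(X, i, j, k):
        # X_i = X[i].sum()            -- row sum, total co-occurrences of word i
        # P_{i,k} = X[i, k] / X_i     -- probability of seeing word k in word i's context
        # ratio_{i,j,k} = P_{i,k} / P_{j,k}   (assumes X[j, k] > 0)
        P_ik = X[i, k] / X[i].sum()
        P_jk = X[j, k] / X[j].sum()
        return P_ik / P_jk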
The authors' insight is this:
they observed that ratio_{i,j,k} follows a pattern, summarized in the table below:
Value of ratio_{i,j,k}     words j, k related    words j, k unrelated
words i, k related         close to 1            large
words i, k unrelated       small                 close to 1
A very simple pattern, but a useful one.
The idea: suppose we already have word vectors. If some function of the word vectors v_i, v_j, v_k can compute ratio_{i,j,k} and reproduce the same pattern, it means our word vectors are consistent with the co-occurrence matrix, i.e., the word vectors contain the information implied by the co-occurrence matrix.
Denote by g(v_i, v_j, v_k) the function that computes ratio_{i,j,k} from the word vectors v_i, v_j, v_k (don't worry about its exact form for now). Then we want:
P_{i,k} / P_{j,k} = ratio_{i,j,k} = g(v_i, v_j, v_k)

that is:

P_{i,k} / P_{j,k} = g(v_i, v_j, v_k)

In other words, the two sides should be as close to each other as possible.
It is natural to take the squared difference of the two sides as the cost function:

J = \sum_{i,j,k}^{N} ( P_{i,k} / P_{j,k} - g(v_i, v_j, v_k) )^2

But look more closely: the model involves three words at a time, which means the cost is a sum over N x N x N terms. That is too expensive, so we would like something simpler.
Now let's think carefully about g(v_i, v_j, v_k); perhaps it can help.
The authors' train of thought goes like this:
1. To capture the relationship between word i and word j, the function g(v_i, v_j, v_k) probably needs to involve the difference v_i - v_j; in a linear vector space, the difference of two vectors is a natural way to probe how they are related, so v_i - v_j is a reasonable ingredient.
2. ratio_{i,j,k} is a scalar, so g(v_i, v_j, v_k) should ultimately be a scalar even though its inputs are vectors; an inner product is the natural way to get one, which suggests (v_i - v_j)^T v_k.
3. Finally, wrap (v_i - v_j)^T v_k in an exponential, giving g(v_i, v_j, v_k) = \exp((v_i - v_j)^T v_k).
Step 3 is the crucial one: why wrap the expression in exp()?
Setting that question aside for the moment, our goal is to make the following hold as nearly as possible:
P_{i,k} / P_{j,k} = g(v_i, v_j, v_k)

that is:

P_{i,k} / P_{j,k} = \exp((v_i - v_j)^T v_k)

that is:

P_{i,k} / P_{j,k} = \exp(v_i^T v_k - v_j^T v_k)

that is:

P_{i,k} / P_{j,k} = \exp(v_i^T v_k) / \exp(v_j^T v_k)
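The last step is just the identity \exp(a - b) = \exp(a) / \exp(b). A quick numerical check with random vectors (illustrative only):

    import numpy as np

    # exp((v_i - v_j) . v_k) == exp(v_i . v_k) / exp(v_j . v_k),
    # which is exactly what lets us match numerator to numerator
    # and denominator to denominator below.
    rng = np.random.default_rng(0)
    vi, vj, vk = rng.normal(size=(3, 5))
    lhs = np.exp((vi - vj) @ vk)
    rhs = np.exp(vi @ vk) / np.exp(vj @ vk)
    print(np.isclose(lhs, rhs))   # True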

Now a simplification presents itself: it suffices to make the numerators on both sides equal and the denominators on both sides equal, namely:
P_{i,k} = \exp(v_i^T v_k)   and   P_{j,k} = \exp(v_j^T v_k)

Moreover, the numerator condition and the denominator condition have the same form, so the two can be unified into one requirement:
P_{i,j} = \exp(v_i^T v_j)

Originally we wanted:
P_{i,k} / P_{j,k} = g(v_i, v_j, v_k)

Now we only need:
P_{i,j} = \exp(v_i^T v_j)

Taking the logarithm of both sides:
\log P_{i,j} = v_i^T v_j

Then the cost function can be simplified to:
J = \sum_{i,j}^{N} ( \log P_{i,j} - v_i^T v_j )^2

The computation now runs over only N x N terms instead of N x N x N. This also explains why step 3 wraps the expression in exp(): the exponential turns the difference inside g into a quotient, so the two sides of the equation can be matched numerator to numerator and denominator to denominator, which is exactly what simplifies the model.
However, there is a problem.
Look closely at these two formulas:
\log P_{i,j} = v_i^T v_j   and   \log P_{j,i} = v_j^T v_i

\log P_{i,j} is not equal to \log P_{j,i}, but v_i^T v_j is equal to v_j^T v_i; in other words, the left-hand side is not symmetric in i and j, while the right-hand side is.
Mathematically this is a problem, so let's patch it up.
Expand the conditional probability in the formula above:
\log P_{i,j} = v_i^T v_j

Namely:
\log X_{i,j} - \log X_i = v_i^T v_j

Which becomes:
\log X_{i,j} = v_i^T v_j + b_i + b_j

That is, we add a bias term b_j and absorb \log X_i into the bias term b_i; both sides are now symmetric in i and j, so the inconsistency is gone.
So the cost function becomes:
J = \sum_{i,j}^{N} ( v_i^T v_j + b_i + b_j - \log X_{i,j} )^2

Finally, based on the principle that word pairs that co-occur more often should carry more weight, a weighting term is added to the cost function, giving its final form:
J = \sum_{i,j}^{N} f(X_{i,j}) ( v_i^T v_j + b_i + b_j - \log X_{i,j} )^2

What should the weighting function look like?
First, it should be non-decreasing; second, once the co-occurrence count is very large, the weight should stop growing. The function determined experimentally is:
f(x) = (x / x_{max})^{0.75} if x < x_{max},  and  f(x) = 1 if x >= x_{max}
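In code, this weighting function could be written as the following sketch (the exponent 0.75 is fixed by the formula above; x_max is a tunable cutoff, for which the GloVe paper reports using 100):

    def weight(x, x_max=100.0):
        # f(x) = (x / x_max)^0.75 below x_max, capped at 1 above it.
        return (x / x_max) ** 0.75 if x < x_max else 1.0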

That concludes the introduction of the whole model.
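To tie the pieces together, here is a minimal training sketch that minimizes the final cost by plain gradient descent over the nonzero entries of a dense co-occurrence matrix. This is only an illustration under those assumptions; the reference GloVe implementation instead uses AdaGrad and separate word and context vectors, among other refinements.

    import numpy as np

    def train_glove(X, dim=10, epochs=200, lr=0.05, x_max=100.0):
        # Minimize J = sum f(X_ij) (v_i . v_j + b_i + b_j - log X_ij)^2
        # over the pairs (i, j) with X_ij > 0, using plain gradient descent.
        N = X.shape[0]
        rng = np.random.default_rng(0)
        V = rng.normal(scale=0.1, size=(N, dim))
        b = np.zeros(N)
        pairs = [(i, j) for i in range(N) for j in range(N) if X[i, j] > 0]
        for _ in range(epochs):
            for i, j in pairs:
                w = (X[i, j] / x_max) ** 0.75 if X[i, j] < x_max else 1.0
                err = V[i] @ V[j] + b[i] + b[j] - np.log(X[i, j])
                grad = 2.0 * w * err
                # update both vectors using their old values, then the biases
                V[i], V[j] = V[i] - lr * grad * V[j], V[j] - lr * grad * V[i]
                b[i] -= lr * grad
                b[j] -= lr * grad
        return V, b

Applied to the toy corpus above, train_glove(X) returns 10-dimensional vectors and biases; for a real corpus the vocabulary, vector dimension, and number of epochs would of course be much larger.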
Note

If anything here is wrong, please point it out.
---------------------
Author: dumplings vinegar
Source: CSDN
Original: https://blog.csdn.net/coderTC/article/details/73864097
Copyright: This is an original article by the blogger; if you reproduce it, please include a link to the original post.
