05 Neural network language model (one-hot encoding + origin of word vector)


Statistical language model

Statistics + language model: use statistical methods to complete the following two tasks related to what people say.

Language model = language (what people say) + model (to complete two tasks)

  1. Compare two candidates and decide which is more likely, e.g. "part of speech" vs. "magnetism" (词性 and 磁性, which sound the same in Chinese)
  2. Predict the next word (fill in the blank)

n-gram language model

Take n (2, 3, 4) consecutive words at a time
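
As a rough illustration of the n-gram idea, the sketch below counts bigrams (n = 2) in a tiny made-up corpus and estimates the probability of the next word from those counts. The corpus, the `ngrams` helper and the `next_word_prob` function are assumptions for illustration only, not part of the original notes.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus (hypothetical), already split into words.
corpus = [
    ["judgment", "this", "word", "of", "part of speech"],
    ["judgment", "this", "word", "of", "part of speech"],
    ["judgment", "this", "word", "of", "magnetism"],
]

bigram_counts = Counter()
prefix_counts = Counter()
for sentence in corpus:
    bigram_counts.update(ngrams(sentence, 2))
    prefix_counts.update(sentence[:-1])  # words that are followed by another word

def next_word_prob(prev, word):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    if prefix_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / prefix_counts[prev]

p_pos = next_word_prob("of", "part of speech")
p_mag = next_word_prob("of", "magnetism")
print(p_pos, p_mag)
```

With this toy corpus, "part of speech" follows "of" more often than "magnetism" does, so it gets the higher estimate. Real n-gram models also need smoothing for unseen word pairs, which this sketch leaves out.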

Neural Network Language Model

Neural network + language model: use neural network methods to complete the following two tasks related to what people say.

Second task:

"judgment", "this", "word", "of", "___"

Suppose the vocabulary contains "part of speech" and "Mars"

P( ___ | "judgment", "this", "word", "of")

The model should give the higher probability to "part of speech"

w1,w2,w3,w4 (one-hot encoding of the above 4 words)

w1*Q=c1,
w2*Q=c2,
w3*Q=c3,
w4*Q=c4,

C=[c1,c2,c3,c4]
Q is a randomly initialized matrix; it is a (learnable) parameter

"judgment", "this", "word", "of", "part of speech"

softmax(U[tanh(WC + b1)] + b2) = [0.1, 0.1, 0.2, 0.2, 0.4] ∈ [1, V_L]
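
The formula softmax(U[tanh(WC+b1)]+b2) can be traced step by step with a small forward pass. This is only a sketch under assumptions: the 5-word vocabulary, the sizes `m` and `h`, and the random initial values are made up, and the weights are untrained, so the printed probabilities carry no meaning beyond summing to 1.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Toy vocabulary (assumed); 5 words to match the 5 probabilities in the notes.
vocab = ["judgment", "this", "word", "of", "part of speech"]
V = len(vocab)
m = 3      # word-vector dimension (assumed)
h = 4      # hidden-layer size (assumed)
n_ctx = 4  # number of context words

def rand_matrix(rows, cols):
    """Random matrix standing in for untrained, learnable parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

Q = rand_matrix(V, m)          # V x m, rows are word vectors
W = rand_matrix(h, n_ctx * m)  # h x 4m
b1 = [0.0] * h
U = rand_matrix(V, h)          # V x h
b2 = [0.0] * V

def forward(context):
    """Return a probability for every vocabulary word given 4 context words."""
    C = []
    for word in context:
        idx = vocab.index(word)
        w = [1 if i == idx else 0 for i in range(V)]                   # one-hot
        c = [sum(w[i] * Q[i][j] for i in range(V)) for j in range(m)]  # w * Q
        C.extend(c)                                                    # C = [c1, c2, c3, c4]
    hidden = [math.tanh(x + b) for x, b in zip(mat_vec(W, C), b1)]
    logits = [x + b for x, b in zip(mat_vec(U, hidden), b2)]
    return softmax(logits)

probs = forward(["judgment", "this", "word", "of"])
print(probs)
```

Training would adjust Q, W, U, b1 and b2 so that the entry for the correct next word becomes the largest.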

One-hot encoding

One-hot encoding: letting computers recognize words

Dictionary V (all the words in the Xinhua dictionary are combined into a set V)

Suppose there are only 8 words in the dictionary

Computers do not understand words

But we want computers to recognize words

“fruit”

One-hot encoding: build an 8×8 matrix with one row per word

“time” -> 10000000

“fruit” -> 01000000

“banana” -> 00000001
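
A minimal sketch of this encoding follows. The notes only name three of the eight words, so the entries "w3" to "w7" are hypothetical placeholders, and the `one_hot` helper is an assumed name.

```python
# 8-word dictionary; "w3".."w7" are placeholders for the words the notes do not list.
vocab = ["time", "fruit", "w3", "w4", "w5", "w6", "w7", "banana"]

def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's position."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Stacking the encodings of all 8 words gives the 8 x 8 matrix.
matrix = [one_hot(w, vocab) for w in vocab]

for word in ("time", "fruit", "banana"):
    print(word, "->", "".join(str(bit) for bit in one_hot(word, vocab)))
```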

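The cosine similarity between the one-hot vectors of any two different words is 0, so one-hot encoding cannot express how similar two words are. This is why we want word vectors (obtained by matrix multiplication)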
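
The zero similarity can be checked directly on the one-hot codes for "fruit" and "banana" given above. The `cosine_similarity` helper is an assumed name, written here with the standard library only.

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

fruit = [0, 1, 0, 0, 0, 0, 0, 0]
banana = [0, 0, 0, 0, 0, 0, 0, 1]

sim = cosine_similarity(fruit, banana)
print(sim)  # 0.0: the two vectors share no non-zero position
```

The same result holds for every pair of different words, so "fruit" looks no closer to "banana" than to "time".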

Word vectors (by-product Q of the neural network language model)

Take any word,

"judgment" -> one-hot encoding w1 = [1, 0, 0, 0, 0]

w1*Q =c1 (the word vector of the word "judgment")

Word vector: use a vector to represent a word

The dimension (size) of the word vector can be controlled: it equals the number of columns of Q
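
The sketch below shows that multiplying a one-hot vector by Q simply picks out one row of Q, and that the length of the result is the number of columns of Q. The values in `Q` are made up for illustration; in the model they would be learned.

```python
def vec_mat(w, Q):
    """Row vector times matrix: (1 x V) * (V x m) -> (1 x m)."""
    rows, cols = len(Q), len(Q[0])
    return [sum(w[i] * Q[i][j] for i in range(rows)) for j in range(cols)]

# Hypothetical 5 x 3 matrix Q: 5 words in the vocabulary, 3-dimensional word vectors.
Q = [
    [0.2, -0.1, 0.7],
    [0.5, 0.3, -0.4],
    [-0.6, 0.8, 0.1],
    [0.9, -0.2, 0.0],
    [0.1, 0.4, -0.3],
]

w1 = [1, 0, 0, 0, 0]  # one-hot encoding of "judgment"
c1 = vec_mat(w1, Q)   # word vector of "judgment"

print(c1)
print(len(c1))  # 3, the number of columns of Q
```

Choosing a Q with more or fewer columns changes the size of every word vector without touching the one-hot codes.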

Once we have word vectors, the first task (comparison) can also be solved, as a downstream task
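
To see why dense word vectors help with comparison, the sketch below computes cosine similarity on three made-up 3-dimensional vectors. The numbers are invented for illustration and do not come from a trained model; real values would be rows of a learned Q.

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word vectors (not trained values).
word_vectors = {
    "fruit": [0.9, 0.8, 0.1],
    "banana": [0.8, 0.9, 0.2],
    "time": [-0.7, 0.1, 0.9],
}

sim_fruit_banana = cosine_similarity(word_vectors["fruit"], word_vectors["banana"])
sim_fruit_time = cosine_similarity(word_vectors["fruit"], word_vectors["time"])

print(sim_fruit_banana, sim_fruit_time)
```

Unlike one-hot codes, these vectors can give different similarity scores for different pairs, which is what the comparison task needs.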

Summary

Neural network language model: use a neural network to solve the two tasks related to what people say

There is a by-product: the Q matrix, which gives a new kind of word vector (its dimension can be chosen, and the similarity between two words can be computed)

downstream tasks

Origin blog.csdn.net/linjie_830914/article/details/131614714