Statistical language model
Statistics + Language Model – "Use statistical methods to complete the following two tasks related to what people say"
Language model = language (what people say) + model (to complete two tasks)
- Compare which of two candidates is more likely, e.g. "part of speech" vs. "magnetism"
- Predict the next word (fill in the blank)
n-gram language model
Take n (2, 3, 4) words at a time
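A minimal sketch of the n = 2 case (a bigram model), assuming a tiny made-up corpus. The corpus and the helper names `train_bigram` and `next_word_prob` are hypothetical and only illustrate how counting word pairs covers both tasks: comparing two candidates and predicting the next word.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    ["judge", "the", "part", "of", "speech"],
    ["judge", "the", "part", "of", "speech"],
    ["judge", "the", "part", "of", "magnetism"],
]

def train_bigram(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for prev, nxt in zip(sent, sent[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_prob(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

counts = train_bigram(corpus)

# Task 1: compare two candidates after the same context word.
p_speech = next_word_prob(counts, "of", "speech")
p_magnetism = next_word_prob(counts, "of", "magnetism")

# Task 2: predict the most likely next word.
best = counts["of"].most_common(1)[0][0]
```

In this toy corpus "speech" follows "of" twice and "magnetism" once, so the model prefers "speech". A real model would also need smoothing for word pairs it has never seen.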
Neural Network Language Model
Neural Network + Language Model – "Use the neural network method to complete the following two tasks related to what people say"
Second task:
"judgment", "a", "word", "of", "___"
Suppose the vocabulary contains "part of speech" and "Mars"
P(___ | "judgment", "a", "word", "of")
part of speech
w1,w2,w3,w4 (one-hot encoding of the above 4 words)
w1*Q=c1,
w2*Q=c2,
w3*Q=c3,
w4*Q=c4,
C=[c1,c2,c3,c4]
Q is a randomly initialized matrix; it is a (learnable) parameter
"judgment", "this", "word", "of", "part of speech"
softmax(U[tanh(WC+b1)]+b2) = [0.1, 0.1, 0.2, 0.2, 0.4] ∈ [1, V_L]
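The forward pass above can be sketched in NumPy. This is only an illustration under stated assumptions: C is taken to be the concatenation of c1..c4, the vocabulary has 5 words (matching the 5-entry output), and the word-vector size `d`, hidden size `h`, and the shapes of W and U are hypothetical choices, not values given in these notes. All parameters are random here; training would update Q, W, U, b1 and b2.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 5  # vocabulary size (assumed, matches the 5-entry output above)
d = 3  # word-vector dimension (assumed)
h = 8  # hidden-layer size (assumed)
n = 4  # number of context words: w1..w4

# Randomly initialised parameters; all of them would be learned in training.
Q = rng.normal(size=(V, d))      # word-vector matrix
W = rng.normal(size=(h, n * d))  # hidden-layer weights
b1 = np.zeros(h)
U = rng.normal(size=(V, h))      # output-layer weights
b2 = np.zeros(V)

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(context_ids):
    """Return P(next word | context) as a length-V probability vector."""
    onehots = np.eye(V)[context_ids]  # (n, V): one-hot rows w1..w4
    c = onehots @ Q                   # (n, d): rows c1..c4
    C = c.reshape(-1)                 # concatenate c1..c4 into one vector
    hidden = np.tanh(W @ C + b1)
    return softmax(U @ hidden + b2)

probs = forward([0, 1, 2, 3])
predicted_id = int(np.argmax(probs))  # index of the most likely next word
```

The output has one probability per vocabulary word and sums to 1, so the blank is filled with the word at `predicted_id`.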
One-hot encoding
One-hot encoding: making computers recognize words
Dictionary V (all the words in the Xinhua dictionary are combined into a set V)
Suppose there are only 8 words in the dictionary
Computers do not understand words
But we want computers to recognize words
“fruit”
One-hot encoding: given an 8×8 matrix, each word corresponds to one row
“time” --> 10000000
“fruit” --> 01000000
“banana” --> 00000001
Cosine similarity between any two different one-hot vectors is always 0, so one-hot encoding cannot express similarity between words – this motivates word vectors (obtained by matrix multiplication)
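A small sketch of this point in NumPy. Only "time", "fruit" and "banana" and their positions come from the notes; the other five vocabulary entries are placeholder names, and `encode` and `cosine` are hypothetical helpers.

```python
import numpy as np

# 8-word dictionary; "w3".."w7" are placeholders for the unnamed words.
vocab = ["time", "fruit", "w3", "w4", "w5", "w6", "w7", "banana"]

# The 8x8 identity matrix: row i is the one-hot code of word i.
one_hot = np.eye(len(vocab), dtype=int)

def encode(word):
    """Return the one-hot vector of a word in the dictionary."""
    return one_hot[vocab.index(word)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

time_code = "".join(map(str, encode("time")))
sim = cosine(encode("fruit"), encode("banana"))
```

Here `time_code` is the string 10000000, and `sim` is 0 even though "fruit" and "banana" are related in meaning, because two different one-hot vectors never share a non-zero position.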
Word vectors (by-product Q of the neural network language model)
Given any word:
"judgment" --> one-hot encoding w1 = [1,0,0,0,0]
w1*Q = c1 (the word vector of the word "judgment")
Word vector: use a vector to represent a word
The dimension (size) of the word vector can be controlled
Once we have word vectors, the first task (comparison) can also be solved, as a downstream task
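A short sketch of the lookup w1*Q = c1, assuming a 5-word vocabulary and a word-vector dimension of 3 (both hypothetical). The Q used here is random, standing in for a trained matrix, so the similarity value it produces carries no meaning; it only shows that dense word vectors, unlike one-hot codes, can have a non-zero cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 5, 3                  # vocabulary size and word-vector dimension (assumed)
Q = rng.normal(size=(V, d))  # random stand-in for a trained Q matrix

w1 = np.array([1, 0, 0, 0, 0])  # one-hot code of the first word
c1 = w1 @ Q                     # word vector: picks out row 0 of Q

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity between the word vectors of word 0 and word 1.
sim = cosine(Q[0], Q[1])
```

Multiplying a one-hot vector by Q simply selects one row of Q, and the number of columns of Q sets the word-vector dimension.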
Summary
Neural network language model: solving the two tasks related to what people say by means of a neural network
There is a by-product: the Q matrix, which gives a new kind of word vector (its dimension can be chosen, and the similarity between two words can be computed)
downstream tasks