The development of deep learning in the field of NLP (Part 1)

Deep learning in NLP has developed in three phases:

  • Word Embedding

    • Word2Vec

    • GloVe

  • RNN improvements and extensions

    • LSTM / GRU

    • Seq2Seq

    • Attention/Self-Attention

  • Contextual Word Embedding

    • ELMo

    • OpenAI GPT

    • BERT

The first stage is mainly word embeddings, including word2vec and GloVe. They can be learned in an unsupervised way, and the expectation is that better word vectors, when fed into downstream tasks, improve the generalization ability of the model.

The second stage is mainly improvements and extensions of RNNs: LSTM and GRU, which improve on the vanilla RNN, and Seq2Seq models built from two RNNs, which can handle most NLP problems such as machine translation, question answering, and summarization. Attention/Self-Attention is a further improvement of Seq2Seq.

The third stage is contextual word embeddings, i.e. word embeddings that take the surrounding context of each word into account.

The language model (a basic NLP task)

Language modeling is the task of predicting the next word given the preceding words; equivalently, it assigns a probability to a sentence.

Task: given a sentence w consisting of k words w1 ... wk, compute the probability of the sentence. Using the chain rule, this probability decomposes into a product of conditional probabilities: first the probability of w1, then the probability of w2 given w1, then the probability of w3 given w1 and w2, and so on, up to the probability of wk given w1 ... wk-1. Before deep learning, the mainstream language model was the n-gram model, but it generalizes poorly.
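Written as a formula, this chain-rule decomposition is:

```latex
\[
P(w_1, \dots, w_k)
  = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2)\cdots P(w_k \mid w_1, \dots, w_{k-1})
  = \prod_{i=1}^{k} P(w_i \mid w_1, \dots, w_{i-1})
\]
```

A count-based n-gram model approximates each factor by conditioning on only the previous n-1 words and estimating it from corpus counts, which is one reason it generalizes poorly to unseen word combinations.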

Word embedding

A word embedding maps each word into a dense, low-dimensional space, such that words with similar meanings are mapped to nearby points while words with very different meanings end up farther apart.

With word2vec you can learn relationships between words.

word2vec is based on the distributional hypothesis: if two words appear in similar contexts, their meanings are also similar.

  • word2vec has two basic models: CBOW and Skip-Gram. CBOW predicts the center word from its context, while Skip-Gram predicts the context from the center word.

 

  • Bag-of-words (Bag of Words) model: the most common approaches are one-hot encoding and TF-IDF.

The bag-of-words model treats a text as an unordered collection of words: it ignores grammar and word order and simply counts how often each word occurs. Its biggest drawback is that it has no way to express similarity between related words.

In NLP the input is a sequence of discrete symbols, e.g. a sequence of characters or words, while a neural network operates on vectors and matrices. So the first step is to encode each word as a vector. The simplest scheme is one-hot encoding, but it produces high-dimensional, sparse vectors, which is its drawback.

Both the bag-of-words model and word2vec are common ways to turn text into vectors.
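As a minimal sketch of both approaches (assuming scikit-learn ≥ 1.0 and gensim 4.x are available; the toy corpus and variable names are made up for illustration, and the sg flag switches between CBOW and Skip-Gram):

```python
from sklearn.feature_extraction.text import CountVectorizer
from gensim.models import Word2Vec

# Toy corpus; in practice you would use a much larger collection of documents.
corpus = [
    "the bank raised interest rates",
    "she sat on the river bank",
    "the bank approved the loan",
]

# Bag of words: unordered word counts, high-dimensional and sparse,
# with no notion of similarity between related words.
bow = CountVectorizer()
X = bow.fit_transform(corpus)          # sparse document-term count matrix
print(bow.get_feature_names_out())
print(X.toarray())

# word2vec: dense, low-dimensional vectors learned from context.
# sg=0 -> CBOW (predict the center word from its context),
# sg=1 -> Skip-Gram (predict the context from the center word).
tokenized = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=2,
               min_count=1, sg=1, epochs=200, seed=0)
print(w2v.wv["bank"][:5])              # first 5 dims of a dense 50-d vector
print(w2v.wv.most_similar("bank"))     # nearest words in the embedding space
```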

When encoding words with word embeddings, a word often has several senses. If we force a single vector to encode its meaning, all of those senses get mixed into that one vector, even though in any given sentence the word usually carries only one of them.

Situations like this can be handled with RNNs, LSTMs, and GRUs.

Two RNNs can be combined to form a Seq2Seq (encoder-decoder) model.
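A minimal sketch of this two-RNN encoder-decoder idea, assuming PyTorch (the GRU cells, dimensions, and toy batch below are illustrative choices, not taken from the original post):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """First RNN: reads the source sequence and compresses it into a final hidden state."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        _, hidden = self.rnn(self.embed(src))    # hidden: (1, batch, hid_dim)
        return hidden                            # summary of the whole source sentence

class Decoder(nn.Module):
    """Second RNN: generates the target sequence, starting from the encoder's hidden state."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt, hidden):              # tgt: (batch, tgt_len)
        output, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(output), hidden          # logits over the target vocabulary

# Usage: encode a source batch, then decode conditioned on its final hidden state.
enc, dec = Encoder(vocab_size=1000), Decoder(vocab_size=1000)
src = torch.randint(0, 1000, (2, 7))             # 2 source sentences of length 7
tgt = torch.randint(0, 1000, (2, 5))             # 2 target sentences of length 5
logits, _ = dec(tgt, enc(src))
print(logits.shape)                              # torch.Size([2, 5, 1000])
```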

Because ordinary word vectors ignore context, when encoding "bank" we can only pack both senses, "financial institution" and "riverside", into a single vector.

RNNs can address this because they have memory: when computing the hidden state at time t, the network considers not only the input Xt but also the hidden state from time t-1, and combines the two to produce St, the hidden state at time t. Likewise, at time t+1 it uses both the input Xt+1 and the hidden state from time t.

St thus remembers the semantics of everything from the start of the sequence up to time t; if the final hidden state is Sn, it can be viewed as an encoding of the whole sentence. So an RNN has memory and can represent context.
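A minimal NumPy sketch of this recurrence (the tanh nonlinearity and the toy dimensions are assumptions for illustration):

```python
import numpy as np

# Toy dimensions: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 4))   # input-to-hidden weights
W_s = rng.normal(size=(3, 3))   # hidden-to-hidden weights
b   = np.zeros(3)

def rnn_step(x_t, s_prev):
    """Combine the current input x_t with the previous hidden state s_{t-1}."""
    return np.tanh(W_x @ x_t + W_s @ s_prev + b)

# Run the recurrence over a sequence x_1..x_n; s_n summarizes everything seen so far.
xs = rng.normal(size=(6, 4))    # a sequence of 6 input vectors
s = np.zeros(3)                 # initial hidden state s_0
for x_t in xs:
    s = rnn_step(x_t, s)
print(s)                        # s_n: the hidden state after the whole sequence
```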

For the original RNN: neural networks are mostly trained with gradient descent and backpropagation, which requires propagating the loss (error) backward through the network. Because of the vanishing-gradient problem, a vanilla RNN struggles to learn long-distance dependencies. LSTM was designed to address this.

LSTM avoids the vanishing-gradient problem through its gating mechanism.
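For reference, the standard LSTM gate equations (sigma is the sigmoid function, the circled dot is element-wise multiplication):

```latex
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f)            && \text{(forget gate)}\\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i)            && \text{(input gate)}\\
\tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c)     && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    && \text{(cell-state update)}\\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o)            && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t)                         && \text{(hidden state)}
\end{aligned}
```

Because the cell state c_t is updated additively (scaled by the forget gate) rather than repeatedly squashed through a nonlinearity, gradients have a more direct path backward through time, which is what mitigates the vanishing-gradient problem.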



Origin: blog.csdn.net/one_super_dreamer/article/details/104579271