Understanding LSTM Networks (a translation of colah's blog post)

colah wrote an excellent article explaining LSTMs; it is translated here for study. Original address: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ , posted August 27, 2015.

Recurrent Neural Networks

Human thinking is coherent. As you read this article, you understand each word based on your understanding of the words before it. What came before helps you make sense of what follows; this reflects the continuity of thought.

Traditional neural networks (the networks before RNNs) cannot achieve this kind of continuity when solving tasks, and this became one of their main shortcomings. For example, if you want to classify the events happening at each moment of a movie, a traditional neural network cannot use the earlier parts of the story to reason about what happens next. RNNs address this kind of continuity problem: they have loops in their network structure, which allow information to persist.

RNNs have a loop structure

In the figure above, a chunk of neural network A takes an input x_t and outputs a value h_t. The loop allows information to be passed from one time step (timestep t) to the next (t + 1). The loop and arrows may look complicated, but the structure becomes much clearer once the loop is unrolled; the unrolled diagram shows that a recurrent network is not so different from an ordinary one. A recurrent neural network can be thought of as several copies of the same network lined up in order, each passing its output on as input to the next copy.

 

An unrolled RNN

This chain-like structure suggests that RNNs are intimately related to sequences and lists; they are the natural neural network architecture for sequential data. And they really are used that way. Over the past few years, RNNs have been applied successfully to a variety of problems: speech recognition, language modeling, translation, image captioning, and so on. Andrej Karpathy's blog post, The Unreasonable Effectiveness of Recurrent Neural Networks, discusses the remarkable things RNNs can achieve. Much of this success is owed to a special kind of recurrent network architecture: the LSTM. In the field of recurrent neural networks, almost all of the exciting results are based on LSTMs, which work much better than the standard RNN structure. The rest of this article explores these LSTMs.

The Problem of Long-Term Dependencies

One appealing idea of RNNs is that they might be able to connect previous information to the current task, for example using earlier frames of a movie to help understand a later one. Whether RNNs can actually do this depends on the situation.

Sometimes we only need recent information to solve the current task. For example, consider a language model trying to predict the next word from the previous ones. If we want to predict the last word of the sentence "the clouds are in the sky", no additional context is needed beyond the preceding words. In cases like this, where the gap between the relevant information and the place it is needed is small, RNNs can learn to use the past information.

 

But there are also cases RNNs cannot handle. Consider predicting the last word of a longer passage: "I grew up in France.... I speak fluent French." Recent information suggests the last word is probably the name of a language, but to narrow it down to French we need the context of France, much further back. In this case the gap between France and French can be very large, and RNNs cannot learn to connect the information.

 

In theory, RNNs are capable of handling such "long-term dependencies". A person could carefully hand-pick parameters for an RNN to solve toy problems of this form, but in practice RNNs do not seem to be able to learn these long-range dependencies on their own. Hochreiter (1991) [German] and Bengio, et al. (1994) studied the problem in depth and identified some of the fundamental reasons why RNNs fail here. LSTMs do not have this problem.

 

LSTM Networks

LSTM (Long Short-Term Memory) networks are a special kind of RNN that can solve the "long-term dependencies" problem. They were introduced by Hochreiter & Schmidhuber (1997), perform well on a wide variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem; remembering information for long periods of time is practically their default behavior.

 

In a standard RNN, the repeating module has a very simple structure, for example a single tanh layer.
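For reference, that single layer implements the usual simple-RNN recurrence (a common formulation; the weight matrix W and bias b are just illustrative names):

h_t = \tanh(W \cdot [h_{t-1}, x_t] + b)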


The repeating module in a standard RNN contains a single layer

 

An LSTM has a similar chain-like repeating structure, but its repeating module is no longer a simple single-layer network; it is a more complex unit. The repeating module of an LSTM contains four layers, connected and interacting in a fairly involved way.


The repeating module in an LSTM contains four interacting layers

 

First, take a moment to get familiar with the notation that will be used; the LSTM is then explained step by step below.

 

The Core Idea Behind LSTMs

Each repeating unit of an LSTM is called a cell. The key to the LSTM is the cell state, represented by the horizontal line running across the diagram below.

 

The LSTM can add information to or remove information from the cell state, and this is regulated by structures called "gates". A gate acts like a switch that optionally lets information through; it is composed of a sigmoid layer and a pointwise multiplication operation.

The structure of a gate

The sigmoid layer outputs numbers between 0 and 1, describing how much of each component should be let through the gate: a value of 0 means "let nothing through" and a value of 1 means "let everything through". For example, if a piece of information is represented by the vector [1, 2, 3, 4] and the sigmoid layer outputs [0.3, 0.5, 0.2, 0.4], then passing through the gate is the pointwise multiplication [1, 2, 3, 4] .* [0.3, 0.5, 0.2, 0.4] = [0.3, 1.0, 0.6, 1.6].
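A minimal NumPy sketch of this gating operation (the vectors are the illustrative values from the example above; in a real gate the values come from a sigmoid layer):

import numpy as np

# In a real gate: gate = sigmoid(W @ np.concatenate([h_prev, x]) + b).
# Here we plug in the example values directly.
signal = np.array([1.0, 2.0, 3.0, 4.0])
gate = np.array([0.3, 0.5, 0.2, 0.4])
print(signal * gate)  # -> [0.3 1.0 0.6 1.6]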

An LSTM has three such gates, which it uses to protect and control the cell state.

 

Step-by-Step LSTM Walk-Through

The first step is for the LSTM to decide what information to throw away from the cell state. This decision is made by the "forget gate", a sigmoid layer. It looks at the previous output h_{t-1} and the current input x_t and produces a forget vector f_t that determines how much of each component of the previous cell state C_{t-1} is let through.
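In the notation of the original article, the forget gate computes:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)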

The forget gate

 

 

The second step is to decide what new information to store in the cell state. First, an "input gate" (also a sigmoid layer) looks at h_{t-1} and x_t and decides which values to update. Next, a tanh layer creates a vector of candidate values \tilde{C}_t. The update decision and the candidate values are then combined to update the old state C_{t-1}, as shown in the figures below.

Next the old state is updated, as shown below: f_t is the forget vector, C_{t-1} is the previous (old) state, i_t decides which values to update, and \tilde{C}_t is the candidate state.
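Written out in the same notation, the input gate, the candidate values, and the state update are:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t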

 

The final step is to decide what to output. First, the output gate (a sigmoid layer) produces an output vector o_t, which decides which parts of the current cell state C_t to output. The cell state C_t is then passed through a tanh layer and multiplied by o_t to form the output h_t. See the figure below.
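In the same notation, the output gate and the output are:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)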

 
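Putting the three steps together, the following is a minimal NumPy sketch of a single LSTM time step (the function name, weight shapes, and the tiny usage example are illustrative assumptions, not code from the original article):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    # Each W has shape (hidden, hidden + input); each b has shape (hidden,).
    z = np.concatenate([h_prev, x])      # [h_{t-1}, x_t]
    f = sigmoid(W_f @ z + b_f)           # forget gate
    i = sigmoid(W_i @ z + b_i)           # input gate
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate state
    c = f * c_prev + i * c_tilde         # new cell state
    o = sigmoid(W_o @ z + b_o)           # output gate
    h = o * np.tanh(c)                   # new hidden state / output
    return h, c

# Tiny usage example with random weights.
hidden, inputs = 3, 2
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(hidden, hidden + inputs)) for _ in range(4)]
bs = [np.zeros(hidden) for _ in range(4)]
h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), *Ws, *bs)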

Variants of the LSTM

So far we have described the plain LSTM; not every LSTM network has exactly this structure.

Gers & Schmidhuber (2000) proposed a popular LSTM variant that adds "peephole connections", that is, direct connections from the cell state to the three gates. See the figure below.
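One common way to write the gates with peepholes (here all three gates get peepholes; note the output gate looks at the new state C_t):

f_t = \sigma(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i)
o_t = \sigma(W_o \cdot [C_t, h_{t-1}, x_t] + b_o)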

 

Another variant uses coupled forget and input gates. Instead of separately deciding what to forget and what new information to add, these two decisions are made together: the old state is forgotten only where new information will be written in, and new information is written in only where old state is being forgotten. See the figure below.
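With coupled gates the state update simplifies to:

C_t = f_t * C_{t-1} + (1 - f_t) * \tilde{C}_t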

 

A somewhat more dramatic variant is the GRU (Gated Recurrent Unit) proposed by Cho, et al. (2014). The GRU combines the forget and input gates into a single "update gate" and also merges the cell state and hidden state, among other changes; the result is a model that is simpler than the standard LSTM.
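A standard way to write the GRU update (using the same bracket notation as above, with biases omitted):

z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
r_t = \sigma(W_r \cdot [h_{t-1}, x_t])
\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t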

 

There are other variants as well, such as the Depth Gated RNNs of Yao, et al. (2015). There are also entirely different approaches to the long-term dependency problem, such as the Clockwork RNNs of Koutnik, et al. (2014).

Greff, et al. (2015) compared these variants and found they are all about the same; the differences do not matter much. Jozefowicz, et al. (2015) tested thousands of RNN architectures and found that some performed better than LSTMs on certain tasks.

 

Conclusion

LSTMs really do perform better than plain RNNs on the vast majority of tasks. LSTMs were a big step forward, but what is the next big step? A common view among researchers is attention: letting the RNN pick out, at every step, the information it needs from a larger collection of information, much like focusing attention on one part of the input. There have already been many exciting results using attention, and many more are on the way.

Attention is not the only exciting direction in RNN research. There are also the Grid LSTMs of Kalchbrenner, et al. (2015), and work on using RNNs in generative models by Gregor, et al. (2015), Chung, et al. (2015), and Bayer & Osendorfer (2015).

Acknowledgments

Omitted...

Source: www.cnblogs.com/banluxinshou/p/11656381.html