Other popular networks

SE-Net

An attention mechanism along the channel dimension.
[Figure: structure of the SE (Squeeze-and-Excitation) block]

  • Process
    The image first goes through a convolution to produce a feature map. Global average pooling (the squeeze step) turns the feature map into a 1×1×C vector, which then passes through two FC layers (the excitation step). The resulting channel weights are multiplied back onto the original feature map, and the recalibrated map becomes the input of the next layer (see the sketch after this list).
  • Why two FC layers
    If we only pooled the feature map, the scaling would be based on the current image alone, but the true channel statistics are those of the entire dataset. Adding the FC layers lets the recalibration be learned from, and adapt to, the whole dataset.
  • How the channels are compressed
    The first FC layer compresses the C channels down to C/r channels to reduce computation (followed by a ReLU); the second FC layer restores them to C channels (followed by a Sigmoid). Here r is the reduction (compression) ratio.
  • Behaviour across depth
    In early layers the excitation is largely class-agnostic, so it strengthens the shared low-level representations. In later, deeper layers the excitation gradually saturates and becomes less useful.
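To make the process concrete, here is a minimal sketch of an SE block, assuming PyTorch; the names `channels` and `reduction` are illustrative parameters, not from the original post.

```python
# Minimal SE (Squeeze-and-Excitation) block sketch, assuming PyTorch.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # global average pooling -> 1x1xC
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # FC: C -> C/r
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # FC: C/r -> C
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)                   # squeeze: (B, C, H, W) -> (B, C)
        w = self.excitation(w).view(b, c, 1, 1)          # excitation: channel weights
        return x * w                                     # recalibrate the original feature map

# Usage: recalibrate a 64-channel feature map
se = SEBlock(channels=64, reduction=16)
y = se(torch.randn(2, 64, 32, 32))                       # y has the same shape as the input
```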

FCN

  • Convolutionalization
    The fully connected layers are replaced with convolution layers, so the last layer outputs an H'×W' heatmap instead of a single class vector, giving a pixel-level classification.
  • Deconvolution
    Transposed convolution (deconvolution) upsamples the feature map back to the original image size so it can be used for segmentation.
  • Skip layers
    Because the last pooling layers have lost a lot of spatial information, feature maps from earlier pooling layers are fused in (skip connections) to supplement the detail, as in the sketch below.
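A rough sketch of these three ideas in PyTorch (assumed framework); `pool4`, `pool5` and `num_classes` are illustrative names, and a real FCN reuses a classification backbone such as VGG rather than this simplified head.

```python
# Rough FCN-style segmentation head: 1x1 "score" convolutions, deconvolution
# upsampling, and a skip connection from an earlier pooling stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCNHead(nn.Module):
    def __init__(self, c_pool4: int, c_pool5: int, num_classes: int):
        super().__init__()
        # Convolutionalization: 1x1 convolutions replace the FC classifier,
        # producing per-pixel class scores (a heatmap) instead of one vector.
        self.score5 = nn.Conv2d(c_pool5, num_classes, kernel_size=1)
        self.score4 = nn.Conv2d(c_pool4, num_classes, kernel_size=1)
        # Deconvolution (transposed convolution) upsamples the heatmap by 2x.
        self.up2 = nn.ConvTranspose2d(num_classes, num_classes,
                                      kernel_size=4, stride=2, padding=1)

    def forward(self, pool4, pool5, out_size):
        x = self.up2(self.score5(pool5))   # upsample the deepest heatmap 2x
        x = x + self.score4(pool4)         # skip connection: add pool4 scores
        # Final upsampling back to the input image size for pixel-level output.
        return F.interpolate(x, size=out_size, mode='bilinear', align_corners=False)

# Usage with dummy feature maps (as if from a VGG-like backbone, 448x448 input)
head = FCNHead(c_pool4=512, c_pool5=512, num_classes=21)
pool4 = torch.randn(1, 512, 28, 28)        # 1/16 resolution
pool5 = torch.randn(1, 512, 14, 14)        # 1/32 resolution
logits = head(pool4, pool5, out_size=(448, 448))   # (1, 21, 448, 448)
```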

RNN (Recurrent Neural Network)

Forward propagation

[Figure: RNN structure unrolled in time and its forward propagation]
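The original equation figures are not available. For reference, in the notation commonly used for a simple RNN (U: input-to-hidden weights, W: hidden-to-hidden weights, V: hidden-to-output weights, f and g: activation functions; these symbols are assumed here), the forward pass is:

$$
s_t = f\!\left(U x_t + W s_{t-1}\right), \qquad o_t = g\!\left(V s_t\right)
$$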

BPTT (backpropagation through time)

[Figure: derivation of backpropagation through time]
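The derivation figures are unavailable. The key result, in one common formulation (with $\delta_t = \partial E / \partial \mathrm{net}_t$ and $\mathrm{net}_t = U x_t + W s_{t-1}$, symbols assumed as above), is that the error propagated back from step $t$ to an earlier step $k$ is

$$
\delta_k^{T} = \delta_t^{T} \prod_{i=k}^{t-1} W\, \mathrm{diag}\!\left[f'\!\left(\mathrm{net}_i\right)\right]
$$

so the gradient is repeatedly multiplied by $W$ and by the activation's derivative, which is exactly what the next section discusses.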

The RNN short-term memory problem

On the backward path the error is multiplied by the same weight matrix again and again: at every step of backpropagation through time, the error is multiplied by W and by the derivative of the activation function. If that factor is a number smaller than 1, say 0.9, then after being multiplied by 0.9 over and over, the error that reaches the earliest time steps is a number close to zero, so for those early steps the error has effectively vanished. This is called the vanishing gradient (or gradient diffusion) problem. Conversely, if the factor is greater than 1, say 1.1, repeated multiplication eventually produces an enormous number and the RNN is overwhelmed by it; this is the exploding gradient problem. This is the reason why an ordinary RNN has no way of recalling old memories, as the quick calculation below shows.
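As an illustration of how fast this compounds over, say, 100 time steps:

$$
0.9^{100} \approx 2.7\times 10^{-5}, \qquad 1.1^{100} \approx 1.4\times 10^{4}
$$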

How to deal with vanishing and exploding gradients

  • Initialize the weights sensibly, so that each neuron avoids the saturation regions of its activation (the maximal and minimal output values) where the gradient vanishes.
  • Use ReLU instead of sigmoid and tanh as the activation function. For the principle behind this, see part (4) of the "Deep Learning from Scratch" introductory series, on convolutional neural networks and activation functions.
  • Use other RNN architectures, such as the Long Short-Term Memory network (LSTM) and the Gated Recurrent Unit (GRU); this is the most popular approach. Both networks are introduced below (a minimal usage sketch follows this list).
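As a minimal illustration of the third point, here is how the three recurrent layers can be swapped in PyTorch (assumed framework); all three share the same input/output interface, so replacing one with another is a one-line change.

```python
# Swapping a vanilla RNN for the gated LSTM/GRU variants in PyTorch.
import torch
import torch.nn as nn

x = torch.randn(5, 3, 10)                       # (seq_len, batch, input_size)

rnn  = nn.RNN(input_size=10, hidden_size=20)    # prone to vanishing/exploding gradients
lstm = nn.LSTM(input_size=10, hidden_size=20)   # gated: keeps long-range memory
gru  = nn.GRU(input_size=10, hidden_size=20)    # gated, fewer parameters than LSTM

out_rnn,  h_rnn      = rnn(x)
out_lstm, (h_l, c_l) = lstm(x)                  # LSTM also carries a cell state c
out_gru,  h_gru      = gru(x)
print(out_rnn.shape, out_lstm.shape, out_gru.shape)   # all (5, 3, 20)
```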

LSTM: Long Short-Term Memory networks

[Figure: LSTM cell structure]

The LSTM computes, in order:

  • the forget gate
  • the input gate
  • the output gate
  • the candidate cell state from the current input
  • the current cell state
  • the final output
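The original equation figures are not available. For reference, these quantities in the standard LSTM formulation ($\sigma$ is the sigmoid, $\circ$ is element-wise multiplication, $[h_{t-1}, x_t]$ is the concatenation of the previous output and the current input; the $W$ and $b$ symbols are the usual gate parameters) are:

$$
\begin{aligned}
f_t &= \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) &&\text{forget gate}\\
i_t &= \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) &&\text{input gate}\\
o_t &= \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) &&\text{output gate}\\
\tilde{c}_t &= \tanh\!\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) &&\text{candidate cell state}\\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t &&\text{current cell state}\\
h_t &= o_t \circ \tanh(c_t) &&\text{final output}
\end{aligned}
$$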

GRU

Gated Recurrent Unit

Compared with the LSTM, the GRU keeps only a reset gate ($r_t$) and an update gate ($z_t$), and merges the cell state and the output into a single hidden state.
[Figure: GRU cell structure]
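The original figure is not available. For reference, one common formulation of the GRU (same notation as for the LSTM above; note that some references swap the roles of $z_t$ and $1 - z_t$) is:

$$
\begin{aligned}
r_t &= \sigma\!\left(W_r \cdot [h_{t-1}, x_t]\right)\\
z_t &= \sigma\!\left(W_z \cdot [h_{t-1}, x_t]\right)\\
\tilde{h}_t &= \tanh\!\left(W \cdot [r_t \circ h_{t-1},\, x_t]\right)\\
h_t &= (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t
\end{aligned}
$$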

Source: blog.csdn.net/qq_30776035/article/details/104543428