Deep Residual Shrinkage Networks: Feature Soft Thresholding with an Attention Mechanism

Author | Zhao Hang, Lecturer, Harbin Institute of Technology (Weihai)

This article interprets a new deep learning algorithm: the Deep Residual Shrinkage Network.

Functionally, the deep residual shrinkage network is a feature-learning method suited to data that is heavily noisy or highly redundant. This article first reviews the relevant basics, then introduces the motivation and concrete implementation of the deep residual shrinkage network, and hopes to be of some help to readers.

Related Basics

The deep residual shrinkage network is built mainly on three components: the deep residual network, the soft threshold function, and the attention mechanism.

1.1 Deep Residual Network

The deep residual network is undoubtedly one of the most successful deep learning algorithms of recent years; its paper has been cited more than forty thousand times on Google Scholar. Compared with an ordinary convolutional neural network, the deep residual network eases the difficulty of training deep networks through cross-layer identity paths.

The trunk of a deep residual network is formed by stacking many residual modules; a common residual module is shown below.

1.2 Soft Threshold Function

The soft threshold function is a key step in most noise-reduction methods. First, a positive threshold must be set. The threshold cannot be too large; specifically, it cannot exceed the maximum absolute value of the input data, or the output would be all zeros.

Then, the soft threshold function sets inputs whose absolute value is below the threshold to zero, and shrinks inputs whose absolute value exceeds the threshold toward zero. The relationship between its input and output is shown in figure (a) below.
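In code, soft thresholding takes only a few lines. A NumPy sketch (the function name is illustrative):

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft thresholding: zero out values whose magnitude is below tau,
    and shrink the remaining values toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
y = soft_threshold(x, 1.0)
# values inside [-1, 1] become 0; -2.0 shrinks to -1.0, 1.5 shrinks to 0.5
```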

Figure (b) above shows the derivative of the soft threshold output y with respect to the input x. The derivative is either 0 or 1. In this respect the soft threshold function is somewhat similar to the ReLU activation function, which also benefits gradient back-propagation when training deep learning algorithms.
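The piecewise-constant derivative can be checked directly with automatic differentiation. A small PyTorch sketch (the threshold value here is arbitrary):

```python
import torch

# inputs straddling the threshold on both sides
x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
tau = 1.0

# soft thresholding written with differentiable primitives
y = torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)
y.sum().backward()

# x.grad is 1 where |x| > tau and 0 where |x| < tau
print(x.grad)  # tensor([1., 0., 0., 1.])
```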

It is worth noting that the choice of threshold directly affects the result of soft thresholding, and how to choose it well remains an open problem.


1.3 Attention Mechanism

The attention mechanism has been a very hot topic in deep learning in recent years, and the Squeeze-and-Excitation Network (SENet) is one of the most classic attention algorithms.

As shown below, SENet learns a small set of weight coefficients and uses them to weight the individual feature channels. This is essentially an attention mechanism: first assess the importance of each feature channel, then assign each channel a weight according to its importance.
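The channel-weighting step above can be sketched as a small PyTorch module (a minimal sketch, assuming 2D feature maps; the class name and reduction ratio are illustrative):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation sketch: learn one weight in (0, 1) per
    channel from the sample itself, then rescale the channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                      # weights bounded in (0, 1)
        )

    def forward(self, x):                      # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                 # squeeze: global average pool -> (N, C)
        w = self.fc(s)                         # excitation: per-channel weights
        return x * w[:, :, None, None]         # re-weight each channel

x = torch.randn(2, 8, 5, 5)
y = SEBlock(8)(x)
assert y.shape == x.shape
```

Because the weights are computed from each sample's own pooled features, every sample gets its own set of channel weights.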

As shown below, SENet can be integrated into a residual module. In this form, thanks to the cross-layer identity path, SENet becomes easier to train. It is also worth noting that the weight coefficients are determined per sample; that is, each sample can have its own unique set of weights.

Deep Residual Shrinkage Network

Next, this section introduces the motivation, implementation, and advantages of the deep residual shrinkage network in turn.

2.1 Motivation

First, most real-world data, including images, speech, and vibration signals, contain more or less noise or redundant information.

Broadly speaking, any information within a sample that is irrelevant to the current pattern recognition task can be regarded as noise or redundancy, and such information may well harm the task's performance.

Second, for any two samples, the amount of noise or redundancy they contain often differs: some samples contain more, some less. This requires that, when designing the algorithm, we give it the ability to set parameters individually according to the characteristics of each sample.

Driven by these two points, can we introduce the soft threshold function from traditional noise-reduction algorithms into the deep residual network? And how should the threshold of the soft threshold function be chosen? The deep residual shrinkage network answers both questions.

2.2 Implementation

The deep residual shrinkage network fuses the deep residual network, SENet, and the soft threshold function. As shown below, it replaces the channel "re-weighting" of the SENet-style residual module with "soft thresholding."

In SENet, the embedded small network is used to obtain a set of weight coefficients; in the deep residual shrinkage network, the small network is used to obtain a set of thresholds.

To obtain appropriate thresholds, the structure of this small network is also adjusted relative to the original SENet. Specifically, the threshold output by the small network is (the average absolute value of each feature channel) × (a coefficient between 0 and 1).

In this way, the deep residual shrinkage network ensures not only that all thresholds are positive, but also that no threshold is too large (so the output will not be all zeros).
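The thresholding sub-network described above can be sketched in PyTorch as follows (a sketch of the idea, not the authors' reference code; class name and reduction ratio are illustrative):

```python
import torch
import torch.nn as nn

class ShrinkageBlock(nn.Module):
    """Deep-residual-shrinkage-style thresholding sketch: a small
    sub-network maps the channel-wise mean of |x| to a coefficient in
    (0, 1); threshold = mean(|x|) * coefficient, then soft-threshold."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                       # coefficient in (0, 1)
        )

    def forward(self, x):                       # x: (N, C, H, W)
        abs_mean = x.abs().mean(dim=(2, 3))     # average |feature| per channel: (N, C)
        alpha = self.fc(abs_mean)               # learned coefficient per channel
        tau = (abs_mean * alpha)[:, :, None, None]  # positive, bounded threshold
        return torch.sign(x) * torch.relu(x.abs() - tau)  # soft thresholding

x = torch.randn(2, 8, 5, 5)
y = ShrinkageBlock(8)(x)
assert y.shape == x.shape
```

Since the sigmoid keeps the coefficient strictly between 0 and 1, each threshold stays positive and below the channel's average magnitude, so the output can never collapse to all zeros.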

As shown below, the overall structure of the deep residual shrinkage network is the same as that of an ordinary deep residual network: an input layer, an initial convolutional layer, a series of basic modules, and finally global average pooling and a fully connected output layer.

2.3 Advantages

First, the thresholds required by the soft threshold function are set automatically by a small network, avoiding the manual expertise otherwise needed to set them.

Second, the deep residual shrinkage network ensures that the soft thresholds are positive and fall within an appropriate range, avoiding the all-zero output case.

Meanwhile, each sample has its own unique set of thresholds, so the deep residual shrinkage network suits situations where individual samples differ in their noise content.

Conclusion

Since noise and redundant information are everywhere, the deep residual shrinkage network, or rather this idea of "attention mechanism" + "soft thresholding," may have broad room for development and a wide range of applications.

Paper link:

https://www.paperweekly.site/papers/3397

Code link:

https://github.com/zhao62/Deep-Residual-Shrinkage-Networks


Origin: blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/105259335