Deep Residual Shrinkage Networks: Soft Thresholding under the Attention Mechanism

    As its name suggests, the deep residual shrinkage network is an improved algorithm built on the "residual network", composed of two parts: the "residual network" and "shrinkage". The residual network won the ImageNet image recognition competition (ILSVRC 2015) and has since become a foundational architecture in deep learning; "shrinkage" refers to soft thresholding, a key step in many signal denoising algorithms. In the deep residual shrinkage network, the thresholds required for soft thresholding are, in essence, set by means of an attention mechanism.
    In this article, we first briefly review the basics of residual networks, soft thresholding, and attention mechanisms, and then explain the motivation, algorithm, and applications of the deep residual shrinkage network.

1. Review of the Basics
1.1 Residual Networks
    Essentially, the residual network (also known as the deep residual network, or deep residual learning) is a convolutional neural network. Compared with an ordinary convolutional neural network, the residual network uses cross-layer identity connections to reduce the difficulty of training. A basic module of the residual network is shown in Figure 1.

Figure 1: A basic module of the residual network
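
    To make the structure concrete, the following is a minimal sketch of such a basic module in PyTorch. The class name BasicResidualBlock, the layer ordering, and the use of batch normalization are our own illustrative choices; Figure 1 itself is framework-agnostic.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """A basic residual module: two convolutions plus an identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                       # the cross-layer identity connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity               # adding the shortcut eases training
        return self.relu(out)
```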

1.2 Soft Thresholding
    Soft thresholding is a key step in many signal denoising methods. It sets features whose absolute value is below a certain threshold to zero, and also adjusts the remaining features toward zero, i.e., it "shrinks" them. Here, the threshold is a parameter that must be set in advance, and its value has a direct impact on the denoising result. The input-output relationship of soft thresholding is shown in Figure 2.

Figure 2: Soft thresholding
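
    In code, soft thresholding is a one-line transform. Below is a minimal sketch assuming the standard definition y = sign(x) · max(|x| − τ, 0) from the denoising literature; the function name soft_threshold is our own.

```python
import torch

def soft_threshold(x: torch.Tensor, tau: float) -> torch.Tensor:
    # Features with |x| <= tau are zeroed; all others shrink toward zero by tau.
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)

x = torch.tensor([-2.0, -0.5, 0.3, 1.5])
y = soft_threshold(x, tau=1.0)  # the middle two values are zeroed, the outer two shrink by 1.0
```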

    As can be seen from Figure 2, soft thresholding is a nonlinear transformation that has a property very similar to the ReLU activation function: its gradient is either 0 or 1. Therefore, soft thresholding can also serve as the activation function of a neural network. In fact, some neural networks have already used soft thresholding as their activation function.

1.3 Attention Mechanisms
    The attention mechanism is a mechanism for focusing on key local information. It can be divided into two steps: first, scan the global information to discover the useful local information; second, enhance the useful information and suppress the redundant information.
    The Squeeze-and-Excitation Network is a classic attention mechanism in deep learning. It uses a small sub-network to automatically learn a set of weights for the channels of a feature map. The implication is that some channels of the feature map carry important information while others are redundant; by weighting the channels in this way, useful channels are enhanced and redundant channels are weakened. A basic module of the Squeeze-and-Excitation Network is shown in Figure 3.

Figure 3: A basic module of the Squeeze-and-Excitation Network
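
    A minimal sketch of such a module in PyTorch follows. The class name SEBlock and the reduction ratio of 16 are our own illustrative choices, following the common squeeze (global average pooling) and excitation (two fully connected layers with a sigmoid) pattern of the original Squeeze-and-Excitation design.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: learns one weight per channel, per sample."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # "squeeze": global average pooling
        self.fc = nn.Sequential(              # "excitation": a small sub-network
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # weights fall in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c))  # each sample gets its own weights
        return x * w.view(b, c, 1, 1)         # reweight the channels
```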

    It is worth noting that, in this way, each sample can have its own unique set of weights, adjusting its feature channels according to the characteristics of the sample itself. For example, the first feature channel may be important for sample A while its second channel is not, and the opposite may hold for sample B: the first channel unimportant, the second important. Then sample A can have a set of weights that strengthens its first channel and weakens its second, and sample B can likewise have a set of weights that weakens its first channel and strengthens its second.

2. The Theory of the Deep Residual Shrinkage Network
2.1 Motivation
    First, data in the real world more or less contain some redundant information. We can therefore try to embed soft thresholding into the residual network to eliminate this redundant information.
    Second, the amount of redundant information often differs from sample to sample. We can therefore use the attention mechanism to adaptively set a different threshold for each sample, according to the sample's own situation.

2.2 Algorithm
    Like the residual network and the Squeeze-and-Excitation Network, the deep residual shrinkage network is formed by stacking a number of basic modules. Each basic module contains a sub-network that automatically learns a set of thresholds, which are then used for soft thresholding of the feature map. It is worth noting that, in this way, each sample has its own unique set of thresholds. A basic module of the deep residual shrinkage network is shown below.

Figure 4: A basic module of the deep residual shrinkage network
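
    The following is a sketch of such a module with channel-wise thresholds, combining the residual block and the SE-style sub-network from the earlier sketches. The class name ShrinkageBlock and details such as the reduction ratio are our own; the key idea, taken from the original paper, is that each channel's threshold is the learned sigmoid output multiplied by the average absolute value of that channel's features, so the threshold stays positive and never grows too large.

```python
import torch
import torch.nn as nn

class ShrinkageBlock(nn.Module):
    """A basic deep-residual-shrinkage module (channel-wise thresholds)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # Sub-network that learns a scaling factor in (0, 1) for each channel.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Threshold = sigmoid output * mean(|features|), per sample and channel.
        b, c, _, _ = out.shape
        abs_mean = out.abs().mean(dim=(2, 3))            # shape (b, c)
        tau = (self.fc(abs_mean) * abs_mean).view(b, c, 1, 1)
        # Soft thresholding shrinks small (presumably redundant) features to zero.
        out = torch.sign(out) * torch.clamp(out.abs() - tau, min=0.0)
        return self.relu(out + identity)
```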

    The overall structure of the deep residual shrinkage network is shown below. It consists of an input layer, a number of stacked basic modules, and a final fully connected output layer, among other components.

Figure 5: Overall structure of the deep residual shrinkage network
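
    Reusing the hypothetical ShrinkageBlock from the previous sketch, the overall structure in Figure 5 might be assembled as follows. All depths, widths, and input/output sizes here are illustrative only, not the configuration used in the original paper.

```python
import torch
import torch.nn as nn

class DeepResidualShrinkageNet(nn.Module):
    """Illustrative assembly: input layer, stacked modules, output layer."""
    def __init__(self, in_channels: int = 1, channels: int = 16,
                 num_blocks: int = 4, num_classes: int = 10):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, 3, padding=1, bias=False)
        self.blocks = nn.Sequential(*[ShrinkageBlock(channels) for _ in range(num_blocks)])
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, num_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.stem(x)))

model = DeepResidualShrinkageNet()
logits = model(torch.randn(8, 1, 32, 32))  # a batch of 8 single-channel inputs
```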

2.3 Applications
    In the original paper, the deep residual shrinkage network was applied to fault diagnosis based on vibration signals from rotating machinery. In principle, however, the deep residual shrinkage network is suitable for any dataset containing redundant information, and redundant information is everywhere. For example, in image recognition, an image always contains some regions that are unrelated to the label; in speech recognition, audio often contains various forms of noise. Therefore, the deep residual shrinkage network, or more generally the idea of "deep learning" + "soft thresholding" + "attention mechanism", has rather broad research prospects.

References
[1] K. He, X. Zhang, S. Ren, et al. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[2] K. He, X. Zhang, S. Ren, et al. Identity mappings in deep residual networks. European Conference on Computer Vision, 2016: 630-645.
[3] J. Hu, L. Shen, G. Sun. Squeeze-and-excitation networks. IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
[4] D.L. Donoho. De-noising by soft-thresholding. IEEE Transactions on Information Theory, 1995, 41(3): 613-627.
[5] K. Isogawa, T. Ida, T. Shiodera, et al. Deep shrinkage convolutional neural network for adaptive noise reduction. IEEE Signal Processing Letters, 2017, 25(2): 224-228.
[6] M. Zhao, S. Zhong, X. Fu, et al. Deep residual shrinkage networks for fault diagnosis. IEEE Transactions on Industrial Informatics, 2019, DOI: 10.1109/TII.2019.2943898.
