02-Feature Inversion Experiment

Feature inversion

github: https://github.com/Gary11111/02-Inversion

Research Background

ONNX

Many excellent vision models are written in Caffe, many new research papers use PyTorch, and still more models are written in TF, so testing them normally requires the corresponding framework environment. With the ONNX exchange format, models can be converted and tested in a single environment.
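For example, here is a minimal sketch of exporting a PyTorch model to ONNX and running it with ONNX Runtime; the model choice, input shape, and file name are illustrative placeholders, and package versions may differ.

import torch
import torchvision.models as models
import onnxruntime as ort

# Export a PyTorch model in the ONNX exchange format (placeholder model/file)
model = models.vgg16(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "vgg16.onnx")

# Run the exported model in a framework-independent runtime
sess = ort.InferenceSession("vgg16.onnx")
input_name = sess.get_inputs()[0].name
logits = sess.run(None, {input_name: dummy.numpy()})[0]
print(logits.shape)  # (1, 1000) class scores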

VGGNet structure (Python implementation)

13 convolutional layers + 3 fully connected layers

Convolutional layer
  • conv2d+bias
  • batchnorm regularization
  • relu activation
def conv_layer(self, bottom, name, stride=1):
    """Convolution + bias, then batch normalization and ReLU (TF1 style)."""
    with tf.variable_scope(name):
        # Convolution with the filter weights loaded for this layer
        filt = self.get_conv_filter(name)
        conv = tf.nn.conv2d(bottom, filt, [1, stride, stride, 1], padding='SAME')
        conv_biases = self.get_bias(name)
        bias = tf.nn.bias_add(conv, conv_biases)
        # Batch normalization with the stored statistics (mean/variance)
        # and affine parameters (beta/gamma) for this layer
        mean = self.get_mean(name)
        variance = self.get_variance(name)
        offset = self.get_beta(name)
        scale = self.get_gamma(name)
        norm = tf.nn.batch_normalization(bias, mean, variance, offset, scale, 1e-20)
        # ReLU activation
        return tf.nn.relu(norm)

HOG: Histogram of Oriented Gradients

HOG + SVM is the classic approach to pedestrian detection.

  1. Main idea: in an image, the appearance and shape of a local object can be described well by the distribution of gradients or edge directions, since the statistical information of the gradient is concentrated mainly at edges.
  2. Implementation: divide the image into small connected regions called cell units; collect a histogram of the gradient or edge directions over the pixels in each cell; then combine the histograms to form the feature descriptor (see the sketch after this list).
  3. Advantages:
    1. HOG operates on local cells of the image, so it maintains good invariance to geometric and photometric deformations, which only appear over larger spatial regions.
    2. Under coarse spatial binning, fine orientation sampling, and strong local photometric normalization, the influence of subtle body movements can be ignored as long as the pedestrian maintains a roughly upright posture.
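A minimal sketch of HOG extraction, assuming scikit-image is installed; the parameter values are common defaults, not the assignment's settings.

from skimage import color, data
from skimage.feature import hog

image = color.rgb2gray(data.astronaut())  # any grayscale image works

# 9 orientation bins per cell histogram, 8x8-pixel cells,
# 2x2-cell blocks for local contrast normalization
features, hog_image = hog(image,
                          orientations=9,
                          pixels_per_cell=(8, 8),
                          cells_per_block=(2, 2),
                          visualize=True)
print(features.shape)  # flattened HOG descriptor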

SIFT: Scale-Invariant Feature Transform

SIFT is used to detect and describe local features in an image: it finds extreme points in scale space and extracts descriptors that are invariant to position, scale, and rotation.

Applications: object recognition, robotic mapping and navigation, image tracking, gesture recognition, etc.

Local image features help identify objects:

  1. SIFT features are based on points of interest in the object's local appearance and are independent of image size and rotation. Their tolerance to light, noise, and slight viewpoint changes is quite high. Thanks to these properties they are highly distinctive and easy to capture: even in a huge feature database, objects are identified easily and rarely misrecognized.
  2. The detection rate for partially occluded objects is also high; as few as three SIFT features are enough to compute an object's position and orientation, and recognition can run at close to real time (see the matching sketch after this list).
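A minimal sketch of SIFT detection and matching, assuming opencv-python >= 4.4 (where SIFT is in the main package); the image paths are placeholders.

import cv2

img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder paths
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-d descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to filter ambiguous matches
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "good matches")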

Homework ideas

A pretrained VGG16 is provided (its parameters are already trained).

  1. Build a VGG16 computation graph on the original image: this is the "bottom" (target) graph.
  2. Build a second VGG16 computation graph on a noise image.
  3. Pick a specific layer of the graph, such as conv3_1; the goal is to see what that layer of the network has learned.
  4. The bottom graph is initialized only once at construction time and is never updated; it serves as the learning target for the noise image. Use the Euclidean distance between the two layers' feature maps as the loss function.
  5. Use ADAM as the optimizer to minimize the loss (a sketch of the whole loop follows this list).
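A minimal TF1-style sketch of this procedure; the Vgg16 class, its build method and conv3_1 attribute, and the load_image helper are assumed names standing in for the assignment's actual code.

import tensorflow as tf

# Fixed target graph: built once on the original image, never updated
target_img = tf.constant(load_image("target.jpg"))  # load_image is a placeholder
vgg_target = Vgg16()
vgg_target.build(target_img)

# Trainable graph: the same network built on a noise image
noise_img = tf.Variable(tf.random_normal([1, 224, 224, 3]))
vgg_noise = Vgg16()
vgg_noise.build(noise_img)

# Euclidean distance between the chosen layer's feature maps
loss = tf.reduce_sum(tf.square(vgg_noise.conv3_1 - vgg_target.conv3_1))

# Only the noise pixels are optimized; the pretrained weights stay frozen
train_op = tf.train.AdamOptimizer(0.01).minimize(loss, var_list=[noise_img])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        sess.run(train_op)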

Notes on TF optimizer built-in methods

  • compute_gradients(loss, val_list)

Computes the partial derivatives of loss with respect to each variable in val_list.

  • apply_gradients(grads)

Updates the variables, taking the (gradient, variable) pairs returned by compute_gradients as input.

Together these two calls are equivalent to the minimize() method; splitting them apart makes it possible to modify the gradients before they are applied.
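For example, a short sketch of the split form, reusing loss and noise_img from the inversion sketch above; the gradient clipping step is illustrative, not part of the assignment.

optimizer = tf.train.AdamOptimizer(0.01)

# Step 1: get (gradient, variable) pairs instead of calling minimize()
grads_and_vars = optimizer.compute_gradients(loss, var_list=[noise_img])

# Step 2: modify the gradients, here by clipping, then apply them
clipped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars]
train_op = optimizer.apply_gradients(clipped)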

Regularization method: the Total Variation (TV) model

Its usage is similar to L1 and L2 regularization: a regularization term is appended to the objective function.

(figure: the TV regularization term appended to the objective)

tf.image.total_variation

Computes and returns the total variation of one or more images. The total variation is the sum of the absolute differences between adjacent pixels in the input image, and it is a useful measure of how much noise is in the image.
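A minimal sketch of appending the TV term to the inversion loss above; tv_weight is an assumed hyperparameter name.

# Penalize the total variation of the reconstructed image so that
# neighboring pixels stay smooth; tv_weight trades detail for smoothness
tv_weight = 1e-4  # illustrative value
tv_loss = tf.reduce_sum(tf.image.total_variation(noise_img))
total_loss = loss + tv_weight * tv_loss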

Residual learning

Why introduce residual networks?

The increase in the number of network layers in deep learning is generally accompanied by the following problems:

  1. Computing resource consumption: can be addressed with GPU clusters.
  2. Overfitting: can be addressed by collecting massive data and using dropout.
  3. Vanishing and exploding gradients: can be addressed with batch normalization (BN).

It would seem that increasing model depth always helps, but as network depth increases the model degrades: training loss first decreases gradually and then saturates, and if depth is increased further, training loss goes back up. This is not overfitting, because overfitting keeps driving the training error down.

When a network degrades, a shallow network achieves better results than a deep one. If we could pass low-level features directly to the higher layers, the deep network would be no worse than the shallow one: for example, if the 98th layer of a VGG-100 used the same features as the 14th layer of a VGG-16, then VGG-100 would perform at least as well as VGG-16. Adding a direct mapping from the 14th layer to the 98th layer achieves exactly this effect.

Information theory: because of the data processing inequality (DPI), the image information contained in the feature maps decreases layer by layer during forward propagation. ResNet's direct mapping guarantees that layer l+1 contains at least as much image information as layer l.

Residual network

Residual block

(figure: residual block)

A residual network is composed of a series of residual blocks. The residual branch, shown to the right of the skip connection in the figure, generally consists of two to three convolution operations.

In a convolutional network, when x_l and x_{l+1} have different numbers of feature maps, a 1×1 convolution is needed on the shortcut to raise or reduce the dimensionality.
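A minimal Keras-style sketch of a residual block with an optional 1×1 projection shortcut; this illustrates the structure, not the assignment's resnet-18 code.

import tensorflow as tf

def residual_block(x, filters, stride=1):
    # Residual branch: two 3x3 convolutions with batch normalization
    y = tf.keras.layers.Conv2D(filters, 3, strides=stride, padding='same')(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, strides=1, padding='same')(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # Shortcut: 1x1 convolution only when x_l and x_{l+1} differ in shape
    shortcut = x
    if stride != 1 or x.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, strides=stride)(x)
    return tf.keras.layers.ReLU()(y + shortcut)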

Principle of residual network

A residual block can be written as:

y_l = h(x_l) + \mathcal{F}(x_l, W_l)

x_{l+1} = f(y_l)

where h(x_l) = x_l is the direct mapping, \mathcal{F} is the residual function, and f is the activation. When both h and f are identity mappings, this simplifies to x_{l+1} = x_l + \mathcal{F}(x_l, W_l).

For a deeper layer L, its relationship with a shallower layer l can then be expressed as:

x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i)

This reflects two properties of the residual network:

  1. Any deeper layer L can be represented as any layer shallower than it plus a sum of residual terms.
  2. x_L is the unit-wise sum of the outputs of the residual blocks, whereas in a plain MLP the features are products of weight matrices.

According to the backpropagation chain rule, the gradient of the loss function with respect to x_l can be expressed as:

\frac{\partial \text{loss}}{\partial x_l} = \frac{\partial \text{loss}}{\partial x_L} \cdot \frac{\partial x_L}{\partial x_l} = \frac{\partial \text{loss}}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \right)

  1. Throughout training, the second term inside the parentheses cannot always be -1, so the gradient in a residual network does not vanish.
  2. The first term shows that the gradient at layer L is propagated directly to any layer shallower than it.

Direct mapping is the best option

If the direct-mapping term is scaled by a coefficient λ, i.e. x_{l+1} = \lambda x_l + \mathcal{F}(x_l, W_l), then after taking derivatives we find that the gradient explodes when λ > 1 and vanishes when λ < 1; only the identity mapping λ = 1 is stable.
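A short derivation sketch of why the identity mapping is special: unrolling the scaled recursion from layer l to layer L accumulates a power of λ on the shortcut path.

% Scaled shortcut: x_{l+1} = \lambda x_l + \mathcal{F}(x_l, W_l)
% Unrolling from layer l to layer L:
x_L = \lambda^{L-l} x_l + \sum_{i=l}^{L-1} \lambda^{L-1-i} \mathcal{F}(x_i, W_i)
% so the gradient carries a factor \lambda^{L-l} on the direct path:
\frac{\partial \text{loss}}{\partial x_l}
  = \frac{\partial \text{loss}}{\partial x_L}
    \left( \lambda^{L-l}
           + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1}
             \lambda^{L-1-i} \mathcal{F}(x_i, W_i) \right)
% \lambda > 1 makes \lambda^{L-l} explode; \lambda < 1 makes it vanish;
% only \lambda = 1 keeps the direct gradient path stable.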

The experimental record is as follows

Effect of the learning rate on training

lr.PNG

As the learning rate increases from 0.001 to 0.01, the model converges faster and the final result differs: the larger the learning rate, the closer the reconstructed image is to the real one.

Sampled images after 1000 iterations at different learning rates:

Comparison chart.PNG

The impact of Total-Variation regularization

  • Add TV regularization

    From the TF 2.0 official manual:

    tv_regular = tf.reduce_sum(tf.image.total_variation(noise_layer))
    

    tv_02.PNG

  • Increase the weight of TV regularization

    tv_01.PNG

  • Increase the fea/rep ratio

Comparison of deep and shallow networks

Image obtained by vgg16 using conv3_1

conv3_1.PNG

The image obtained by vgg16 using conv1_1

conv1_1.PNG

Image obtained by vgg16 using conv3_1 + fc6

fc6conv3.PNG

Feature extraction under different models

Image obtained by resnet-18 using res2

res2.PNG

Image obtained by resnet-18 using middle1

middle0.PNG

Code modification: model loading and target layer selection (see q4.diff for details)

Other: images obtained using my own picture

The results obtained using resNet-18 res2 are as follows.

self.png


Paper reading

Interpretability aspects of neural networks

paper.png
