Convolutional Layers and Pooling Layers

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA agreement. Please include the original source link and this statement when reproducing it.
Original link: https://blog.csdn.net/weixin_41417982/article/details/81412076

After building the simplest network, it is time to talk about convolution and pooling together. Although I have not started writing yet, I already know this will be a very long article.

A convolutional neural network (Convolutional Neural Network, CNN) contains, besides fully connected layers (sometimes there are no fully connected layers at all, thanks to the appearance of global average pooling), convolutional layers and pooling layers. Convolutional layers are used to extract features, while pooling layers reduce the number of parameters.

Convolutional layer

Let's first talk about the principle of the convolutional layer.

We use convolution kernels to extract features; a convolution kernel can be thought of as a matrix. Suppose we set up a 3 × 3 convolution kernel and our image has a resolution of 5 × 5. Then the kernel's task looks like this:

(figure: a 3 × 3 convolution kernel applied to a 5 × 5 image)
Starting from the top left, the kernel lines up with the corresponding 3 × 3 block of data, the values are multiplied element-wise and summed, and this gives one output value. Proceeding in this order, one value per position, we obtain nine values in total. The matrix formed by these nine values is what we call an activation map. This is the principle of the convolutional layer. See also the following GIF:
where the convolution kernel is

1 0 1
0 1 0
1 0 1

(GIF: the kernel sliding across the input, producing the activation map)
In fact, in the usual examples the convolution kernel has already been rotated 180 degrees, mainly because of how the computation is defined. I won't go into the details, but the principle is the same.
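To make the multiply-and-add concrete, here is a minimal numpy sketch of the operation (my own illustration, not the author's code; the 5 × 5 input values are made up):

import numpy as np

# a made-up 5x5 input and the 3x3 kernel shown above
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

# slide the kernel over every 3x3 window, multiply element-wise and sum
activation_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        activation_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(activation_map)  # the 3x3 activation map: one value per kernel position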

In practice, though, the input image is usually three-dimensional, i.e. it contains the R, G and B channels. After one convolution kernel, however, the three channels become one: while the kernel slides across the whole image, the values from all three channels are summed up, so the final output is a single two-dimensional matrix. The activation maps produced by several kernels (the number of kernels is decided by the convolutional layer itself) are stacked after sliding, and then passed through an activation function; that is the output of one convolutional layer.
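A quick shape check (a sketch with random numbers, using the same tf.nn.conv2d call as the code further below) shows how the three input channels collapse into one map per kernel, and how several kernels stack:

import numpy as np
import tensorflow as tf

# one made-up 5x5 RGB image and four 3x3 kernels; each kernel spans all 3 channels
image = tf.constant(np.random.rand(1, 5, 5, 3), dtype=tf.float32)
kernels = tf.constant(np.random.rand(3, 3, 3, 4), dtype=tf.float32)

# each kernel multiplies and sums across all 3 channels, giving one 3x3 map;
# the four maps are stacked along the last dimension
output = tf.nn.conv2d(image, kernels, strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    print(sess.run(tf.shape(output)))  # [1 3 3 4]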

 

A convolutional layer has two other important parameters: the stride and the padding.

The stride controls how the convolution kernel moves. In the example above the kernel moves one pixel at a time; we can also make it move two or three pixels at a time, and this distance is what we call the stride.

Padding is an operation we perform on the data. There are two choices: either do nothing, or pad with zeros so that the activation map after convolution keeps the same size. In the data above we saw that a 5 × 5 × 3 image, after a 3 × 3 convolution kernel, becomes a map of shape 3 × 3, i.e. a shape different from the original data. Sometimes, to avoid this change, we use zero padding: we surround the data with an outer layer of zeros.

Here is a diagram:

(figure: a stride of 2; image from 机器之心)
(figure: zero padding; image from 机器之心)
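For reference, the usual output-size formula (a standard relation, not something stated in the original post) is (N - F + 2P) / S + 1, where N is the input size, F the kernel size, P the padding and S the stride. Checking it against the examples above:

def conv_output_size(n, f, p, s):
    # (input size - kernel size + 2 * padding) / stride + 1
    return (n - f + 2 * p) // s + 1

print(conv_output_size(5, 3, 0, 1))  # 3 -> the 3x3 activation map from the example
print(conv_output_size(5, 3, 1, 1))  # 5 -> zero padding keeps the size unchanged
print(conv_output_size(5, 3, 0, 2))  # 2 -> a stride of 2 shrinks the map further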

 

Anyone who knows the history of convolutional networks should know that the earliest convolutional neural network to appear was LeNet-5, created by LeCun (the name really does look Chinese) for handwritten digit recognition.

(figure: the LeNet-5 architecture)
The later explosion came when AlexNet won first place in the ImageNet competition, abruptly cutting the error rate to about half of the previous year's. Since then, convolutional networks have become a hot topic in AI; countless papers and networks keep unlocking their potential, and their black box keeps being interpreted.

The Zhihu answer to "Can you give an intuitive explanation of how convolutional neural networks work?" by Owl of Minerva explains, through smoothing operations on images, how convolution kernels pick up features.

We first need to be clear about one thing: experiments tell us that human vision is sensitive to image edges first. As I understand it, this means our impression of an object is built by first extracting boundary features, then gradually refining and assembling them. And our convolutional layers do exactly this.

(figure: the same image after two different convolution kernels)
These are the results of two different convolution kernels sliding over the whole image. As you can see, after convolution the boundaries in the image become much more apparent. We can also look at the features extracted by the first convolutional layer of VGG-16:

(figure: features extracted by the first convolutional layer of VGG-16)

From this we can also see why a single convolution kernel is not enough. As I understand it, if we only had one kernel, we could perhaps only extract one kind of boundary. But if we have many kernels detecting different boundaries, and different boundaries in turn make up different objects, that is how we detect objects from visual images. So the "deep" in deep learning does not only refer to the network; it also refers to the depth of the objects we can detect: the deeper the network, the more features it extracts.
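As a toy illustration (my own sketch, not the kernels used in the figures above), two classic edge kernels respond to different boundaries in the same image:

import numpy as np

def conv2d_valid(image, kernel):
    # plain sliding-window convolution, as in the earlier sketch
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out

# Sobel-style kernels: one responds to vertical edges, the other to horizontal edges
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = sobel_x.T

image = np.zeros((8, 8))
image[:, 4:] = 1.0                   # made-up image: dark left half, bright right half
print(conv2d_valid(image, sobel_x))  # strong response along the vertical boundary
print(conv2d_valid(image, sobel_y))  # zero response: the image has no horizontal edge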

Google has a project called DeepDream, which uses gradient ascent and deconvolution to show vividly what a network actually wants to recognize. We covered gradient descent when talking about weight updates; gradient ascent instead computes the gradient of a convolution kernel's response with respect to an input of noise, and then adjusts the input along the ascending direction. I will cover the details later, but the resulting image is one that strongly activates this kernel, i.e. gives it a high value. So this image is what the kernel considers the most canonical image (a bit scary):

(figure: a DeepDream image)
Actually this goose doesn't look bad; it even looks a bit like a peacock.
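Just to sketch the idea of gradient ascent on the input (a toy example of mine, not Google's DeepDream code): we can nudge a noise image so that one randomly initialised convolution kernel fires as strongly as possible.

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 28, 28, 1])
k = tf.get_variable('demo_kernel', [3, 3, 1, 8],
                    initializer=tf.truncated_normal_initializer(stddev=0.1))
act = tf.nn.relu(tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME'))
objective = tf.reduce_mean(act[:, :, :, 0])  # how strongly kernel 0 is activated
grad = tf.gradients(objective, x)[0]         # gradient of that score w.r.t. the input

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    img = np.random.uniform(size=(1, 28, 28, 1)).astype(np.float32)
    for _ in range(100):                       # gradient ascent: move the image "uphill"
        g = sess.run(grad, feed_dict={x: img})
        g /= np.sqrt((g ** 2).mean()) + 1e-8   # normalise the step
        img += 0.1 * g                         # img ends up strongly activating kernel 0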

 

Pooling layer

As mentioned earlier, the pooling layer is there to reduce the number of parameters, and of course the only way to reduce parameters is to throw some of them away.

Generally we have max pooling and average pooling, and as far as I know max pooling is the more common one. Note that a pooling layer is usually placed after a convolutional layer, so what the pooling layer pools is the output of the convolutional layer!

The scanning order is the same as for convolution: start from the top-left corner and scan the whole input step by step according to the stride you set. Some people wonder how, during max pooling, you know which value is the maximum; emmm, I have thought about this too. I recall that CS231n says the position of the maximum is recorded in advance in a matrix, and the maximum is then fetched according to that matrix.

As for whether it is worth digging into the exact computation, it probably isn't, so I haven't gone and verified the process. Also, the figures given are only illustrations; the actual computation may well differ, but knowing the effect is enough for us.

As for why max pooling is chosen: presumably to extract the most prominent feature, which is what max pooling does. Average pooling, on the other hand, takes every pixel into account, so it adds up all the pixel values and averages them.

Pooling layers also have a padding option, but it works just like in convolutional layers: pad the border with zeros, then pool.
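A minimal numpy sketch of 2 × 2 max pooling with stride 2 (my own illustration, with made-up numbers):

import numpy as np

# a made-up 4x4 activation map coming out of a convolutional layer
feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 2],
                        [7, 2, 1, 0],
                        [3, 8, 4, 9]])

# 2x2 max pooling with stride 2: keep only the largest value in each window
pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = feature_map[2*i:2*i+2, 2*j:2*j+2]
        pooled[i, j] = window.max()  # average pooling would use window.mean() instead

print(pooled)  # [[6. 5.]
               #  [8. 9.]]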

Code walkthrough
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
summary_dir = './summary'
#批次
batch_size = 100

n_batch = mnist.train.num_examples // batch_size

x = tf.placeholder(tf.float32, [None, 784], name='input')
y = tf.placeholder(tf.float32, [None, 10], name='label')

def net(input_tensor):
    # 3x3 kernel, 1 input channel (grayscale MNIST), 32 output channels
    conv_weights = tf.get_variable('weight', [3, 3, 1, 32],
                                   initializer=tf.truncated_normal_initializer(stddev=0.1))
    conv_biases = tf.get_variable('biase', [32], initializer=tf.constant_initializer(0.0))

    # convolution with stride 1; 'SAME' zero padding keeps the 28x28 spatial size
    conv = tf.nn.conv2d(input_tensor, conv_weights, strides=[1, 1, 1, 1], padding='SAME')
    relu = tf.nn.relu(tf.nn.bias_add(conv, conv_biases))

    # 3x3 max pooling with stride 2 halves the spatial size to 14x14
    pool = tf.nn.max_pool(relu, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

    # flatten the pooled output into [batch, 14 * 14 * 32] for the fully connected layer
    pool_shape = pool.get_shape().as_list()
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    pool_reshaped = tf.reshape(pool, [-1, nodes])

    W = tf.Variable(tf.zeros([nodes, 10]), name='weight')
    b = tf.Variable(tf.zeros([10]), name='bias')
    # return the raw logits; the softmax is applied inside the cross-entropy loss below,
    # so applying tf.nn.softmax here as well would squash the gradients
    fc = tf.matmul(pool_reshaped, W) + b

    return fc

reshaped = tf.reshape(x, (-1, 28, 28, 1))
prediction = net(reshaped)
loss_ = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y, 1), logits=prediction, name='loss')
loss = tf.reduce_mean(loss_)

train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(31):
        for batch in range(n_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print('Iter' + str(epoch) + ",Testing Accuracy" + str(acc))

Compared with my first, fully connected network, this only adds a net function, plus, because of the convolutional layer, the incoming data x needs its shape changed. I will only talk about these two parts:

reshaped = tf.reshape(x, (-1, 28, 28, 1))
prediction = net(reshaped)

Since our feed_dict above is feed_dict={x: mnist.test.images, y: mnist.test.labels}, and the x we obtain through such a TensorFlow call has a fixed shape, we apply tf.reshape(x_need_reshaped, object_shape) to get the shape we need.

The -1 there means that dimension is flattened out and inferred automatically; you cannot use None here, that is fixed.
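A small check of what the -1 does (a sketch with fake data, not part of the original code):

import numpy as np
import tensorflow as tf

flat = tf.constant(np.zeros((2, 784), dtype=np.float32))  # two fake flattened MNIST images
images = tf.reshape(flat, (-1, 28, 28, 1))                # -1: infer the batch size from the data

with tf.Session() as sess:
    print(sess.run(tf.shape(images)))                     # [ 2 28 28  1]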

conv_weights = tf.get_variable('weight', [3, 3, 1, 32],
                                    initializer=tf.truncated_normal_initializer(stddev=0.1))
conv_biases = tf.get_variable('biase', [32], initializer=tf.constant_initializer(0.0))

conv = tf.nn.conv2d(input_tensor, conv_weights, strides=[1, 1, 1, 1], padding='SAME')
relu = tf.nn.relu(tf.nn.bias_add(conv, conv_biases))

pool = tf.nn.max_pool(relu, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

Most of this uses built-in functions to initialize the weights (i.e. the convolution kernel) and biases (the bias terms). We haven't mentioned the bias term before; it is really just one more parameter to learn, which is why it wasn't covered when we discussed the convolutional layer. In the code, the bias is added to the activation map after the convolution. For pooling, max pooling is used.

Note the relu. It is also an activation function; you could say its role is similar to the softmax we talked about before, but it is used more in convolutional layers, and it is widely recognized as a good activation function. It has many variants; if you are interested you can look them up. I may write an article on it later.
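For reference, ReLU simply computes relu(x) = max(0, x), applied element-wise to the activation map.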

    pool_shape = pool.get_shape().as_list()
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    pool_reshaped = tf.reshape(pool, [-1, nodes])

    W = tf.Variable(tf.zeros([nodes, 10]), name='weight')

We don't know offhand what shape the pooling layer's output has (of course, you could also count it by hand). Even after pulling the pooling layer's output into a one-dimensional matrix, we still need to know the shape of W. So we look at the shape of pool (i.e. the pooled output); I sneakily printed it and got [None, 14, 14, 32]. Flattened, pool becomes [None, 14 * 14 * 32], and for the fully connected layer that follows, W must therefore have the shape [14 * 14 * 32, 10]. That is what this piece of code does.
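To spell the numbers out: the 28 × 28 input keeps its size through the 'SAME' convolution (now with 32 channels), and max pooling with stride 2 gives ceil(28 / 2) = 14, so nodes = 14 × 14 × 32 = 6272 and W has the shape [6272, 10].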

As before, the accuracy after 15 rounds:

(figure: test accuracy output)

 

emmm, it is obviously a lot better than before. I've decided that the next chapter will be a summary of better optimization methods.

References
https://mlnotebook.github.io/post/CNN1/ (unfortunately it is in English)
"Can you give an intuitive explanation of how convolutional neural networks work?" - Owl of Minerva's answer - Zhihu
CS231n

 
