PaddlePaddle Introduction to Deep Learning (3): A First Look at Convolution in Convolutional Neural Networks

This is Baidu's official zero-prerequisite introductory course on deep learning. It is aimed mainly at students with little or no deep learning background and is meant to help you make the leap from 0 to 1+ in the field. In this course you will learn:

  • Deep learning fundamentals
  • Implementing an artificial neural network and gradient descent with numpy
  • Principles and practice of the main directions in computer vision
  • Principles and practice of the main directions in natural language processing
  • Principles and practice of personalized recommendation algorithms

This is the third week of lectures. Sun Gaofeng, a senior R&D engineer at Baidu's deep learning technology platform, begins to explain the practical application of deep learning to computer vision. Today's installment covers the basics of convolutional neural networks: a first look at convolution.

Computer Vision Overview

Computer vision is the discipline that studies how to make machines "see". Specifically, it enables a machine to recognize objects in pictures or videos taken by a camera, detect where those objects are located, and track target objects, so that it can understand and describe the scenes and stories in images or videos, simulating the visual system of the human brain. For this reason computer vision is also commonly called machine vision; its goal is to build artificial systems that can "perceive" information from images or videos.

After decades of development, computer vision technology has been applied in transportation (license plate recognition, capture of traffic violations), security (face-recognition gates, area monitoring), finance (pay-by-face, automatic recognition of bills at the counter), healthcare (medical image diagnosis), industrial production (automatic defect detection), and other fields, and it has influenced or is changing people's daily lives and industrial production. As the technology continues to evolve, more products and applications will emerge, bringing greater convenience and wider opportunities to our lives.

PaddlePaddle provides a rich set of APIs for computer vision tasks and guarantees their performance through low-level optimization and acceleration. PaddlePaddle also provides a rich model library covering image classification, detection, segmentation, text recognition, video understanding, and other areas. Users can build models from these APIs or do secondary development on top of the models in the PaddlePaddle model library.
Due to space limitations, this chapter focuses on two typical computer vision tasks: image classification and object detection. It mainly covers the following:

  • Convolutional neural networks: the convolutional neural network (CNN) is the most classic model structure in computer vision. This part introduces the common building blocks of convolutional neural networks, including convolution, pooling, and so on.
  • Image classification: introduces the classic model structures of image classification algorithms and demonstrates their application through an eye-disease screening case.
  • Object detection: introduces the YOLO-V3 object detection algorithm and demonstrates it through a detection task on a forest insect pest dataset.

The development of computer vision

Before introducing convolutional neural networks, let us first review the development of computer vision, starting with biological vision.

There is still no consensus in the academic community on when biological vision first formed. Some researchers believe the earliest biological vision arose in jellyfish about 700 million years ago, while others believe it appeared in the Cambrian period about 500 million years ago [1, 2]. The cause of the Cambrian explosion remains a mystery, but it is certain that once Cambrian animals had vision, predators could find prey more easily and prey could detect predators earlier. Vision intensified the contest between hunter and prey and gave rise to harsher rules of survival and evolution. The formation of the visual system strongly drove the evolution of the food chain and accelerated biological evolution; it is an important milestone in the history of life. After hundreds of millions of years of evolution, the human visual system is now highly complex and powerful. The human brain has about 100 billion neurons, interconnected in a network, and this enormous visual neural network lets us observe the world around us with ease.

For humans, telling cats from dogs is very easy. For a computer, however, even an expert programmer finds it hard to write a general-purpose program for this. For example, suppose the program assumes that dogs are large and cats are small; because of different shooting angles, a cat may occupy more pixels in a picture than a dog does. So how can a computer be made to understand the world around it the way people do? Researchers have tried to solve this problem from different angles, and a series of sub-tasks has developed along the way, as shown in Figure 2.

[Figure 2: Sub-tasks of computer vision]

In early image classification tasks, features were usually first extracted from the image by hand, and then a machine learning algorithm classified those features. The classification result depended heavily on the feature extraction method, which often only experienced researchers could design. Against this background, feature extraction methods based on neural networks emerged. Yann LeCun was the first to apply convolutional neural networks to image recognition. The main idea is to use a convolutional neural network to extract image features and predict the image category, and to keep adjusting the network parameters on the training data until the network can extract image features automatically and classify them. This approach was very successful on handwritten digit recognition, but it did not develop well in the period that followed. One reason was the data: datasets were incomplete, so only simple tasks could be handled and overfitting easily occurred on large-size data. The other was the hardware bottleneck: when the network model was complex, computation was particularly slow.

Now, with the progress of Internet technology, the amount of data is growing massively, and richer and richer datasets keep emerging. In addition, thanks to improvements in hardware, computing power is becoming ever stronger. Researchers keep bringing new models and algorithms to computer vision. This has produced increasingly rich model structures and increasingly accurate results, and the problems that computer vision handles have also become more varied, including classification, detection, segmentation, scene description, image generation, and style transfer. The field is no longer limited to two-dimensional images and now also covers video processing and 3D vision.

Convolutional neural networks

The convolutional neural network is the most commonly used model structure in computer vision. This section introduces some of its basic modules, including:

  • Convolution
  • Pooling
  • ReLU activation function
  • Batch Normalization
  • Dropout

Recall that in the previous chapter, "Understanding deep learning thoroughly through one case", we introduced the handwritten digit recognition task, which used fully connected layers for feature extraction: all the pixels of an image are flattened into a 1-D vector and fed to the network. This has two problems:

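  1. Spatial information in the input data is lost. Spatially adjacent pixels often have similar RGB values, and the data in the individual RGB channels are usually closely related, but this information is lost when the image is converted to a 1-D vector. The shape information of the image may also hide some essential pattern, and that pattern is likewise ignored when the image is flattened to a 1-D vector and fed to a fully connected network.

  2. Too many model parameters, which easily leads to overfitting. In the handwritten digit recognition case, every pixel is connected to every output neuron. As the image gets larger, the number of input neurons grows with the square of the image size, so the model has too many parameters and overfits easily.

To solve these problems, we introduce convolutional neural networks for feature extraction. They can extract feature patterns between adjacent pixels while keeping the number of parameters independent of the image size. Figure 3 shows a typical convolutional neural network structure: several convolution and pooling layers act on the input image in combination, a series of fully connected layers is usually added at the end of the network, the ReLU activation function is generally applied to the output of convolution or fully connected layers, and Dropout is usually added to the network to prevent overfitting.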

[Figure 3: A typical convolutional neural network structure]

Note:

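In a convolutional neural network, computation is carried out within the spatial neighborhood of each pixel, and the number of convolution kernel parameters is far smaller than in a fully connected layer. The kernel itself is independent of the input image size; it represents the extraction of a certain feature pattern within a spatial neighborhood. For example, some kernels extract object edges and others extract corners, and different regions of the image share the same kernel. When input images have different sizes, the same kernel can still be applied.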
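
The parameter-count argument above can be made concrete with a little arithmetic. The sketch below is only an illustration in plain Python: the helper names `fc_params` and `conv_params`, the 10 outputs, and the 3x3 kernel are assumptions chosen for the example, not values taken from the course.

```python
def fc_params(height, width, channels, out_features):
    """Weights plus biases of a fully connected layer on a flattened image."""
    in_features = height * width * channels
    return in_features * out_features + out_features


def conv_params(in_channels, out_channels, kernel_h, kernel_w):
    """Weights plus biases of a convolution layer; the image size does not appear."""
    return out_channels * in_channels * kernel_h * kernel_w + out_channels


for size in (28, 224):
    fc = fc_params(size, size, 1, 10)
    conv = conv_params(1, 10, 3, 3)
    print(f"{size}x{size} image: fully connected = {fc}, 3x3 convolution = {conv}")
# 28x28 image: fully connected = 7850, 3x3 convolution = 100
# 224x224 image: fully connected = 501770, 3x3 convolution = 100
```

The fully connected count grows with the square of the image side, while the convolution count stays fixed because the same kernel is shared across all image positions.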

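Convolution

This subsection introduces the principle and implementation of the convolution algorithm and uses concrete cases to show how to apply convolution to images. It mainly covers the following:

  • Convolution computation
  • Padding
  • Stride
  • Receptive field
  • Multiple input channels, multiple output channels, and batch operation
  • The PaddlePaddle convolution API
  • Examples of applying the convolution operator

Convolution computation

Convolution is an integral transform in mathematical analysis; image processing uses its discrete form. Note that in convolutional neural networks the convolution layer is actually implemented as the cross-correlation operation defined in mathematics, which differs from the definition of convolution in mathematical analysis. To stay consistent with other frameworks and tutorials on convolutional neural networks, cross-correlation is used here as the definition of convolution. The computation process is shown in Figure 4.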

[Figure 4: The convolution computation process]
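
The cross-correlation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the PaddlePaddle implementation: the function name `cross_correlate2d` and the sample numbers are made up for the example, and it assumes a single channel, no padding, and a stride of 1.

```python
def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for u in range(kh):
                for v in range(kw):
                    # The kernel is NOT flipped: this is cross-correlation
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 1],
          [2, 3]]
print(cross_correlate2d(image, kernel))  # [[25, 31], [43, 49]]
```

For example, the top-left output is 1*0 + 2*1 + 4*2 + 5*3 = 25. Convolution in the strict mathematical sense would flip the kernel in both directions before this sliding sum; since the kernel weights are learned, the distinction does not matter in practice.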

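The PaddlePaddle convolution API

The API for the PaddlePaddle convolution operator is paddle.fluid.dygraph.nn.Conv2D. Users can call the API directly or modify it as needed. The commonly used parameters are:

  • name_scope: the name of the convolution layer, a string such as "conv1" or "conv2".
  • num_filters: the number of output channels (C_out).
  • filter_size: the kernel size, either an integer such as 3 or a list of two integers such as [3, 3].
  • stride: the stride, either an integer such as 2 or a list of two integers such as [2, 2].
  • padding: the padding size, either an integer such as 1 or a list of two integers such as [1, 1].
  • act: the activation function, applied to the neurons after the convolution has been computed.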

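The input data has dimensions [N, C_in, H_in, W_in], the output data has dimensions [N, num_filters, H_out, W_out], the weight parameter has dimensions [num_filters, C_in, filter_size_h, filter_size_w], and the bias parameter has dimension [num_filters].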
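
How H_out and W_out follow from filter_size, stride, and padding can be sketched with the standard output-size formula. The helper `conv_output_size` below is a hypothetical name used only for illustration, and it assumes symmetric padding and a dilation of 1.

```python
def conv_output_size(in_size, filter_size, stride=1, padding=0):
    """Output length along one spatial axis (symmetric padding, dilation 1)."""
    return (in_size + 2 * padding - filter_size) // stride + 1


print(conv_output_size(50, 3))                       # 48
print(conv_output_size(50, 3, padding=1))            # 50
print(conv_output_size(50, 3, stride=2, padding=1))  # 25
```

With a 3-wide filter and no padding, a 50-pixel axis shrinks to 48; a padding of 1 keeps the size at 50, and a stride of 2 roughly halves it.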
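Examples of applying the convolution operator

The following three cases apply the convolution operator to images and examine the results.

Case 1: Simple black-and-white boundary detection

The following uses the Conv2D operator to detect a boundary in an image. The left part of the image is bright and the right part is dark, and the task is to find the boundary between them. The kernel along the width direction can be set to [1, 0, -1]; this kernel subtracts the values of two pixels that are separated by one pixel along the width. As the kernel slides over the image, if the pixels it covers lie in a region of uniform brightness, the difference between those two pixel values is 0. Only when some of the covered pixels lie in the bright region and others in the dark region is the difference nonzero. Applying this kernel to the image therefore gives an output feature map whose pixel values are nonzero only along the black-white boundary. The code is shown below, and its result is displayed in the figure it plots.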

import matplotlib.pyplot as plt
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph.nn import Conv2D
from paddle.fluid.initializer import NumpyArrayInitializer
# In a Jupyter notebook, also run: %matplotlib inline

with fluid.dygraph.guard():
    # Create the initial weight parameter w
    w = np.array([1, 0, -1], dtype='float32')
    # Reshape the weights into a 4-D tensor of shape [cout, cin, kh, kw]
    w = w.reshape([1, 1, 1, 3])
    # Create the convolution operator, setting the number of output channels,
    # the kernel size, and the initial weights
    # filter_size = [1, 3] means kh = 1, kw = 3
    # The param_attr argument specifies how the parameters are initialized;
    # here the convolution weights are initialized from a numpy.ndarray
    conv = Conv2D('conv', num_filters=1, filter_size=[1, 3],
            param_attr=fluid.ParamAttr(
              initializer=NumpyArrayInitializer(value=w)))

    # Create the input image: pixels on the left are 1, pixels on the right are 0
    img = np.ones([50,50], dtype='float32')
    img[:, 30:] = 0.
    # Reshape the image to the [N, C, H, W] layout
    x = img.reshape([1,1,50,50])
    # Convert the numpy.ndarray to a paddle tensor
    x = fluid.dygraph.to_variable(x)
    # Apply the convolution operator to the input image
    y = conv(x)
    # Convert the output tensor back to a numpy.ndarray
    out = y.numpy()

f = plt.subplot(121)
f.set_title('input image', fontsize=15)
plt.imshow(img, cmap='gray')

f = plt.subplot(122)
f.set_title('output featuremap', fontsize=15)
# Conv2D outputs data in the [N, C, H, W] layout
# Here N and C are both 1, so the output has shape [1, 1, H, W], a 4-D array
# plt.imshow only accepts a 2-D array when drawing a grayscale image,
# so numpy.squeeze is used to drop the dimensions of size 1
plt.imshow(out.squeeze(), cmap='gray')
plt.show()
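
As a dependency-free cross-check of the boundary example, one row of the same image can be processed in plain Python. This is only a sketch: it reproduces the width-direction arithmetic of the [1, 0, -1] kernel (without the bias term that Conv2D adds) and is not a substitute for the Conv2D call.

```python
row = [1.0] * 30 + [0.0] * 20   # one image row: bright on the left, dark on the right
kernel = [1.0, 0.0, -1.0]

# Cross-correlation along the width without padding: out[j] = row[j] - row[j + 2]
out = [sum(row[j + k] * kernel[k] for k in range(3)) for j in range(len(row) - 2)]

nonzero = [j for j, value in enumerate(out) if value != 0]
print(len(out), nonzero)  # 48 [28, 29]
```

The output row has 48 values, and only the two positions where the kernel straddles the bright-dark boundary are nonzero, which matches the narrow bright stripe in the plotted feature map.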

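Case 2: Edge detection of objects in an image

The example above used an artificially constructed simple image to detect a light-dark boundary with convolution. For real pictures, a suitable kernel can likewise be applied to detect object outlines and to observe how the output feature map corresponds to the original image, as in the following code: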

import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph.nn import Conv2D
from paddle.fluid.initializer import NumpyArrayInitializer

img = Image.open('./work/images/section1/000000098520.jpg')
with fluid.dygraph.guard():
    # Set the kernel weights
    w = np.array([[-1,-1,-1], [-1,8,-1], [-1,-1,-1]], dtype='float32')/8
    w = w.reshape([1, 1, 3, 3])
    # The input has 3 channels, so expand the kernel shape from [1,1,3,3] to [1,3,3,3]
    w = np.repeat(w, 3, axis=1)
    # Create the convolution operator with 1 output channel and a 3x3 kernel,
    # using the values set above to initialize the kernel weights
    conv = Conv2D('conv', num_filters=1, filter_size=[3, 3],
            param_attr=fluid.ParamAttr(
              initializer=NumpyArrayInitializer(value=w)))

    # Convert the loaded image to a float32 numpy.ndarray
    x = np.array(img).astype('float32')
    # When read into an ndarray the image has shape [H, W, 3];
    # move the channel dimension to the front
    x = np.transpose(x, (2,0,1))
    # Reshape the data to the [N, C, H, W] layout
    x = x.reshape(1, 3, img.height, img.width)
    x = fluid.dygraph.to_variable(x)
    y = conv(x)
    out = y.numpy()

plt.figure(figsize=(20, 10))
f = plt.subplot(121)
f.set_title('input image', fontsize=15)
plt.imshow(img)
f = plt.subplot(122)
f.set_title('output feature map', fontsize=15)
plt.imshow(out.squeeze(), cmap='gray')
plt.show()
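
Why this kernel highlights outlines can be seen on two small 3x3 patches. The following is a plain-Python sketch with made-up pixel values; `response` is a hypothetical helper that computes a single output pixel for one channel.

```python
# Same weights as the Case 2 kernel: 8/8 in the centre, -1/8 elsewhere (they sum to 0)
kernel = [[-1 / 8, -1 / 8, -1 / 8],
          [-1 / 8, 1.0, -1 / 8],
          [-1 / 8, -1 / 8, -1 / 8]]


def response(patch):
    """Weighted sum of one 3x3 patch with the kernel (one output pixel)."""
    return sum(patch[i][j] * kernel[i][j] for i in range(3) for j in range(3))


flat = [[100, 100, 100]] * 3                          # uniform region
edge = [[100, 100, 0], [100, 100, 0], [100, 100, 0]]  # bright-to-dark boundary

print(response(flat))  # 0.0
print(response(edge))  # 37.5
```

Because the weights sum to zero, uniform regions give no response and only places where brightness changes produce nonzero output. In the Case 2 code the kernel is repeated over the 3 input channels, so the single output channel is the sum of this response over R, G, and B.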

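Case 3: Image mean blur

Another common kernel averages the current pixel with the pixels in its neighborhood, which smooths out noisy points in the image, as in the following code: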

import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph.nn import Conv2D
from paddle.fluid.initializer import NumpyArrayInitializer

# Read the image, convert it to grayscale, and turn it into a numpy.ndarray
#img = Image.open('./images/section1/000000001584.jpg')
img = Image.open('./work/images/section1/000000355610.jpg').convert('L')
img = np.array(img)

with fluid.dygraph.guard():
    # Create the initial weights: a 5x5 kernel in which every weight is 1/25
    w = np.ones([1, 1, 5, 5], dtype = 'float32')/25
    conv = Conv2D('conv', num_filters=1, filter_size=[5, 5],
            param_attr=fluid.ParamAttr(
              initializer=NumpyArrayInitializer(value=w)))

    # Convert to float32 and reshape to the [N, C, H, W] layout
    x = img.astype('float32')
    x = x.reshape(1,1,img.shape[0], img.shape[1])
    x = fluid.dygraph.to_variable(x)
    y = conv(x)
    out = y.numpy()

plt.figure(figsize=(20, 12))
f = plt.subplot(121)
f.set_title('input image')
plt.imshow(img, cmap='gray')

f = plt.subplot(122)
f.set_title('output feature map')
# Drop the N and C dimensions of size 1 before plotting
out = out.squeeze()
plt.imshow(out, cmap='gray')

plt.show()
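
The smoothing effect of the averaging kernel can be checked on a single patch. This is a small plain-Python sketch with invented pixel values; `mean_filter` is a hypothetical helper that computes one output pixel of the blur.

```python
def mean_filter(patch):
    """Average of all pixels in a square patch: one output pixel of the blur."""
    size = len(patch)
    weight = 1.0 / (size * size)  # plays the role of np.ones([5, 5]) / 25 for a 5x5 patch
    return sum(value * weight for row in patch for value in row)


# A 5x5 dark patch with a single bright noisy pixel in the centre
patch = [[0] * 5 for _ in range(5)]
patch[2][2] = 255

print(mean_filter(patch))  # roughly 10.2
```

The isolated value of 255 is reduced to about 10 because it is averaged with its 24 dark neighbors, which is why noisy points become smoother in the output feature map.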

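Summary

In this article, Mr. Sun first described the history of computer vision and its current application scenarios, how research methods moved from traditional image algorithms to deep learning, and the convolutional neural network, the most popular model in deep learning. In the following articles we will focus on some common modules in convolutional neural networks, such as convolution, pooling, and the ReLU activation function. Later sessions will continue to bring richer content to help students master deep learning methods quickly.

[How to study]

How do I watch the accompanying videos and practice with the code?

The videos and code have been published on the AI Studio practice platform. The videos can be watched on both PC and mobile, and you are encouraged to run the code yourself. Scan the QR code or open the following link: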
https://aistudio.baidu.com/aistudio/course/introduce/888

What if I have questions while studying?

Join the deep learning training camp QQ group: 726887660. The class supervisor and PaddlePaddle developers will answer questions and hand out study materials in the group.

How can I learn more?

Baidu PaddlePaddle will continue to update the "Zero-Based Introduction to Deep Learning" course in the form of the PaddlePaddle deep learning training camp. It is taught in person by senior Baidu deep learning R&D engineers every Tuesday and Thursday from 8:00 to 9:00, in a format that combines live streaming, recordings, practice, and Q&A. You are welcome to follow along.

Origin blog.csdn.net/PaddleLover/article/details/103897720