Manually Implementing a Convolution Function in Python


Preface

First, a quick review of CNN basics:

"What an organism sees is not the world as it is, but a mode of perception, evolved over a long time, that suits its own environment. Image recognition is really about finding (learning) the way human vision associates things, and applying it again."

In a computer, an image is stored as numbers from 0 to 255, with 0 darkest and 255 brightest. A color image has three channels, RGB (red, green, blue); the three primaries combine to produce every color, and the image is represented in the computer as a three-dimensional array.

Key properties of a CNN:

1) Translation invariance: an object is the same object no matter where it appears in an image, and a convolutional network is invariant to this. Why can't a feedforward network achieve it? A feedforward network first flattens the 3D data into a 1D vector that passes through a series of hidden layers to the output nodes; every element of that vector gets its own weight at every node, so moving the object necessarily changes the result. A CNN, however, shares weights: different positions use the same weights, so its output is position-invariant.
2) Local connectivity: a local region (the convolution kernel, or filter) scans the 3D tensor, and all the nodes inside that region connect jointly to a single node in the next layer. This reduces the number of parameters, because a weight is assigned per local region rather than one weight per node (see the parameter-count sketch after this list).
3) Spatial sharing: unlike a fully connected layer, each output node of a CNN layer connects to only part of the input, and the same kernel weights are reused as the kernel advances by the stride.
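To make the parameter saving concrete, here is a rough count; the layer sizes are illustrative assumptions, not from the original post:

# illustrative parameter counts (the sizes are assumptions for illustration)
h, w, c = 32, 32, 3                    # input image: 32x32 RGB
fc_units = 100                         # hidden units in a fully connected layer
fc_weights = h * w * c * fc_units      # every pixel-channel gets its own weight per unit
conv_weights = 3 * 3 * c               # one 3x3 kernel, reused at every position
print(fc_weights, conv_weights)        # 307200 vs. 27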

Convolution computation

1) Weights are not shared along the channel dimension C: a color image is three-dimensional; connections are local along height and width but full along the channel dimension. In a 2D convolution, a W×H×C tensor maps to a single point on the output plane; note that because weights are not shared along C, the weights expand into C groups.
2) Zero padding: it preserves the image's edge information and keeps the image from shrinking with every convolution. Why are 3×3 and 5×5 kernels chosen so often? A kernel of size 3 padded with 1 zero produces a feature map the same size as the input, and a kernel of size 5 padded with 2 zeros does the same.

Output size formula: (input_size + 2*padding - kernel_size)/stride + 1
Weight count formula: kernel_size * kernel_size * C * kernel_numbers (where C is the number of input channels)
(A quick numeric check of the size formula appears after this list.)

3) Stacking multiple kernels: a particular kernel captures a particular feature; different kernels extract different feature maps, which are stacked in order into a 3D tensor and fed into the next convolution.
4) Nonlinear mapping: as in a feedforward network, a nonlinear transformation is added to increase the network's fitting capacity.
5) Pooling: downsampling, since the extracted feature maps are redundant. Max pooling takes the maximum value in each window, preserving texture; average pooling takes the mean of each window, preserving the overall level; global pooling takes the mean over each whole channel and is often used in place of a fully connected layer, preventing the overfitting that too many parameters would cause.
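As a quick numeric check of the size formula, a minimal sketch (the concrete sizes are illustrative):

def conv_output_size(input_size, kernel_size, padding, stride):
    # (input_size + 2*padding - kernel_size) / stride + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(32, 3, 1, 1))   # 32: a 3x3 kernel with pad 1 keeps the size
print(conv_output_size(32, 5, 2, 1))   # 32: a 5x5 kernel needs pad 2 to keep the size
print(conv_output_size(32, 3, 0, 2))   # 15: stride 2 roughly halves the map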


Summary
How a CNN achieves invariance: 1. translation invariance: spatial and parameter sharing; 2. rotation invariance: large amounts of data; 3. scale invariance: Inception modules, which concatenate the outputs of kernels of several sizes.
Residual connections: information exchange between different layers (a minimal sketch follows).
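A residual (skip) connection can be sketched in a few lines of NumPy; this is an illustration of the idea, not code from the original post:

import numpy as np

def residual_block(x, transform):
    # the skip path carries x unchanged across the block: output = F(x) + x
    return transform(x) + x

x = np.ones((4, 4))
y = residual_block(x, lambda t: 0.5 * t)   # y is 1.5 everywhere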

I. Components of a CNN

First, be clear about the components:
1) Zero padding
2) Kernel (the convolution kernel)
3) Pooling
4) Convolution forward (forward propagation)
5) Convolution backward (backward propagation)

II. Building a three-channel CNN in code

1. Zero padding

import numpy as np

def zero_pad(X, pad):
    """
    Pad the height and width of every image in X with zeros.
    X -- array of shape (m, n_H, n_W, n_C): (number of images, height, width, channels)
    pad -- number of zeros added on each edge of the n_H and n_W axes
    Returns X_pad -- padded array of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    np.pad's 'constant' mode fills with a fixed value; constant_values=0 fills with zeros.
    """
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)
    return X_pad
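A quick shape check (illustrative values):

x = np.random.randn(2, 3, 3, 1)   # (m, n_H, n_W, n_C)
x_pad = zero_pad(x, 1)
print(x.shape, x_pad.shape)       # (2, 3, 3, 1) (2, 5, 5, 1)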

2. Single-step convolution

def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation of the previous layer.
    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)
    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """
    s = np.multiply(a_slice_prev, W)
    Z = np.sum(s)
    Z = Z + float(b)         # cast b to a float so Z stays a scalar
    return Z
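For example (random numbers, purely illustrative):

np.random.seed(1)
a_slice = np.random.randn(3, 3, 4)        # one (f, f, n_C_prev) window of the input
W = np.random.randn(3, 3, 4)
b = np.random.randn(1, 1, 1)
print(conv_single_step(a_slice, W, b))    # a single scalar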

3. conv_forward: the full convolution

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters" 
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume using the formula given above.
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
    # Initialize the output volume Z with zeros.
    Z = np.zeros((m , n_H, n_W, n_C))
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    for i in range(m):                               # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]                              # Select ith training example's padded activation
        for h in range(n_H):                           # loop over vertical axis of the output volume
            for w in range(n_W):                       # loop over horizontal axis of the output volume
                for c in range(n_C):                   # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[...,c], b[...,c])
    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    return Z, cache
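A quick check on random data (the sizes are illustrative):

np.random.seed(1)
A_prev = np.random.randn(10, 4, 4, 3)    # 10 images of 4x4 with 3 channels
W = np.random.randn(2, 2, 3, 8)          # 8 filters of size 2x2
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2, "stride": 2}
Z, cache = conv_forward(A_prev, W, b, hparameters)
print(Z.shape)                           # (10, 4, 4, 8): (4 + 2*2 - 2)//2 + 1 = 4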


III. Building a 2D CNN in code

Core code

import numpy as np

def convolution(k, data):
    # "valid" 2D convolution (no padding, stride 1) of kernel k over data
    kh, kw = k.shape
    n, m = data.shape
    img_new = []
    for i in range(n - kh + 1):
        line = []
        for j in range(m - kw + 1):
            a = data[i:i + kh, j:j + kw]          # the window under the kernel
            line.append(np.sum(np.multiply(k, a)))
        img_new.append(line)
    return np.array(img_new)
## Kernel 1: vertical edge detection
k1 = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1]
])
## Kernel 2: horizontal edge detection
k2 = np.array([
    [1, 1, 1],
    [0, 0, 0],
    [-1, -1, -1]
])
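Applying the vertical-edge kernel to a tiny test image (illustrative):

img = np.array([
    [10, 10, 0, 0],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
])
print(convolution(k1, img))   # [[30 30], [30 30]]: strong response at the 10-to-0 edge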

Implementing 2D convolution in C++

//C++: array holds the source image, filter the convolution kernel; out-of-range indices are mirrored (reflect padding), and the filtered values are printed as doubles;
#include<iostream>
#include<algorithm>
#include<vector>
using namespace std;
int main() {
    int M, N;
    int tmparr;
    cin >> M >> N;
    vector<vector<int> > array(N, vector<int>(M, 0));
    for(int i = 0; i < N; ++i) {
        for(int j = 0; j < M; ++j) {
            cin >> tmparr;
            array[i][j] = tmparr;
        }
    }
    cout << "array end" << endl;
    int W, H;
    double tmpfilter;
    cin >> W >> H;
    vector<vector<double> > filter(H, vector<double>(W, 0));
    for(int i = 0; i < H; ++i) {
        for(int j = 0; j < W; ++j) {
            cin >> tmpfilter;
            filter[i][j] = tmpfilter;
        }
    }
    cout << "filter end" << endl;
    double tmp;
    int top = -(H-1)/2;
    int left = -(W-1)/2;
    for(int i = 0; i < N; ++i) {
        for(int j = 0; j < M; ++j) {
            tmp = 0;
            int boxtop = i + top;
            int boxleft = j + left;
            for(int k = 0; k < H; ++k) {
                for(int l = 0; l < W; ++l) {
                    int tmpboxtop = boxtop + k;
                    int tmpboxleft = boxleft + l;
                    if (tmpboxtop < 0) tmpboxtop = -tmpboxtop;
                    if (tmpboxtop >= N) tmpboxtop = 2*N - 2 - tmpboxtop;
                    if (tmpboxleft < 0) tmpboxleft = - tmpboxleft;
                    if (tmpboxleft >= M) tmpboxleft = 2*M - 2 - tmpboxleft;
                    tmp += array[tmpboxtop][tmpboxleft] * filter[k][l];
                }
            }
            cout << tmp << " ";
        }
        cout << endl;
    }
    return 0;
}
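For reference, the program reads M N (width and height), then the N×M image values, then W H and the H×W filter. Here is a sample run I worked out by hand under the mirroring rule above, so treat the numbers as illustrative:

Input:
3 3
1 2 3
4 5 6
7 8 9
3 3
1 0 -1
1 0 -1
1 0 -1

Output (after the "array end" / "filter end" markers):
0 -6 0
0 -6 0
0 -6 0

Each interior column difference is -2 per row (e.g. 1 - 3), summed over three rows, while the mirrored border columns cancel to zero.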

Max pooling

import numpy as np

def max_pooling(feature_map, size=2, stride=2):
    # feature_map has shape (h, w)
    height, width = feature_map.shape
    # determine the output shape
    out_height = (height - size) // stride + 1
    out_width = (width - size) // stride + 1
    out_pooling = np.zeros((out_height, out_width), dtype=feature_map.dtype)
    # slide the window: x, y index the output; m, n are each window's top-left corner
    for x, m in enumerate(range(0, height - size + 1, stride)):
        for y, n in enumerate(range(0, width - size + 1, stride)):
            out_pooling[x, y] = np.max(feature_map[m:m + size, n:n + size])
    return out_pooling
    
if __name__ == "__main__":
    x = np.arange(9).reshape((3, 3))
    print(max_pooling(x, size=2, stride=1))   # [[4 5] [7 8]]
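The average and global pooling mentioned in the preface follow the same pattern; a minimal sketch swapping np.mean for np.max (these helpers are illustrative, not from the original post):

def average_pooling(feature_map, size=2, stride=2):
    height, width = feature_map.shape
    out = np.zeros(((height - size) // stride + 1, (width - size) // stride + 1))
    for x, m in enumerate(range(0, height - size + 1, stride)):
        for y, n in enumerate(range(0, width - size + 1, stride)):
            out[x, y] = np.mean(feature_map[m:m + size, n:n + size])
    return out

def global_average_pooling(feature_maps):
    # feature_maps has shape (h, w, c); returns one mean per channel
    return feature_maps.mean(axis=(0, 1))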

Softmax implementation

import numpy as np

def softmax(a):
    # subtract the max before exponentiating for numerical stability;
    # the shift cancels in the ratio, so the result is unchanged
    exp_a = np.exp(a - np.max(a))
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
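For example:

a = np.array([0.3, 2.9, 4.0])
print(softmax(a))          # [0.01821127 0.24519181 0.73659691]
print(np.sum(softmax(a)))  # 1.0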


Reposted from blog.csdn.net/PETERPARKERRR/article/details/121775868