cs231n: Convolutional Neural Networks

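1. Why convolutional neural networks?

A linear (fully connected) network flattens an image into a 1D vector before processing it, which destroys the image's spatial structure. Convolutional networks are designed to address exactly this.

A conv layer behaves much like a filter in classical computer vision: each filter extracts one particular kind of feature.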

2. Common building blocks of a CNN

2.1 Convolution layer

A conv layer has learnable weights.

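Input: B x Cin x H x W

Kernel (weights): Cout x Cin x K x K

Output: B x Cout x H' x W'

H' and W' depend on the kernel size K, the input size H and W, the padding P, and the stride S:

H' = floor((H + 2*P - K)/S) + 1, and likewise W' = floor((W + 2*P - K)/S) + 1

Parameter count: Cout x (Cin x K x K + 1), where the +1 is the bias of each output channel

Compute (multiply-adds per sample): Cout x H' x W' x Cin x K x K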
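As a quick check of the three formulas above, here is a minimal sketch in plain Python. The function name `conv2d_stats` and its argument names are made up for illustration; it assumes a square K x K kernel and the same stride and padding in both directions.

```python
def conv2d_stats(c_in, c_out, h, w, k, stride=1, padding=0):
    """Return (H', W', parameter count, multiply-adds per sample)."""
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    # Each of the Cout filters has Cin*K*K weights plus one bias.
    params = c_out * (c_in * k * k + 1)
    # Each output value needs Cin*K*K multiply-adds (bias ignored).
    macs = c_out * h_out * w_out * c_in * k * k
    return h_out, w_out, params, macs


# A 3x3 conv, 3 -> 64 channels, on a 32x32 input with padding 1
print(*conv2d_stats(c_in=3, c_out=64, h=32, w=32, k=3, stride=1, padding=1))
# → 32 32 1792 1769472
```

With padding 1 and stride 1 a 3x3 conv keeps the spatial size; changing `stride` to 2 halves it to 16 x 16.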

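How to read the activation map produced by a convolution:

1. Along the Cout dimension, each channel is the feature map produced by one particular filter.

2. Each grid cell holds a Cout-dimensional vector: all the features extracted at that location.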
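The two views above can be made concrete with a naive forward pass. This is a sketch for illustration only, not an efficient implementation; `conv2d_naive` is a made-up name and it assumes NCHW layout, a square kernel, and zero padding.

```python
import numpy as np


def conv2d_naive(x, weight, bias, stride=1, padding=0):
    # x: (B, Cin, H, W), weight: (Cout, Cin, K, K), bias: (Cout,)
    b, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    out = np.zeros((b, c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            # Window under the kernel: (B, Cin, K, K)
            window = x_pad[:, :, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract Cin, K, K against every filter -> (B, Cout)
            out[:, :, i, j] = np.tensordot(window, weight, axes=([1, 2, 3], [1, 2, 3])) + bias
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
weight = rng.standard_normal((4, 3, 3, 3))
bias = np.zeros(4)
out = conv2d_naive(x, weight, bias, stride=1, padding=1)

feature_map = out[0, 1]        # view 1: response of filter 1 over the whole image, shape (H', W')
feature_vec = out[0, :, 2, 3]  # view 2: all Cout filter responses at grid cell (2, 3)
print(out.shape, feature_map.shape, feature_vec.shape)
```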

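Receptive field

Looking from the output back toward the input, when 3x3 kernels are stacked, the receptive field of a single grid cell grows as:
1 => 3 => 5 => 7

In general, after L stacked K x K stride-1 layers the receptive field is 1 + L*(K-1).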

https://arxiv.org/pdf/1705.07049.pdf

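A more general receptive-field calculation also has to account for the stride of each layer.

However, stacking stride-1 convolutions grows the receptive field slowly, by only K-1 per layer. Is there a faster way to enlarge it?

Yes: using stride > 1 makes the receptive field grow faster.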
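The receptive-field arithmetic above can be sketched with the standard recurrence r_l = r_(l-1) + (k_l - 1) * j_(l-1), where j is the product of the strides of all earlier layers. The function name `receptive_field` is made up for illustration, and dilation is not modeled.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride), ordered from input to output.

    Returns the receptive field (in input pixels) after each layer,
    starting with 1 for the input itself.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent cells
    sizes = [rf]
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
        sizes.append(rf)
    return sizes


print(receptive_field([(3, 1)] * 3))  # → [1, 3, 5, 7]
print(receptive_field([(3, 2)] * 3))  # → [1, 3, 7, 15]
```

With stride 1 everywhere this reduces to 1 + L*(K-1); with stride 2 the same three layers cover roughly twice as much of the input.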

2D convolution in PyTorch
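A minimal usage sketch with `torch.nn.Conv2d`, assuming PyTorch is installed. The channel counts and input size are arbitrary example values chosen to match the shapes listed in section 2.1.

```python
import torch
import torch.nn as nn

# Weight shape is (Cout, Cin, K, K); bias shape is (Cout,)
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)

x = torch.randn(8, 3, 32, 32)  # B x Cin x H x W
y = conv(x)                    # B x Cout x H' x W'

print(conv.weight.shape)  # torch.Size([64, 3, 3, 3])
print(y.shape)            # torch.Size([8, 64, 32, 32])

# Matches Cout x (Cin x K x K + 1) = 64 * (27 + 1)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)           # 1792
```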

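2.2 Pooling layer

Pooling downsamples the feature map, has no learnable parameters, and introduces invariance to small translations.

Max pooling: within each kernel-sized window, take the maximum value as the representative.

Average pooling: within each kernel-sized window, take the mean as the representative.

A hand-written pooling layer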

import numpy as np

def max_pooling(input_data, pool_size, strides):
    # Input: array of shape (B, H, W, C), pool_size (kh, kw), strides (sy, sx)
    # Output: array of shape (B, H', W', C)
    batch_size, input_height, input_width, input_channels = input_data.shape
    pool_height, pool_width = pool_size
    stride_y, stride_x = strides
    # Output size from the usual formula without padding: (H - K) // S + 1
    output_height = (input_height - pool_height) // stride_y + 1
    output_width = (input_width - pool_width) // stride_x + 1
    # Allocate the output feature map, keeping the input dtype
    pooled_output = np.zeros((batch_size, output_height, output_width, input_channels), dtype=input_data.dtype)

    # Visit every output grid cell and take the max over its window
    for i in range(output_height):
        for j in range(output_width):
            window = input_data[:, i * stride_y:i * stride_y + pool_height, j * stride_x:j * stride_x + pool_width, :]
            pooled_output[:, i, j, :] = np.amax(window, axis=(1, 2))
    return pooled_output
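Average pooling follows the same pattern, with the mean replacing the max. The sketch below is self-contained and assumes the same (B, H, W, C) layout and no padding; `avg_pooling` is a name chosen here for illustration.

```python
import numpy as np


def avg_pooling(input_data, pool_size, strides):
    # Input: (B, H, W, C); output: (B, H', W', C)
    batch_size, in_h, in_w, channels = input_data.shape
    pool_h, pool_w = pool_size
    stride_y, stride_x = strides
    out_h = (in_h - pool_h) // stride_y + 1
    out_w = (in_w - pool_w) // stride_x + 1
    out = np.zeros((batch_size, out_h, out_w, channels))
    for i in range(out_h):
        for j in range(out_w):
            window = input_data[:, i * stride_y:i * stride_y + pool_h,
                                j * stride_x:j * stride_x + pool_w, :]
            # Mean over the spatial axes of the window
            out[:, i, j, :] = window.mean(axis=(1, 2))
    return out


x = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
pooled = avg_pooling(x, (2, 2), (2, 2))
print(pooled[0, :, :, 0])  # 2x2 result: 2.5, 4.5 / 10.5, 12.5
```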

2.3 Activation layer

The most commonly used activation is ReLU.
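For reference, ReLU is max(0, x) applied elementwise. A one-line NumPy sketch:

```python
import numpy as np


def relu(x):
    # Negative values are clamped to 0; non-negative values pass through unchanged
    return np.maximum(0, x)


activated = relu(np.array([-2.0, -0.5, 0.0, 1.5]))
print(activated)
```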

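2.4 Batch normalization (BN) layer

1D batch normalization

Normalize each feature, then apply a learnable scale and shift.

Differences between training and inference

During training, BN normalizes with the mean (mu) and standard deviation (sigma) of the current batch, and keeps a running average of these statistics.

After training, the running statistics are fixed constants that are used directly at inference.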
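The train/inference difference can be sketched as follows. This is a forward-only illustration, not a full layer (no backward pass); the class name and the exponential-moving-average update with `momentum=0.1` are assumptions made here, following the convention PyTorch uses.

```python
import numpy as np


class BatchNorm1d:
    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)   # learnable scale
        self.beta = np.zeros(num_features)   # learnable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps

    def forward(self, x, training):
        # x: (B, D); statistics are taken over the batch axis
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Keep a running average for use at inference
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
bn = BatchNorm1d(4)
x = rng.standard_normal((32, 4)) * 3.0 + 5.0
y_train = bn.forward(x, training=True)   # uses the batch statistics
y_eval = bn.forward(x, training=False)   # uses the stored running statistics
print(y_train.mean(axis=0), y_train.var(axis=0))
```

In training mode each output feature has mean close to 0 and variance close to 1 over the batch. The inference output differs here because the running statistics have only seen one update.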

At inference time the BN layer is therefore a fixed affine (linear) transform, so it can be folded into the preceding conv layer.
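A sketch of the folding, under the assumption that BN directly follows the conv with nothing in between. The function name `fold_bn_into_conv` is made up for illustration, and the check uses a 1x1 conv so it can be written as a single `einsum`.

```python
import numpy as np


def fold_bn_into_conv(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    # weight: (Cout, Cin, K, K); every other argument: (Cout,)
    scale = gamma / np.sqrt(running_var + eps)
    folded_weight = weight * scale[:, None, None, None]
    folded_bias = (bias - running_mean) * scale + beta
    return folded_weight, folded_bias


def conv1x1(x, w, b):
    # x: (B, Cin, H, W), w: (Cout, Cin, 1, 1), b: (Cout,)
    return np.einsum("bchw,oc->bohw", x, w[:, :, 0, 0]) + b[None, :, None, None]


def per_channel(v):
    # Reshape (C,) to (1, C, 1, 1) for broadcasting against NCHW tensors
    return v[None, :, None, None]


rng = np.random.default_rng(0)
c_in, c_out = 3, 5
weight = rng.standard_normal((c_out, c_in, 1, 1))
bias = rng.standard_normal(c_out)
gamma, beta = rng.standard_normal(c_out), rng.standard_normal(c_out)
mean, var = rng.standard_normal(c_out), rng.uniform(0.5, 2.0, c_out)
x = rng.standard_normal((2, c_in, 4, 4))

# Reference: conv followed by inference-mode BN
y = conv1x1(x, weight, bias)
y_bn = per_channel(gamma) * (y - per_channel(mean)) / np.sqrt(per_channel(var) + 1e-5) + per_channel(beta)

# Folded: a single conv with rescaled weights and adjusted bias
folded_weight, folded_bias = fold_bn_into_conv(weight, bias, gamma, beta, mean, var)
y_folded = conv1x1(x, folded_weight, folded_bias)
print(np.allclose(y_bn, y_folded))  # True
```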

2D batch normalization
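For feature maps, the usual convention is to compute one mean and variance per channel, taken over the batch and both spatial axes. A training-mode sketch, assuming NCHW layout and omitting the running statistics; `batchnorm2d_train` is a made-up name.

```python
import numpy as np


def batchnorm2d_train(x, gamma, beta, eps=1e-5):
    # x: (B, C, H, W); gamma, beta: (C,)
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3, 5, 5)) * 2.0 + 1.0
y = batchnorm2d_train(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=(0, 2, 3)))  # per-channel means, all close to 0
```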

2.5 Layer Normalization

The mean and variance are computed over the features of each sample, not over the batch.
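A minimal sketch for a (B, D) input, where the statistics are taken along the last axis. The name `layer_norm` and the learnable `gamma` and `beta` parameters are assumptions added for illustration.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    # x: (B, D); one mean and variance per sample, over its D features
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)) * 2.0 + 3.0
y = layer_norm(x, gamma=np.ones(16), beta=np.zeros(16))
print(y.mean(axis=-1))  # per-sample means, all close to 0
```

Because nothing depends on the batch, the same computation is used in training and at inference.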

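3. Special convolutions

3.1 Dilated convolution

Dilation is a way to enlarge the receptive field without adding kernel weights: holes are inserted between the kernel elements, so the kernel covers a larger area and the receptive field grows.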
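To visualize the holes, the sketch below expands a K x K kernel into its dilated footprint, whose side is K + (K-1)*(d-1) for dilation rate d. This is only a conceptual illustration with a made-up helper name; real implementations skip the zeros instead of materializing them (for example, `torch.nn.Conv2d` takes a `dilation` argument).

```python
import numpy as np


def dilate_kernel(kernel, dilation):
    # kernel: (K, K) -> (K_eff, K_eff) with zeros between the original weights
    k = kernel.shape[0]
    k_eff = k + (k - 1) * (dilation - 1)
    out = np.zeros((k_eff, k_eff), dtype=kernel.dtype)
    out[::dilation, ::dilation] = kernel
    return out


kernel = np.arange(1, 10).reshape(3, 3)
dilated = dilate_kernel(kernel, 2)
print(dilated)
# [[1 0 2 0 3]
#  [0 0 0 0 0]
#  [4 0 5 0 6]
#  [0 0 0 0 0]
#  [7 0 8 0 9]]
```

The 3x3 kernel still has 9 weights but now spans a 5x5 area of the input.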

Some advantages of dilated convolutions are:

Increased receptive field without increasing parameters
Can capture features at multiple scales
Less loss of spatial resolution than enlarging the receptive field with pooling or strided convolutions

Some disadvantages of dilated convolutions are:

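Smaller output feature map than the input unless the padding is increased to match the larger effective kernel size
Often slower in practice than a regular convolution with the same filter size and stride because of sparse memory access, even though the number of multiply-adds is the same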