Convolution and Pooling Operations on Images

The Discrete Domain

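Convolution is defined both for continuous functions and for discrete functions. Most practical work deals with digitized data, so the discrete form is the one we need. For two discrete functions f and g, convolution replaces the integral of the continuous case with an equivalent sum:

(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] \, g[n - m]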
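The sum can be checked numerically on two short sequences. The following is a minimal sketch, assuming finite sequences that are zero outside their listed values; `discrete_convolve` is a hypothetical helper name introduced here for illustration, and NumPy's `np.convolve` is used only as a cross-check.

```python
import numpy as np


def discrete_convolve(f, g):
    """Full discrete convolution of two finite 1-D sequences."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            # g[n - m] counts as zero when the index falls outside g.
            if 0 <= n - m < len(g):
                out[n] += f[m] * g[n - m]
    return out


f = [1, 2, 3]
g = [0, 1, 0.5]
result = discrete_convolve(f, g)
print(result)             # [0.  1.  2.5 4.  1.5]
print(np.convolve(f, g))  # same values from NumPy's built-in
```

The explicit double loop mirrors the summation term by term, which makes it slow but easy to compare against the formula.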

Convolution Kernels

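An image is two-dimensional data, so it corresponds to a 2D function. We can filter an image with a small matrix of weights, and that filter is the convolution kernel. Kernels are typically small, with 2 to 5 elements along each dimension, and different kernels produce different effects.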

Applying a kernel produces an image comparable to the original in which certain features, such as lines and edges, are emphasized while others are suppressed.

Convolving an Image

We define five filters: Identity, Laplacian, Left Sobel, Upper Sobel, and Blur. All are 3 x 3 kernels, and each one brings out a different property of the original image.

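The Blur filter replaces each pixel with a weighted average of its 3 x 3 neighborhood, with the center pixel weighted most heavily. The Identity filter returns each pixel value unchanged. The Laplacian filter is a differential filter that highlights edges. The Left Sobel filter measures intensity changes in the horizontal direction, so it responds most strongly to vertical edges, while the Upper Sobel filter measures changes in the vertical direction and responds most strongly to horizontal edges.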
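To see which edge orientation each Sobel kernel responds to, it helps to evaluate them on a single 3 x 3 patch. This is a small sketch under stated assumptions: the two patches are synthetic, and `response_at_center` is a hypothetical helper that performs the same multiply-and-sum as one step of a kernel pass.

```python
import numpy as np

left_sobel = np.array([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
upper_sobel = np.array([[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]])


def response_at_center(patch, kernel):
    # Element-wise product followed by a sum over the 3 x 3 window.
    return float(np.sum(patch * kernel))


# Bright on the left, dark on the right: a vertical edge.
vertical_edge = np.array([[1., 1., 0.], [1., 1., 0.], [1., 1., 0.]])
# Bright on top, dark at the bottom: a horizontal edge.
horizontal_edge = vertical_edge.T

print(response_at_center(vertical_edge, left_sobel))     # 4.0
print(response_at_center(vertical_edge, upper_sobel))    # 0.0
print(response_at_center(horizontal_edge, left_sobel))   # 0.0
print(response_at_center(horizontal_edge, upper_sobel))  # 4.0
```

Each kernel gives a nonzero response only when the brightness changes along the direction it differentiates.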

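from collections import OrderedDict

import imageio
import matplotlib.pyplot as plt
import numpy as np

# 3 x 3 kernels keyed by name; insertion order sets the subplot order.
kernels = OrderedDict({"Identity": [[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]],
                       # Standard 4-neighbour discrete Laplacian.
                       "Laplacian": [[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                       "Left Sobel": [[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]],
                       "Upper Sobel": [[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]],
                       # Weights sum to 1, so overall brightness is preserved.
                       "Blur": [[1. / 16., 1. / 8., 1. / 16.], [1. / 8., 1. / 4., 1. / 8.],
                                [1. / 16., 1. / 8., 1. / 16.]]})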


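def apply3x3kernel(image, kernel):
    # Start from a copy so the one-pixel border keeps its original values.
    newimage = np.array(image)
    # Visit every interior pixel; border pixels lack a full 3 x 3 neighbourhood.
    for m in range(1, image.shape[0] - 1):
        for n in range(1, image.shape[1] - 1):
            newelement = 0
            for i in range(0, 3):
                for j in range(0, 3):
                    # The kernel is applied without flipping (cross-correlation).
                    newelement = newelement + image[m - 1 + i][n - 1 + j] * kernel[i][j]
            newimage[m][n] = newelement
    return newimage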


# Load the image and keep only the first colour channel as 64-bit floats.
arr = imageio.imread("data/dog.jpg")[:, :, 0].astype(np.float64)

plt.figure(1)
positions = [321, 322, 323, 324, 325, 326]
# One subplot per kernel, laid out on a 3 x 2 grid.
for j, (key, value) in enumerate(kernels.items()):
    plt.subplot(positions[j])
    out = apply3x3kernel(arr, value)
    plt.title(key)
    plt.imshow(out, cmap=plt.get_cmap('binary_r'))
plt.show()

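In the resulting figure, the first subplot is the unchanged image, because it uses the Identity filter. It is followed by Laplacian edge detection, the Left Sobel and Upper Sobel gradient filters, and finally the Blur filter.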
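The nested Python loops in apply3x3kernel are easy to follow but slow on full-size photos. As a hedged alternative, the same weighted sum over each 3 x 3 neighborhood can be computed in one vectorized step. The sketch below assumes NumPy 1.20 or later (for `sliding_window_view`) and a 2-D grayscale input; `apply_kernel_vectorized` is a hypothetical name, not part of the original listing.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def apply_kernel_vectorized(image, kernel):
    """Apply a 3 x 3 kernel to every interior pixel; border pixels are copied."""
    image = np.asarray(image, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    # windows[r, c] is the 3 x 3 neighbourhood centred on pixel (r + 1, c + 1).
    windows = sliding_window_view(image, (3, 3))
    # Multiply each window by the kernel and sum over the window axes.
    interior = np.einsum("ijkl,kl->ij", windows, kernel)
    out = image.copy()
    out[1:-1, 1:-1] = interior
    return out


# Synthetic ramp whose value grows by 1 per column and by 6 per row.
demo = np.arange(30, dtype=np.float64).reshape(5, 6)
left_sobel = [[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]]
edges = apply_kernel_vectorized(demo, left_sobel)
print(edges[1:-1, 1:-1])  # every interior value is -8.0
```

On the ramp, the left column of every window is 2 less than the right column, and the Sobel row weights sum to 4, which gives the constant response of -8.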

Pooling an Image

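A pooling operation slides a small window over its input and replaces each window with a single summary value, which shrinks the data passed on to later layers. The best-known variants are max pooling, average pooling, and min pooling. Pooling reduces the amount of information coming from earlier network layers, lowering complexity while keeping the most important elements. In other words, it builds a compact representation of the information.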
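The three variants differ only in the function used to summarize each window. The following sketch illustrates this on a 4 x 4 array with non-overlapping 2 x 2 windows; `pool2x2` is a hypothetical helper, and the reshape trick assumes a 2-D input (an odd trailing row or column is dropped).

```python
import numpy as np


def pool2x2(image, reduce_fn):
    """Summarize each non-overlapping 2 x 2 block with reduce_fn."""
    image = np.asarray(image, dtype=np.float64)
    # Trim to even dimensions so the image splits cleanly into 2 x 2 blocks.
    h = image.shape[0] // 2 * 2
    w = image.shape[1] // 2 * 2
    blocks = image[:h, :w].reshape(h // 2, 2, w // 2, 2)
    # Axes 1 and 3 index the rows and columns inside each block.
    return reduce_fn(blocks, axis=(1, 3))


patch = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 8, 1, 0],
                  [7, 6, 3, 2]])

max_pooled = pool2x2(patch, np.max)   # [[4. 8.] [9. 3.]]
avg_pooled = pool2x2(patch, np.mean)  # [[2.5 6.5] [7.5 1.5]]
min_pooled = pool2x2(patch, np.min)   # [[1. 5.] [6. 0.]]
print(max_pooled, avg_pooled, min_pooled, sep="\n")
```

Max pooling keeps the strongest response in each block, average pooling smooths it, and min pooling keeps the weakest.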

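import imageio
import matplotlib.pyplot as plt
import numpy as np


def apply2x2pooling(image, stride=2):
    # Number of complete 2 x 2 windows that fit along each axis for this stride.
    rows = (image.shape[0] - 2) // stride + 1
    cols = (image.shape[1] - 2) // stride + 1
    newimage = np.zeros((rows, cols), np.float32)
    for m in range(rows):
        for n in range(cols):
            r, c = m * stride, n * stride
            # Max pooling: keep the largest value in each window.
            newimage[m, n] = np.max(image[r:r + 2, c:c + 2])
    return newimage


arr = imageio.imread("data/dog.jpg")[:, :, 0].astype(np.float64)
plt.figure(1)
plt.subplot(121)
plt.imshow(arr, cmap=plt.get_cmap('binary_r'))
# Stride 2 gives non-overlapping windows and halves each dimension.
out = apply2x2pooling(arr, 2)
plt.subplot(122)
plt.imshow(out, cmap=plt.get_cmap('binary_r'))
plt.show()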

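Comparing the image before and after pooling shows the difference: the result has a lower resolution, with roughly one quarter as many pixels as the original.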
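The one-quarter figure follows from the usual output-size arithmetic for a window moved without padding. A quick sketch, assuming a 2 x 2 window with stride 2 and a hypothetical 480 x 640 input (the actual size of data/dog.jpg is not stated here); `pooled_size` is an illustrative helper name.

```python
def pooled_size(n, window=2, stride=2):
    """Number of complete windows that fit along an axis of length n."""
    return (n - window) // stride + 1


h, w = 480, 640  # hypothetical image dimensions
ph, pw = pooled_size(h), pooled_size(w)
ratio = (ph * pw) / (h * w)
print(ph, pw, ratio)  # 240 320 0.25
```

With odd dimensions the last row or column is dropped, so the ratio falls slightly below one quarter.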