Implementation of Convolutional Layer and Pooling Layer
As mentioned earlier, the data passed between layers in a CNN is 4-dimensional. For example, data of shape (10, 1, 28, 28) corresponds to 10 samples, each with 1 channel, a height of 28, and a width of 28. In Python, such data can be created as follows:
>>> import numpy as np
>>> x = np.random.rand(10, 1, 28, 28) # randomly generated data
>>> x.shape
(10, 1, 28, 28)
To access the first sample, simply write x[0].
To access the spatial data in the first channel of the first sample, write:
>>> x[0, 0] # or x[0][0]
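A quick check of what these indexing expressions return. The variable names first_sample and first_channel are illustrative only.

```python
import numpy as np

# Random batch: 10 samples, 1 channel, 28x28 spatial size
x = np.random.rand(10, 1, 28, 28)

first_sample = x[0]         # all channels of sample 0
first_channel = x[0, 0]     # channel 0 of sample 0 (a 2-D image)

print(first_sample.shape)   # (1, 28, 28)
print(first_channel.shape)  # (28, 28)

# x[0, 0] and x[0][0] select the same 2-D slice
print(np.array_equal(x[0, 0], x[0][0]))  # True
```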
Because a CNN works with 4-dimensional data, implementing the convolution operation may look complicated. However, the im2col technique described below makes the problem much simpler.
Expansion based on im2col
Implementing the convolution operation directly would require several levels of nested for loops. That is cumbersome, and Python for loops over NumPy arrays are also slow. Instead of for loops, we use a convenient function called im2col (image to column) to keep the implementation simple.
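To see why the loop-based approach is cumbersome, here is a minimal sketch of a direct convolution with four nested loops. conv_naive is a hypothetical helper, not part of the original code. It assumes the same (N, C, H, W) layout and the same stride and padding conventions as the im2col code later in this section.

```python
import numpy as np

def conv_naive(x, W, b, stride=1, pad=0):
    """Direct convolution with explicit loops (illustrative and slow).

    x: (N, C, H, W) input, W: (FN, C, FH, FW) filters, b: (FN,) biases.
    Returns an array of shape (N, FN, out_h, out_w).
    """
    N, C, H, Wd = x.shape
    FN, _, FH, FW = W.shape
    out_h = (H + 2 * pad - FH) // stride + 1
    out_w = (Wd + 2 * pad - FW) // stride + 1
    xp = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    out = np.zeros((N, FN, out_h, out_w))
    for n in range(N):                  # each sample
        for f in range(FN):             # each filter
            for i in range(out_h):      # each output row
                for j in range(out_w):  # each output column
                    hs, ws = i * stride, j * stride
                    region = xp[n, :, hs:hs + FH, ws:ws + FW]
                    out[n, f, i, j] = np.sum(region * W[f]) + b[f]
    return out

# 4x4 input holding 0..15, one 2x2 filter of ones, stride 2
x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
W = np.ones((1, 1, 2, 2))
b = np.zeros(1)
out = conv_naive(x, W, b, stride=2)
# each output is the sum of one 2x2 block: [[10, 18], [42, 50]]
```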
im2col is a function that expands the input data into a form that is convenient for applying the filter (weights).
As shown in the figure below, applying im2col to 3-dimensional input data converts it into a 2-dimensional matrix (strictly speaking, the 4-dimensional data that includes the batch dimension is converted into 2-dimensional data).
Specifically, as shown in the figure below, each region of the input to which the filter is applied (a 3-dimensional block) is flattened horizontally into a single row. im2col performs this flattening at every position where the filter is applied.
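Here is a tiny illustration of this unwrapping. For clarity it uses an explicit loop instead of the vectorized im2col shown later. The sizes (a 3x3 single-channel input and a 2x2 filter with stride 1) are arbitrary choices for the example.

```python
import numpy as np

# One sample, one channel, 3x3 input; 2x2 filter with stride 1
x = np.arange(9).reshape(1, 1, 3, 3)
FH, FW, stride = 2, 2, 1
out_h = (3 - FH) // stride + 1   # 2 vertical filter positions
out_w = (3 - FW) // stride + 1   # 2 horizontal filter positions

rows = []
for i in range(out_h):
    for j in range(out_w):
        # the (C, FH, FW) block under the filter at output position (i, j)
        region = x[0, :, i*stride:i*stride + FH, j*stride:j*stride + FW]
        rows.append(region.ravel())  # flatten the block into one row
col = np.array(rows)

print(col)
# [[0 1 3 4]
#  [1 2 4 5]
#  [3 4 6 7]
#  [4 5 7 8]]
print(x.size, col.size)  # 9 16
```

The filter regions overlap here, so the 9 input values become 16 entries in the expanded matrix.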
In the figure above, the stride is set large enough that the filter regions do not overlap. This is only to make the illustration easier to follow. In real convolutions, the filter regions usually overlap heavily. When they overlap, the matrix produced by im2col contains more elements than the original input block, because each shared input element is copied into several rows. The im2col approach therefore uses more memory than a straightforward implementation. On the other hand, packing the data into one large matrix is very advantageous for computation. Matrix-calculation libraries (linear algebra libraries) are highly optimized and can multiply large matrices quickly. By reducing convolution to a matrix product, we can take full advantage of these libraries.
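The memory cost can be estimated from the shapes alone. The im2col matrix has one row per filter position (N * out_h * out_w) and one column per filter element (C * FH * FW). im2col_shape is a hypothetical helper that uses the same output-size formula as the im2col code later in this section.

```python
def im2col_shape(N, C, H, W, FH, FW, stride=1, pad=0):
    """Shape of the 2-D matrix produced by im2col: one row per filter position."""
    out_h = (H + 2 * pad - FH) // stride + 1
    out_w = (W + 2 * pad - FW) // stride + 1
    return (N * out_h * out_w, C * FH * FW)

# Input of shape (1, 3, 7, 7) with a 5x5 filter, stride 1
rows, cols = im2col_shape(1, 3, 7, 7, 5, 5)
print(rows, cols)      # 9 75
print(1 * 3 * 7 * 7)   # 147 elements in the original input
print(rows * cols)     # 675 elements after expansion
```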
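After im2col expands the input data, all that remains is to expand each filter (weights) of the convolutional layer vertically into a single column and compute the product of the two matrices. As shown in the figure below, the result of this im2col-based approach is a 2-dimensional matrix. Because a CNN stores data as 4-dimensional arrays, this 2-dimensional output must be reshaped into the appropriate shape. That completes the computation of the convolutional layer.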
Figure: Details of the filter processing in the convolution operation. Each filter is expanded vertically into a single column, the matrix product with the im2col-expanded data is computed, and the result is finally reshaped to the size of the output data.
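The shape bookkeeping of this process can be checked in isolation. In this sketch the sizes are arbitrary, and a random matrix stands in for the output of im2col (assuming stride 1 and no padding). Only the shapes, not the values, are meaningful.

```python
import numpy as np

N, C, H, W = 2, 3, 5, 5                 # input batch (illustrative sizes)
FN, FH, FW = 4, 3, 3                    # 4 filters of size 3x3
out_h, out_w = H - FH + 1, W - FW + 1   # 3, 3 (stride 1, no padding)

# Stand-in for im2col(x, FH, FW): one row per filter position
col = np.random.rand(N * out_h * out_w, C * FH * FW)   # (18, 27)
weights = np.random.rand(FN, C, FH, FW)
b = np.random.rand(FN)

col_W = weights.reshape(FN, -1).T       # (27, 4): each filter is one column
out = col @ col_W + b                   # (18, 4): bias broadcast over rows
out = out.reshape(N, out_h, out_w, FN).transpose(0, 3, 1, 2)
print(out.shape)                        # (2, 4, 3, 3)
```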
Implementation of the convolutional layer
im2col is implemented as follows:
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    """
    Parameters
    ----------
    input_data : input data, a 4-dimensional array of shape (number of samples, channels, height, width)
    filter_h : height of the filter
    filter_w : width of the filter
    stride : stride
    pad : padding

    Returns
    -------
    col : 2-dimensional array
    """
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1

    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            # gather every input pixel that meets filter element (y, x)
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]

    # reorder to (N, out_h, out_w, C, filter_h, filter_w): one row per filter position
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col
- input_data: input data, a 4-dimensional array of shape (number of samples, channels, height, width)
- filter_h: height of the filter
- filter_w: width of the filter
- stride: stride
- pad: padding
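Here is a usage sketch. The function is repeated so that the block runs on its own. It is applied to a 3-channel 7x7 input with a 5x5 filter, first for a single sample and then for a batch of 10. These input sizes are illustrative choices.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Same logic as the im2col function above
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h) // stride + 1
    out_w = (W + 2*pad - filter_w) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

x1 = np.random.rand(1, 3, 7, 7)      # 1 sample, 3 channels, 7x7
col1 = im2col(x1, 5, 5, stride=1, pad=0)
print(col1.shape)                    # (9, 75)

x2 = np.random.rand(10, 3, 7, 7)     # 10 samples
col2 = im2col(x2, 5, 5, stride=1, pad=0)
print(col2.shape)                    # (90, 75)

# Row 4 of col1 is output position (1, 1): the 5x5 block starting at (1, 1),
# flattened in (channel, row, column) order
print(np.allclose(col1[4], x1[0, :, 1:6, 1:6].ravel()))   # True
```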
Next, we use im2col to implement the convolutional layer as a class named Convolution:
import numpy as np

class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W          # filters, shape (FN, C, FH, FW)
        self.b = b          # biases, shape (FN,)
        self.stride = stride
        self.pad = pad

    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = int(1 + (H + 2*self.pad - FH) / self.stride)
        out_w = int(1 + (W + 2*self.pad - FW) / self.stride)

        col = im2col(x, FH, FW, self.stride, self.pad)  # im2col as defined above
        col_W = self.W.reshape(FN, -1).T  # expand each filter into one column
        out = np.dot(col, col_W) + self.b

        # (N*out_h*out_w, FN) -> (N, out_h, out_w, FN) -> (N, FN, out_h, out_w)
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

        return out
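As a sanity check, the following self-contained sketch repeats im2col and a compact version of the Convolution class. The logic is the same as above, with int(...) written as floor division. It compares the result against a loop-based reference, conv_naive, which is a hypothetical helper. The sizes, random seed, stride, and padding are arbitrary choices.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Same vectorized expansion as the im2col function above
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h) // stride + 1
    out_w = (W + 2*pad - filter_w) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        for x in range(filter_w):
            col[:, :, y, x] = img[:, :, y:y + stride*out_h:stride, x:x + stride*out_w:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

class Convolution:
    # Same forward pass as the Convolution class above
    def __init__(self, W, b, stride=1, pad=0):
        self.W, self.b, self.stride, self.pad = W, b, stride, pad

    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = (H + 2*self.pad - FH) // self.stride + 1
        out_w = (W + 2*self.pad - FW) // self.stride + 1
        col = im2col(x, FH, FW, self.stride, self.pad)
        col_W = self.W.reshape(FN, -1).T
        out = col @ col_W + self.b
        return out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

def conv_naive(x, W, b, stride=1, pad=0):
    # Reference: explicit loops over samples, filters, and output positions
    N, C, H, Wd = x.shape
    FN, _, FH, FW = W.shape
    out_h = (H + 2*pad - FH) // stride + 1
    out_w = (Wd + 2*pad - FW) // stride + 1
    xp = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    out = np.zeros((N, FN, out_h, out_w))
    for n in range(N):
        for f in range(FN):
            for i in range(out_h):
                for j in range(out_w):
                    r = xp[n, :, i*stride:i*stride + FH, j*stride:j*stride + FW]
                    out[n, f, i, j] = np.sum(r * W[f]) + b[f]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 7, 7))
W = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)

conv = Convolution(W, b, stride=2, pad=1)
out = conv.forward(x)
print(out.shape)                                               # (2, 4, 4, 4)
print(np.allclose(out, conv_naive(x, W, b, stride=2, pad=1)))  # True
```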
The filter-expansion line lays out each filter's block vertically as a single column. Note the call reshape(FN, -1). Specifying -1 is a convenient feature of reshape: for a dimension given as -1, reshape automatically computes the size that keeps the total number of elements unchanged. For example, an array of shape (10, 3, 5, 5) has 750 elements, and reshape(10, -1) turns it into an array of shape (10, 75). Here, reshape(FN, -1) gives one row per filter, and .T transposes it so that each filter becomes a column. Finally, transpose(0, 3, 1, 2) reorders the output axes from (N, out_h, out_w, FN) to (N, FN, out_h, out_w).
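The -1 inference can be seen directly on an array of the shape used in the example above. The zero-filled array stands in for a set of 10 filters.

```python
import numpy as np

W = np.zeros((10, 3, 5, 5))   # 10 filters, 3 channels, 5x5 each
print(W.size)                 # 750

flat = W.reshape(10, -1)      # -1 is inferred as 750 / 10 = 75
print(flat.shape)             # (10, 75)
print(flat.T.shape)           # (75, 10): each filter becomes one column
```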
Implementation of the pooling layer
Like the convolutional layer, the pooling layer uses im2col to expand its input data. Unlike the convolutional layer, however, pooling is independent along the channel dimension: the pooling regions are expanded separately for each channel.
As shown in the figure below:
After this expansion, all that remains is to take the maximum value of each row of the expanded matrix and reshape the result appropriately.
The pooling layer is therefore implemented in the following three stages:
- Expand the input data.
- Take the maximum value of each row.
- Reshape the result to the appropriate output shape.
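The three stages can be traced by hand on a small input. In this sketch, stage 1 is written as an explicit comprehension rather than im2col, for clarity. Its rows come out in the same order that im2col followed by reshape(-1, pool_h*pool_w) produces: sample, output row, output column, then channel. The sizes (2 channels, 4x4 input, 2x2 window, stride 2) are illustrative choices.

```python
import numpy as np

N, C, H, W = 1, 2, 4, 4
ph, pw, stride = 2, 2, 2
out_h = (H - ph) // stride + 1      # 2
out_w = (W - pw) // stride + 1      # 2
x = np.arange(N * C * H * W).reshape(N, C, H, W)

# Stage 1: expand; one row per (sample, out row, out col, channel) window
col = np.array([x[n, c, i*stride:i*stride + ph, j*stride:j*stride + pw].ravel()
                for n in range(N)
                for i in range(out_h)
                for j in range(out_w)
                for c in range(C)])

# Stage 2: maximum of each row
out = col.max(axis=1)

# Stage 3: reshape to (N, out_h, out_w, C), then move channels to axis 1
out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)
# channel 0 -> [[5, 7], [13, 15]], channel 1 -> [[21, 23], [29, 31]]
```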
Let's look at the actual implementation in Python:
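import numpy as np

class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

    def forward(self, x):
        N, C, H, W = x.shape
        # include padding so the output size matches what im2col produces
        out_h = int(1 + (H + 2*self.pad - self.pool_h) / self.stride)
        out_w = int(1 + (W + 2*self.pad - self.pool_w) / self.stride)

        # (1) Expand: im2col (defined above), then one row per channel's window
        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h*self.pool_w)

        # (2) Maximum of each row
        out = np.max(col, axis=1)

        # (3) Reshape to (N, out_h, out_w, C), then move channels to axis 1
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        return out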
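To check the pooling layer, the following self-contained sketch repeats im2col and the corrected Pooling class. It compares the output against an independent computation for non-overlapping 2x2 windows, which splits each spatial axis into (blocks, 2) with reshape. That reference only works when the stride equals the window size and the input divides evenly. The input sizes and seed are arbitrary.

```python
import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    # Same vectorized expansion as the im2col function above
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h) // stride + 1
    out_w = (W + 2*pad - filter_w) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        for x in range(filter_w):
            col[:, :, y, x] = img[:, :, y:y + stride*out_h:stride, x:x + stride*out_w:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

class Pooling:
    # Same forward pass as the Pooling class above
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h, self.pool_w, self.stride, self.pad = pool_h, pool_w, stride, pad

    def forward(self, x):
        N, C, H, W = x.shape
        out_h = (H + 2*self.pad - self.pool_h) // self.stride + 1
        out_w = (W + 2*self.pad - self.pool_w) // self.stride + 1
        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h * self.pool_w)  # one row per channel window
        out = np.max(col, axis=1)
        return out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

x = np.random.default_rng(1).standard_normal((2, 3, 6, 6))
pool = Pooling(2, 2, stride=2)
out = pool.forward(x)
print(out.shape)   # (2, 3, 3, 3)

# Reference for non-overlapping 2x2 windows: split each spatial axis into
# (3 blocks, 2 elements) and take the maximum inside each block
ref = x.reshape(2, 3, 3, 2, 3, 2).max(axis=(3, 5))
print(np.allclose(out, ref))   # True
```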