Pass 1: Implement backpropagation for convolutional layers
Mission details
The task of this level: implement the backpropagation of the convolutional layer.
Related information
To complete the task of this level, you need to master: the backpropagation of the convolutional layer.
For the content of this training, please refer to Chapter 5 of the book "Introduction to Deep Learning - Theory and Implementation Based on Python".
Backpropagation of Convolutional Layers
In the previous training, we learned the forward pass of the convolutional layer. We know that the forward propagation of the convolutional layer usually converts the input feature map into a matrix through the im2col operation, where each row of the matrix holds the data of one convolution window across all channels of the input feature map, so that the convolution is turned into a matrix multiplication. Ignoring the bias (its calculation is very simple), the convolution can be expressed by the following formula:
y^ = x^ × W^
Here, x^ is the result of applying im2col to the input x. Because the weight W of the convolutional layer usually has shape (C′, C, Kh, Kw), it must be converted to shape (C⋅Kh⋅Kw, C′) through reshape and transpose operations to match the matrix multiplication above; we denote this matrix W^. The result y^ has shape (B⋅H′⋅W′, C′), which is transformed back to (B, C′, H′, W′) through reshape and transpose operations; that is the result we want.
Does this form look familiar? That's right: it is exactly the computation of the fully connected layer we learned before, so we can apply the fully connected layer's backpropagation formulas:
∂l/∂W^ = ∂l/∂y^ ⋅ ∂y^/∂W^ = (x^)T × ∂l/∂y^
∂l/∂x^ = ∂l/∂y^ ⋅ ∂y^/∂x^ = ∂l/∂y^ × (W^)T
Finally, all we need to do is transform ∂l/∂x^ into ∂l/∂x, and at the same time transform ∂l/∂W^ into ∂l/∂W. For the latter, simply restore the original shape with transpose and reshape operations. For the former, we reduce it to ∂l/∂x with the inverse of im2col, namely col2im.
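These matrix-form formulas can be sanity-checked with plain numpy before touching any convolution machinery. The sketch below uses illustrative names and shapes (not part of the training's API): with l = sum(y^), the gradient ∂l/∂y^ is all ones, and one entry of the analytic ∂l/∂W^ is compared against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)
x_hat = rng.standard_normal((6, 8))   # stands in for (B*H'*W', C*Kh*Kw)
W_hat = rng.standard_normal((8, 3))   # stands in for (C*Kh*Kw, C')

y_hat = x_hat @ W_hat                 # forward: y^ = x^ × W^
dy = np.ones_like(y_hat)              # ∂l/∂y^ for l = sum(y^)

# the fully connected layer's backpropagation formulas
dW_hat = x_hat.T @ dy                 # ∂l/∂W^ = (x^)T × ∂l/∂y^
dx_hat = dy @ W_hat.T                 # ∂l/∂x^ = ∂l/∂y^ × (W^)T

# finite difference check on one entry of dW_hat
eps = 1e-6
W_pert = W_hat.copy()
W_pert[0, 0] += eps
num = ((x_hat @ W_pert).sum() - y_hat.sum()) / eps
assert abs(num - dW_hat[0, 0]) < 1e-4
```

The shapes confirm the bookkeeping: dW_hat matches W_hat and dx_hat matches x_hat, which is exactly what the reshape/col2im steps below rely on.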
Implementation of Convolutional Layer Backpropagation
This training extends the Convolution class defined in the previous training. The implementation of forward(x) is already given, and it has been modified to meet the needs of backpropagation. You need to implement this class's backpropagation function backward(dout), where dout is the gradient of the loss function with respect to the output of the convolutional layer, that is, ∂l/∂y in the formulas above, a numpy.ndarray of shape (B, C′, H′, W′). The training also provides a col2im implementation, which converts a (B⋅H′⋅W′, C⋅Kh⋅Kw) matrix back into a (B, C, H, W) feature map.
The provided col2im function is defined as: col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0), and the meanings of the parameters are:
- col: the transformed matrix;
- input_shape: the shape of the input feature map;
- filter_h and filter_w: the height and width of the convolution window;
- stride: the convolution stride;
- pad: the convolution padding.
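The platform supplies col2im for you, but it helps to know what it does. A typical implementation (a sketch in the style of the book's version, not necessarily identical to the one the platform ships) scatters each row of the matrix back into the padded image, summing where windows overlap:

```python
import numpy as np

def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    """Inverse of im2col: scatter columns back into the (padded) image,
    summing contributions where convolution windows overlap."""
    N, C, H, W = input_shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    # (N*out_h*out_w, C*fh*fw) -> (N, C, fh, fw, out_h, out_w)
    col = col.reshape(N, out_h, out_w, C, filter_h, filter_w)
    col = col.transpose(0, 3, 4, 5, 1, 2)

    img = np.zeros((N, C, H + 2 * pad + stride - 1, W + 2 * pad + stride - 1))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            x_max = x + stride * out_w
            # every window contributes its (y, x) element back to the image
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]
    return img[:, :, pad:H + pad, pad:W + pad]

# with a 3x3 input and 2x2 windows at stride 1, overlaps sum:
# the center pixel belongs to all four windows, so it accumulates 4
print(col2im(np.ones((4, 4)), (1, 1, 3, 3), 2, 2)[0, 0])
```

The summation over overlaps is exactly why col2im is the right way to accumulate ∂l/∂x: every window that used an input pixel contributes a gradient term for it.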
Programming requirements
According to the prompt, add code between Begin and End in the editor on the right to implement the backpropagation of the convolutional layer described above.
Test instruction
The platform will test the code you write. The test method is: the platform randomly generates the input x, weight W, bias b, and output gradient dout, then creates an instance of the Convolution class from your implementation code, uses that instance to perform the forward propagation calculation first, and then performs the backpropagation calculation. Your answers will be compared with the standard answers. Because floating point calculations may have errors, your answer is accepted as long as its error relative to the standard answer does not exceed 10⁻⁵.
Sample input:
x:
[[[[0.11 0.65 0.08 0.28 0.77]
[0.12 0.29 0.77 0.58 0.11]
[0.44 0.43 0.4 0.28 0.87]
[0.52 0.78 0.99 0.58 0.13]
[0.46 0.31 0.9 0.19 0.3 ]]
[[0.12 0.65 0.58 0.44 0.9 ]
[0.23 0.34 0.96 0.94 0.37]
[0.63 0.17 0.61 0.22 0.98]
[0.91 0.32 0.78 0.61 0.42]
[0.21 0.01 0.9 0.06 0.2 ]]]
[[[0.79 0.36 0.39 0.6 0.12]
[0.84 0.62 0.32 0.84 0.69]
[0.34 0.21 0.03 0.17 0.22]
[0.6 0.19 0.4 1. 0.93]
[0.46 0.54 0.62 0.8 0.81]]
[[0.12 0.7 0.88 0.53 0.79]
[0.36 0.14 0.87 0.99 0.27]
[0.35 0.87 0.25 0.57 0.3 ]
[0.92 0.23 0.84 0.72 0.9 ]
[0.05 0.91 0.61 0.23 0.56]]]]
W:
[[[[-0.77 0.09 -0.46]
[ 0.42 0.89 0.97]
[ 0.41 0.52 2.57]]
[[-0.75 0.75 -0.38]
[-0.78 -1.72 0.99]
[ 1.24 -0.68 -1.2 ]]]
[[[-0.44 0.3 -0.01]
[-0.03 -0.75 1.33]
[-0.13 1.63 0.02]]
[[-1.11 -1.18 -1.46]
[ 0.76 -1.01 -1.38]
[ 0.88 0.43 -1.41]]]]
b:
[0.94 1.15]
dout:
[[[[0.12 0.09 0.12 0.87 0.02]
[0.21 0.26 0.51 0.61 0.34]
[0.04 0.59 0.69 0.45 0.06]
[0.14 0.56 0.18 0.17 0.47]
[0.69 0.18 0.73 0.4 0.79]]
[[0.8 0.62 0.72 0.45 0.96]
[0.34 0.7 0.15 0.44 0.39]
[0.2 0.97 0.8 0.36 0.93]
[0.93 0.3 0.63 0.13 0.12]
[0.94 0.77 0.18 0.78 0.15]]]
[[[0.35 0.72 0.24 0.56 0.09]
[0.3 0.35 0.56 0.99 0.73]
[0.98 0.96 0.23 0.9 0.11]
[0.13 0.84 0.74 0.83 0.78]
[0.12 0.89 0.44 0.1 0.05]]
[[0.43 0.14 0.4 0.3 0.26]
[0.3 0.19 0.48 0.93 0.53]
[0.06 0.7 0.6 0.6 0.45]
[0.45 0.34 0.53 0.78 0.98]
[0.09 0.4 0.66 0.48 0.97]]]]
stride: 1
pad: 1
The corresponding gradients are:
dW:
[[[[ 7.13 8.73 7.06]
[ 8.88 10.84 9.14]
[ 7.69 9.52 7.76]]
[[ 9.1 10.96 9.22]
[10.08 12.4 10.84]
[ 8.37 9.58 8.4 ]]]
[[[ 7.85 9.8 7.89]
[10.09 13.15 10.29]
[ 8.24 10.3 8.54]]
[[ 9.9 12.03 9.94]
[11.2 14.32 10.46]
[ 9.18 10.2 8.73]]]]
db:
[22.28 25.81]
dx:
[[[[-0.86 0.5 0.13 1.01 0.6 ]
[ 0.53 1.35 3.06 1.8 5.08]
[ 0.51 2.03 3.2 3.85 2.66]
[-0.02 3.16 3.41 3.03 3.3 ]
[ 1.74 2.95 4.48 2. 2.9 ]]
[[-1.83 -3.07 -3.9 -3.27 -1.84]
[-1.18 -4.4 -3.96 -4.45 -4.01]
[-0.8 -4.18 -4.8 -2.68 -2.55]
[-1.33 -3.98 -4.19 -4.52 -2.5 ]
[-0.43 -3.09 -2.66 -1.98 -2.85]]]
[[[ 0.05 0.84 -0.12 0.38 0.59]
[ 0.4 2.04 2.73 2.93 4.09]
[ 1.32 2.18 3.09 4.05 5.21]
[ 0.21 5.2 5.26 3.39 5.16]
[ 1.5 2.46 4.97 4.83 4.2 ]]
[[-2.09 -2.98 -2.84 -3.86 -2.09]
[-0.82 -3.19 -4.79 -5.26 -4.81]
[-2.88 -2.64 -2.45 -5.77 -5.37]
[-1.04 -4.3 -3.87 -5.76 -6.24]
[ 0.76 -1.61 -0.85 -0.47 -3.83]]]]
The above results have rounding errors, which you can ignore.
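If you want to convince yourself that a backward pass is consistent with the forward pass to within the 10⁻⁵ tolerance, a finite difference check on a tiny convolution is a handy tool. The sketch below is independent of the training's code: conv_forward is a hypothetical helper implemented with direct loops (slow but easy to trust), the loss is taken as the sum of the output so dout is all ones, and one bias gradient is checked numerically.

```python
import numpy as np

def conv_forward(x, W, b, stride=1, pad=0):
    """Direct loop convolution, for checking only (slow but simple)."""
    B, C, H, Wd = x.shape
    Co, _, Kh, Kw = W.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    Ho = (H + 2 * pad - Kh) // stride + 1
    Wo = (Wd + 2 * pad - Kw) // stride + 1
    y = np.zeros((B, Co, Ho, Wo))
    for n in range(B):
        for co in range(Co):
            for i in range(Ho):
                for j in range(Wo):
                    win = xp[n, :, i*stride:i*stride+Kh, j*stride:j*stride+Kw]
                    y[n, co, i, j] = np.sum(win * W[co]) + b[co]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 2, 5, 5))
W = rng.standard_normal((2, 2, 3, 3))
b = rng.standard_normal(2)

y = conv_forward(x, W, b, stride=1, pad=1)
# with l = sum(y), dout is all ones, so analytically db[c] = B * H' * W'
# (each bias unit contributes once per output position of its channel)
db_analytic = np.sum(np.ones_like(y), axis=(0, 2, 3))

eps = 1e-5
b_pert = b.copy()
b_pert[0] += eps
db_numeric = (conv_forward(x, W, b_pert, 1, 1).sum() - y.sum()) / eps
assert abs(db_numeric - db_analytic[0]) < 1e-3
```

The same finite difference idea applies to dW and dx; perturb one entry, rerun the forward pass, and compare the change in the loss against the analytic gradient.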
Let's start your mission, I wish you success!
Implementation code:
import numpy as np
from utils import im2col, col2im
class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        r'''
        Initialization of the convolutional layer
        Parameters:
        - W: numpy.array, (C_out, C_in, K_h, K_w)
        - b: numpy.array, (C_out,)
        - stride: int
        - pad: int
        '''
        self.W = W
        self.b = b
        self.stride = stride
        self.pad = pad
        self.x = None
        self.col = None
        self.col_W = None
        self.dW = None
        self.db = None

    def forward(self, x):
        r'''
        Forward propagation of the convolutional layer
        Parameter:
        - x: numpy.array, (B, C, H, W)
        Return:
        - y: numpy.array, (B, C', H', W')
          H' = (H - Kh + 2P) / S + 1
          W' = (W - Kw + 2P) / S + 1
        '''
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = 1 + int((H + 2 * self.pad - FH) / self.stride)
        out_w = 1 + int((W + 2 * self.pad - FW) / self.stride)

        col = im2col(x, FH, FW, self.stride, self.pad)
        col_W = self.W.reshape(FN, -1).T
        out = np.dot(col, col_W) + self.b
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)

        self.x = x
        self.col = col
        self.col_W = col_W
        return out

    def backward(self, dout):
        r'''
        Backpropagation of the convolutional layer
        Parameter:
        - dout: numpy.array, (B, C', H', W')
        Return:
        - dx: numpy.array, (B, C, H, W)
        In addition, the following results need to be calculated:
        - self.dW: numpy.array, (C', C, Kh, Kw), same shape as self.W
        - self.db: numpy.array, (C',), same shape as self.b
        '''
        ########## Begin ##########
        FN, C, FH, FW = self.W.shape
        # flatten dout into the matrix form used in forward: (B*H'*W', C')
        dout = dout.transpose(0, 2, 3, 1).reshape(-1, FN)

        # each output channel's bias accumulates all of its gradients
        self.db = np.sum(dout, axis=0)
        # dl/dW^ = (x^)T × dl/dy^, then restore the original weight shape
        self.dW = np.dot(self.col.T, dout)
        self.dW = self.dW.transpose(1, 0).reshape(FN, C, FH, FW)

        # dl/dx^ = dl/dy^ × (W^)T, then scatter back to (B, C, H, W)
        dcol = np.dot(dout, self.col_W.T)
        dx = col2im(dcol, self.x.shape, FH, FW, self.stride, self.pad)
        return dx
        ########## End ##########
Pass 2: Implement backpropagation of the pooling layer
Mission details
The task of this level: implement the backpropagation of the pooling layer.
related information
In order to complete this task, you need to master:
- Backpropagation of the pooling layer;
- Implementation of pooling layer backpropagation.
For the content of this training, please refer to Chapter 7 of the book "Introduction to Deep Learning - Theory and Implementation Based on Python".
Backpropagation of the pooling layer
In the previous training, we learned the forward propagation of the pooling layer, and we saw that it is very similar, in both principle and implementation, to the forward propagation of the convolutional layer. The same holds for backpropagation: all calculations take the window as the processing unit.
For average pooling, backpropagation is very simple: the gradient at each output position is averaged back over every input position in the window. For max pooling, the max function has non-differentiable points, so, following the earlier ReLU approach, we use a subgradient: within each window, the gradient at the output position is routed back entirely to the input position that held the maximum value during forward propagation. The figure below shows the process of max pooling backpropagation.
Figure 1. Forward and backward propagation of maximum pooling
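In place of the figure, here is the same idea in numbers, as a minimal sketch on a single 2×2 window (independent of the im2col machinery): all of the output gradient goes to the position of the window's maximum, and every other position receives zero.

```python
import numpy as np

x = np.array([[1., 3.],
              [2., 0.]])   # one 2x2 pooling window; the max is 3
dout = 5.0                 # gradient arriving at the pooled output

# forward: y = max(x); remember which flattened position won
flat_idx = np.argmax(x)    # index 1, i.e. the value 3

# backward: route all of dout to the argmax position, zeros elsewhere
dx = np.zeros_like(x)
dx.flat[flat_idx] = dout
print(dx)   # [[0. 5.], [0. 0.]]
```

Repeating this per window, per channel, per sample is all max pooling backpropagation does; the implementation below just vectorizes it.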
Implementation of pooling layer backpropagation
When implementing the forward propagation of the pooling layer, we adopted a technique similar to the one used for the convolutional layer: first use im2col to convert the input feature map into a matrix in which each row holds the data of one window, so that the average or maximum over a window becomes a row operation on the matrix, which numpy can compute efficiently. Backpropagation is therefore also similar to the convolutional layer's: after backpropagating through the average or maximum on the transformed matrix, use the col2im operation to restore it to the same shape as the input feature map in forward propagation.
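The scatter step at the heart of this can be sketched on its own (illustrative names and values, assuming the row-per-window matrix produced during forward propagation): build a zero matrix with one row per window and pool_size columns, then use the saved argmax to pick the column that receives each window's gradient.

```python
import numpy as np

pool_size = 4                            # e.g. a 2x2 window, flattened
col = np.array([[1., 3., 2., 0.],       # each row = one pooling window
                [4., 1., 0., 2.]])
arg_max = np.argmax(col, axis=1)         # saved during forward: [1, 0]
dout_flat = np.array([5., 7.])           # one output gradient per window

# scatter: zeros everywhere, each window's gradient at its argmax column
dmax = np.zeros((dout_flat.size, pool_size))
dmax[np.arange(arg_max.size), arg_max] = dout_flat
print(dmax)   # row 0 -> column 1, row 1 -> column 0
```

After this scatter, the matrix has the same layout im2col produced, so col2im can map it straight back to the input feature map's shape.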
This training extends the MaxPool class defined in the previous training. The implementation of forward(x) is already given, and it has been modified to meet the needs of backpropagation. You need to implement this class's backpropagation function backward(dout), where dout is the gradient of the loss function with respect to the output of the pooling layer, a numpy.ndarray of shape (B, C, H′, W′). During forward propagation, the input feature map and the indices of the maximum values are recorded in self.x and self.arg_max respectively. The training also provides a col2im implementation; see its introduction in the previous level.
Programming requirements
According to the prompt, add code between Begin and End in the editor on the right to implement the backpropagation of the pooling layer described above.
Test instruction
The platform will test the code you write. The test method is: the platform randomly generates the input x and output gradient dout, then creates an instance of the MaxPool class from your implementation code, uses that instance to perform the forward propagation calculation first, and then performs the backpropagation calculation. Your answers will be compared with the standard answers. Because floating point calculations may have errors, your answer is accepted as long as its error relative to the standard answer does not exceed 10⁻⁵.
Sample input:
x:
[[[[0.79 0.36 0.41 0.16]
[0.94 0.56 0.26 0.89]
[0.82 0.5 0.81 0.47]
[0.95 0.99 0.38 0.94]]
[[0.46 0.74 0.32 0.22]
[0.18 0.05 0.98 0.39]
[0.81 0.9 0.5 0.57]
[0.16 0.18 0.73 0.06]]]
[[[0.51 0.48 0.8 0.57]
[0.64 0.05 0.64 0.28]
[0.01 0.65 0.03 0.21]
[0.24 0.3 0.72 0.89]]
[[0.14 0.43 0.98 0.6 ]
[0.79 0.42 0.42 0.39]
[0.04 0.29 0.55 0.32]
[0.38 0.26 0.39 0.97]]]]
dout:
[[[[0.41 0.71]
[0.28 0.74]]
[[0.39 0.43]
[0.58 0.25]]]
[[[0.07 0.55]
[0.4 0.15]]
[[0.42 0.83]
[0.59 0.38]]]]
Then the gradient of the corresponding input feature map is:
dx:
[[[[0. 0. 0. 0. ]
[0.41 0. 0. 0.71]
[0. 0. 0. 0. ]
[0. 0.28 0. 0.74]]
[[0. 0.39 0. 0. ]
[0. 0. 0.43 0. ]
[0. 0.58 0. 0. ]
[0. 0. 0.25 0. ]]]
[[[0. 0. 0.55 0. ]
[0.07 0. 0. 0. ]
[0. 0.4 0. 0. ]
[0. 0. 0. 0.15]]
[[0. 0. 0.83 0. ]
[0.42 0. 0. 0. ]
[0. 0. 0. 0. ]
[0.59 0. 0. 0.38]]]]
Let's start your mission, I wish you success!
Implementation code:
import numpy as np
from utils import im2col, col2im
class MaxPool:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        r'''
        Initialization of the pooling layer
        Parameters:
        - pool_h: int
        - pool_w: int
        - stride: int
        - pad: int
        '''
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad
        self.x = None
        self.arg_max = None

    def forward(self, x):
        r'''
        Forward propagation of the pooling layer
        Parameter:
        - x: numpy.array, (B, C, H, W)
        Return:
        - y: numpy.array, (B, C, H', W')
          H' = (H - Kh + 2P) / S + 1
          W' = (W - Kw + 2P) / S + 1
        '''
        N, C, H, W = x.shape
        out_h = int(1 + (H - self.pool_h + 2 * self.pad) / self.stride)
        out_w = int(1 + (W - self.pool_w + 2 * self.pad) / self.stride)

        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h * self.pool_w)
        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max
        return out

    def backward(self, dout):
        r'''
        Backpropagation of the pooling layer
        Parameter:
        - dout: numpy.array, (B, C', H', W')
        Return:
        - dx: numpy.array, (B, C, H, W)
        '''
        ########## Begin ##########
        # move the channel axis last so the layout matches im2col's rows
        dout = dout.transpose(0, 2, 3, 1)
        pool_size = self.pool_h * self.pool_w

        # scatter each window's gradient to the position of its maximum
        dmax = np.zeros((dout.size, pool_size))
        dmax[np.arange(self.arg_max.size), self.arg_max.flatten()] = dout.flatten()
        dmax = dmax.reshape(dout.shape + (pool_size,))

        # collapse back to the im2col matrix shape, then restore with col2im
        dcol = dmax.reshape(dmax.shape[0] * dmax.shape[1] * dmax.shape[2], -1)
        dx = col2im(dcol, self.x.shape, self.pool_h, self.pool_w, self.stride, self.pad)
        return dx
        ########## End ##########