1. Experiment introduction
This experiment implements a simple two-dimensional convolutional neural network component: a two-dimensional cross-correlation function and a custom two-dimensional convolution layer class. The layer is then applied to a randomly generated two-dimensional tensor.
2. Experimental environment
This series of experiments uses the PyTorch deep learning framework. The setup steps are as follows:
1. Configure the virtual environment
```bash
conda create -n DL python=3.7
conda activate DL
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
conda install matplotlib
conda install scikit-learn
```
2. Library version introduction
| Software package | Version used in this experiment | Latest version (at time of writing) |
|---|---|---|
| matplotlib | 3.5.3 | 3.8.0 |
| numpy | 1.21.6 | 1.26.0 |
| python | 3.7.16 | |
| scikit-learn | 0.22.1 | 1.3.0 |
| torch | 1.8.1+cu102 | 2.0.1 |
| torchaudio | 0.8.1 | 2.0.2 |
| torchvision | 0.9.1+cu102 | 0.15.2 |
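To compare a local environment against the table above, a short script can print the installed versions. This is a minimal sketch: the package list simply mirrors the table (`sklearn` is the import name of scikit-learn), and `installed_versions` is a hypothetical helper name, not part of any library.

```python
# Minimal sketch: report installed versions to compare against the version table.
# Package list mirrors the table; "sklearn" is the import name of scikit-learn.
import importlib
import sys

PACKAGES = ["torch", "torchvision", "torchaudio", "numpy", "matplotlib", "sklearn"]

def installed_versions(names=PACKAGES):
    """Map each import name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # not installed, or failed to load (e.g. mismatched binaries)
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions

if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(name, version if version is not None else "not installed")
```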
3. Experimental content
ChatGPT:
A convolutional neural network (CNN) is a deep learning model widely used in image recognition, computer vision, pattern recognition, and other fields. Its design is inspired by the way the visual cortex works in biological systems.
A convolutional neural network is typically composed of convolutional layers, pooling layers, and fully connected layers:
- The convolutional layer extracts local features of the image. Through convolution operations followed by activation functions, the network learns feature representations of the image.
- The pooling layer reduces the spatial size of the feature maps, which cuts computation and the number of parameters in later layers while retaining the main feature information.
- The fully connected layer maps the extracted features to the final outputs, such as class probabilities for classification or continuous values for regression.
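The layer sequence described in the list above can be traced with PyTorch's built-in layers. This is a minimal sketch under assumed, hypothetical sizes (one 28 × 28 single-channel image, 4 feature maps, 10 classes); it only shows how tensor shapes change at each stage and is not a trained model.

```python
# Minimal sketch (hypothetical sizes): conv -> activation -> pooling -> fully connected,
# printing the tensor shape after each stage. Uses PyTorch's built-in layers.
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3),  # 28x28 -> 26x26, local features
    nn.ReLU(),                                                # non-linearity
    nn.MaxPool2d(kernel_size=2),                              # 26x26 -> 13x13, smaller maps
    nn.Flatten(),                                             # (N, 4, 13, 13) -> (N, 676)
    nn.Linear(4 * 13 * 13, 10),                               # features -> 10 class scores
)

x = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)
for layer in model:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))

probs = torch.softmax(x, dim=1)  # turn class scores into probabilities
```

The pooling step is what shrinks the input to the fully connected layer from 4 × 26 × 26 to 4 × 13 × 13 values, which is where the parameter saving mentioned above comes from.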
Convolutional neural networks have strong advantages in image processing. They automatically learn hierarchical feature representations and have a degree of invariance to image transformations such as translation and, to a lesser extent, scaling and rotation. These properties make them a model of choice for tasks such as image classification, object detection, and semantic segmentation. Beyond image processing, convolutional neural networks are also applied in other fields, such as natural language processing and time series analysis: by arranging text or time series data in a grid-like form (for example, a sentence as a sequence of word-embedding vectors), convolutional neural networks can be used for these tasks as well.
0. Import necessary toolkits
```python
import torch
from torch import nn
import torch.nn.functional as F
```
- torch.nn: PyTorch's neural network module, which provides neural network layers and building blocks such as nn.Module and nn.Parameter.
- torch.nn.functional: functional (stateless) versions of neural network operations in PyTorch, such as activation functions and loss functions.
1. Two-dimensional cross-correlation operation (corr2d)
As discussed previously, computing a convolution requires flipping the convolution kernel. In practice, implementations usually compute a cross-correlation instead of a convolution, which avoids the extra flipping step.
- Flipping means reversing the kernel's order in both dimensions (top to bottom and left to right), i.e., rotating it by 180 degrees.
- Cross-correlation and convolution differ only in whether the kernel is flipped, so cross-correlation is also called non-flipped convolution.
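For an input X of size n_h × n_w and a kernel K of size k_h × k_w, the two operations in the list above can be written as follows. This assumes the setting used by corr2d in this post: valid mode (no padding) and stride 1, with indices starting at 0.

```latex
% Cross-correlation (what corr2d computes)
Y_{i,j} = \sum_{a=0}^{k_h-1} \sum_{b=0}^{k_w-1} X_{i+a,\, j+b}\, K_{a,b}

% Convolution: the same sum with the kernel rotated by 180 degrees
Y_{i,j} = \sum_{a=0}^{k_h-1} \sum_{b=0}^{k_w-1} X_{i+a,\, j+b}\, K_{k_h-1-a,\, k_w-1-b}

% Valid output positions: 0 \le i \le n_h - k_h, \quad 0 \le j \le n_w - k_w,
% so Y has size (n_h - k_h + 1) \times (n_w - k_w + 1)
```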
In neural networks, convolution is used for feature extraction, and whether the kernel is flipped does not affect the layer's feature extraction capability. In particular, when the kernel is a learnable parameter, a layer using cross-correlation can simply learn the flipped version of the kernel that a convolution would use, so the two are equivalent in capability. For convenience of implementation (and description), we therefore use cross-correlation instead of convolution. In fact, the "convolution" operations in many deep learning frameworks, including PyTorch's nn.Conv2d, are actually cross-correlation operations.
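The relationship described above can be checked numerically. The sketch below reuses the loop-based corr2d from this post, defines a hypothetical helper conv2d_flipped that cross-correlates with a 180-degree-rotated kernel (valid-mode convolution), and compares both against PyTorch's F.conv2d. It assumes only standard PyTorch calls (torch.flip, F.conv2d).

```python
# Minimal sketch: PyTorch's F.conv2d computes cross-correlation, and
# "true" convolution is cross-correlation with the kernel rotated by 180 degrees.
import torch
import torch.nn.functional as F

def corr2d(X, K):
    """Loop-based 2D cross-correlation, as in this post."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def conv2d_flipped(X, K):
    """Hypothetical helper: valid-mode convolution = flip K on both axes, then cross-correlate."""
    return corr2d(X, torch.flip(K, dims=(0, 1)))

torch.manual_seed(0)
X = torch.randn(5, 5)
K = torch.randn(3, 3)

# F.conv2d expects (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weights.
builtin = F.conv2d(X.view(1, 1, 5, 5), K.view(1, 1, 3, 3)).view(3, 3)

print(torch.allclose(builtin, corr2d(X, K), atol=1e-6))          # True: cross-correlation
print(torch.allclose(builtin, conv2d_flipped(X, K), atol=1e-6))  # generally False for an asymmetric K
```

Because a learned kernel can absorb the flip, a network trained with either operation reaches the same set of functions; only the stored kernel values differ by a 180-degree rotation.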
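```python
def corr2d(X, K):
    h, w = K.shape
    # Output size for a "valid" cross-correlation (no padding, stride 1)
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Element-wise product of the h x w window with K, then sum
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y
```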
- Input: a two-dimensional input tensor X and a two-dimensional convolution kernel tensor K.
- Output: the cross-correlation result tensor Y, with shape (X.shape[0] - K.shape[0] + 1, X.shape[1] - K.shape[1] + 1).
- Two nested loops traverse every element of Y; each element is computed by multiplying the window of X at that position element-wise by K and summing the products.
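A small worked example makes the loop concrete. The sketch below runs corr2d on a 3 × 3 input and a 2 × 2 kernel, where the first output element is 0·0 + 1·1 + 3·2 + 4·3 = 19. It also shows a loop-free variant, under the hypothetical name corr2d_unfold, that builds all windows at once with Tensor.unfold; this is one possible alternative, not the method used in this post.

```python
# Worked example of corr2d, plus a loop-free variant built on Tensor.unfold.
# corr2d_unfold is a hypothetical helper name used only for illustration.
import torch

def corr2d(X, K):
    """Loop-based 2D cross-correlation, as in this post."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def corr2d_unfold(X, K):
    h, w = K.shape
    # windows[i, j] is the h x w patch X[i:i + h, j:j + w]
    windows = X.unfold(0, h, 1).unfold(1, w, 1)  # shape (H - h + 1, W - w + 1, h, w)
    return (windows * K).sum(dim=(-2, -1))

X = torch.arange(9.0).reshape(3, 3)         # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

print(corr2d(X, K))         # values: [[19., 25.], [37., 43.]]
print(corr2d_unfold(X, K))  # same values, without Python loops
```

The output is 2 × 2 because (3 - 2 + 1, 3 - 2 + 1) = (2, 2), matching the shape formula above.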
2. Two-dimensional convolution layer class (Conv2D)
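```python
class Conv2D(nn.Module):
    """Single-channel 2D convolution layer built on corr2d."""
    def __init__(self, kernel_size, weight=None):
        super().__init__()
        if weight is not None:
            # Use the caller-supplied kernel as-is
            self.weight = weight
        else:
            # Randomly initialized kernel, registered as a trainable parameter
            self.weight = nn.Parameter(torch.rand(kernel_size))
        # Scalar bias, also trainable
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Cross-correlate the input with the kernel, then add the bias
        return corr2d(x, self.weight) + self.bias
```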
a. __init__ (initialization)
- Accepts a kernel_size argument as the size of the convolution kernel, and optionally a weight argument as the weights of the convolution kernel.
- If no weight argument is provided, a weight with shape kernel_size is randomly generated and set as a trainable parameter (nn.Parameter).
- A bias term bias is defined and set as a trainable parameter.
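Whether the kernel is actually trainable depends on how it is created. The sketch below (with a hypothetical helper param_names) lists what each Conv2D instance registers as parameters. It relies on standard nn.Module behavior: only nn.Parameter attributes are registered, so a plain tensor passed as weight is stored as an ordinary attribute that module.parameters() does not return.

```python
# Minimal sketch: which tensors does Conv2D register as trainable parameters?
import torch
from torch import nn

def corr2d(X, K):
    """Loop-based 2D cross-correlation, as in this post."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

class Conv2D(nn.Module):
    """Single-channel 2D convolution layer, as defined in this post."""
    def __init__(self, kernel_size, weight=None):
        super().__init__()
        if weight is not None:
            self.weight = weight
        else:
            self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

def param_names(module):
    """Hypothetical helper: names an optimizer would see via module.parameters()."""
    return [name for name, _ in module.named_parameters()]

random_init = Conv2D(kernel_size=(3, 3))
plain_weight = Conv2D(kernel_size=(3, 3), weight=torch.ones(3, 3))
param_weight = Conv2D(kernel_size=(3, 3), weight=nn.Parameter(torch.ones(3, 3)))

print(param_names(random_init))   # ['weight', 'bias']
print(param_names(plain_weight))  # ['bias']  (plain tensor is not registered)
print(param_names(param_weight))  # ['weight', 'bias']
```

If a supplied kernel should be trained, wrapping it in nn.Parameter before passing it in is enough.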
b. forward (forward propagation function)
Calls the corr2d function defined earlier to compute the cross-correlation between the input x and the convolution kernel self.weight, then adds the bias term self.bias to the result to produce the output of forward propagation.
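To see forward with a known kernel, the sketch below passes a fixed 1 × 2 kernel [[1, -1]] to Conv2D and applies it to a bright image containing a dark vertical band. The input and kernel are illustrative assumptions; the kernel responds with 1 where the image goes from bright to dark, -1 where it goes from dark to bright, and 0 elsewhere.

```python
# Minimal sketch: Conv2D.forward with a fixed vertical-edge kernel (illustrative input).
import torch
from torch import nn

def corr2d(X, K):
    """Loop-based 2D cross-correlation, as in this post."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

class Conv2D(nn.Module):
    """Single-channel 2D convolution layer, as defined in this post."""
    def __init__(self, kernel_size, weight=None):
        super().__init__()
        if weight is not None:
            self.weight = weight
        else:
            self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

X = torch.ones(6, 8)
X[:, 2:6] = 0  # bright image with a dark vertical band in columns 2..5

edge = Conv2D(kernel_size=(1, 2), weight=torch.tensor([[1.0, -1.0]]))
Y = edge(X)  # forward: corr2d(X, weight) + bias, with bias starting at 0

print(Y.shape)        # torch.Size([6, 7])
print(Y[0].tolist())  # [0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0]
```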
3. Model testing
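```python
# The convolution layer does not support multiple channels yet, so the image is single-channel by default
fake_image = torch.randn((5, 5))
# Instantiate the convolution operator
conv = Conv2D(kernel_size=(3, 3))
output = conv(fake_image)
print(output.shape)  # torch.Size([3, 3])
```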
A random single-channel input image fake_image of size (5, 5) is created, and the Conv2D class is instantiated with a convolution kernel size of (3, 3). Calling conv(fake_image) invokes the object's forward method, which performs the convolution on fake_image and stores the result in the variable output. The resulting output has shape (5 - 3 + 1, 5 - 3 + 1) = (3, 3).
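As a sanity check on the test above, the custom layer can be compared against PyTorch's built-in F.conv2d using the same weights and bias. This is a minimal sketch: the fixed seed and the non-zero bias value 0.5 are assumptions added so the comparison also exercises the bias term.

```python
# Minimal sketch: the custom Conv2D should match F.conv2d given the same weight and bias.
import torch
from torch import nn
import torch.nn.functional as F

def corr2d(X, K):
    """Loop-based 2D cross-correlation, as in this post."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

class Conv2D(nn.Module):
    """Single-channel 2D convolution layer, as defined in this post."""
    def __init__(self, kernel_size, weight=None):
        super().__init__()
        if weight is not None:
            self.weight = weight
        else:
            self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

torch.manual_seed(0)
fake_image = torch.randn((5, 5))
conv = Conv2D(kernel_size=(3, 3))
with torch.no_grad():
    conv.bias.fill_(0.5)  # non-zero bias so the comparison also checks the bias term

output = conv(fake_image)  # shape (5 - 3 + 1, 5 - 3 + 1) = (3, 3)

# F.conv2d wants (batch, in_channels, H, W) input and (out_channels, in_channels, kH, kW) weights
reference = F.conv2d(
    fake_image.view(1, 1, 5, 5),
    conv.weight.view(1, 1, 3, 3),
    bias=conv.bias,
).view(3, 3)

print(output.shape)                                  # torch.Size([3, 3])
print(torch.allclose(output, reference, atol=1e-6))  # True
```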
Note: This experiment implements only a simple two-dimensional convolution layer that supports single-channel convolution, and it does not include training or optimization. Those topics will be covered in the next installment.