torch.nn learning
1. Convolutional layer
1.1 Conv2d
# Prototype
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
Applies a two-dimensional convolution. The input tensor has size (B, C, H, W): batch size, channels, height and width.
The parameters mainly include:
- in_channels (int): number of input channels
- out_channels (int): number of output channels
- kernel_size (int or tuple): convolution kernel size
- stride (int or tuple, optional): step size with which the convolution kernel moves, the default is 1
- padding (int or tuple or str, optional): amount of padding added to each side of the input, or one of the strings 'valid' (no padding) and 'same' (the output keeps the input size, only supported with stride 1), the default is 0
- padding_mode (str, optional): how the padded values are filled, one of 'zeros', 'reflect', 'replicate' and 'circular', the default is 'zeros'
- dilation (int or tuple, optional): spacing between the elements of the convolution kernel, the default is 1
- groups (int, optional): number of groups the channels are split into; both in_channels and out_channels must be divisible by groups, the default is 1
- bias (bool, optional): whether to add a learnable bias to the output, the default is True
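The effect of groups is easiest to see in the shape of the weight tensor. The following is a minimal sketch, assuming a small layer with 4 input and 8 output channels (these sizes and the variable names are chosen only for illustration):

```python
import torch
import torch.nn as nn

# Standard convolution: every output channel sees all 4 input channels.
dense = nn.Conv2d(4, 8, kernel_size=3, padding=1)
# Grouped convolution: the channels are split into 2 groups,
# each mapping 2 input channels to 4 output channels.
grouped = nn.Conv2d(4, 8, kernel_size=3, padding=1, groups=2)
# groups == in_channels: each input channel gets its own filters.
depthwise = nn.Conv2d(4, 8, kernel_size=3, padding=1, groups=4)

# The weight has shape (out_channels, in_channels // groups, kH, kW).
dense_shape = tuple(dense.weight.shape)          # (8, 4, 3, 3)
grouped_shape = tuple(grouped.weight.shape)      # (8, 2, 3, 3)
depthwise_shape = tuple(depthwise.weight.shape)  # (8, 1, 3, 3)

# The output shape does not depend on groups.
x = torch.randn(1, 4, 10, 10)
out_shape = tuple(grouped(x).shape)              # (1, 8, 10, 10)
```

A larger groups value reduces the number of weights, because each filter only covers in_channels // groups input channels.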
The parameters kernel_size, stride, padding and dilation can each be an integer or a tuple:
- Integer: Apply the same value in height and width dimensions.
- A tuple of two integers: the first applies to the height dimension and the second to the width dimension.
Convolution visualization link: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
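The output size of the last example can be worked out by hand. Along each spatial dimension the output length is floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1). The sketch below checks this against the layer itself; conv_out_size is a hypothetical helper written for this illustration, not part of torch.nn:

```python
import torch
import torch.nn as nn


def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (illustrative helper)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
x = torch.randn(20, 16, 50, 100)

# Height: (50 + 8 - 6 - 1) // 2 + 1 = 26
h_out = conv_out_size(50, kernel=3, stride=2, padding=4, dilation=3)
# Width: (100 + 4 - 4 - 1) // 1 + 1 = 100
w_out = conv_out_size(100, kernel=5, stride=1, padding=2, dilation=1)

actual = tuple(m(x).shape)  # (20, 33, 26, 100)
```

The batch size is unchanged and the channel dimension becomes out_channels, so only H and W need the formula.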
2. Pooling layer
2.1 MaxPool2d
# Prototype
torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Maximum pooling, a form of downsampling that keeps the largest value in each window. The input tensor has size (B, C, H, W).
The main parameters are:
- kernel_size : The size of the pooling window.
- stride : The step size of the window movement, which is the same as the pooling window size by default.
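- padding : Size of the implicit padding added on both sides. For max pooling the padded values are negative infinity, so they are never selected as the maximum.
- dilation : Spacing between the elements within the pooling window.
- return_indices : If True, returns the indices of the maximum values along with the outputs. These indices are what MaxUnpool2d expects.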
- ceil_mode : If True, the output shape will be calculated using ceil mode; otherwise, it will be calculated using floor mode.
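The two boolean options can be tried on a small tensor. This is a minimal sketch, assuming a 5x5 input filled with 0..24 so that each value equals its own flattened position (the input and variable names are chosen only for illustration):

```python
import torch
import torch.nn as nn

# Each element's value equals its flattened index within the 5x5 plane.
x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

floor_pool = nn.MaxPool2d(2, stride=2)                  # floor mode (default)
ceil_pool = nn.MaxPool2d(2, stride=2, ceil_mode=True)   # keeps the partial border window

floor_shape = tuple(floor_pool(x).shape)  # (1, 1, 2, 2): last row and column are dropped
ceil_shape = tuple(ceil_pool(x).shape)    # (1, 1, 3, 3)

# return_indices=True also returns where each maximum was found.
index_pool = nn.MaxPool2d(2, stride=2, return_indices=True)
values, indices = index_pool(x)
# indices are positions in the flattened H*W plane of each channel,
# so for this particular input they coincide with the values.
```

With floor mode the fifth row and column never fall inside a complete window, which is why the output is 2x2 rather than 3x3.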
>>> # pool of square window of size=3, stride=2
>>> m = nn.MaxPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
2.2 MaxUnpool2d
# Prototype
torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)
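Upsampling that computes a partial inverse of MaxPool2d: each value is placed back at the position recorded in indices and every other position is set to zero. Because MaxPool2d can map several input sizes to the same output size, the inversion can be ambiguous; pass output_size to select the intended shape.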
The main parameters are:
- kernel_size : The size of the pooling window.
- stride : The step size of the window movement, which is the same as the pooling window size by default.
- padding : Padding that was added to the input by the pooling layer.
The inputs are:
- input : The input tensor to unpool.
- indices : The maximum index subscript array returned by MaxPool2d.
- output_size (optional): The shape of the output tensor.
>>> pool = nn.MaxPool2d(2, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool2d(2, stride=2)
>>> input = torch.tensor([[[[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[ 9., 10., 11., 12.],
[13., 14., 15., 16.]]]])
>>> output, indices = pool(input)
>>> unpool(output, indices)
tensor([[[[ 0., 0., 0., 0.],
[ 0., 6., 0., 8.],
[ 0., 0., 0., 0.],
[ 0., 14., 0., 16.]]]])
>>> # Now using output_size to resolve an ambiguous size for the inverse
>>> input = torch.tensor([[[[ 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10.],
[11., 12., 13., 14., 15.],
[16., 17., 18., 19., 20.]]]])
>>> output, indices = pool(input)
>>> # This call will not work without specifying output_size
>>> unpool(output, indices, output_size=input.size())
tensor([[[[ 0., 0., 0., 0., 0.],
[ 0., 7., 0., 9., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 17., 0., 19., 0.]]]])
2.3 AvgPool2d
# Prototype
torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)
Average pooling.
The main parameters are:
- kernel_size : The size of the pooling window.
- stride : The step size of the window movement, which is the same as the pooling window size by default.
- padding : Size of the implicit zero padding added on both sides.
- ceil_mode : If True, the output shape will be calculated using ceil mode; otherwise, it will be calculated using floor mode.
- count_include_pad : If True, the zero padding is included in the average calculation.
- divisor_override : If specified, it is used as the divisor; otherwise the size of the pooling window is used.
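The last two parameters only change the divisor, which a tiny input makes visible. This is a minimal sketch, assuming a 2x2 input of ones (the input and variable names are chosen only for illustration):

```python
import torch
import torch.nn as nn

x = torch.ones(1, 1, 2, 2)

# With padding=1 every 2x2 window covers one real element and three padded zeros.
with_pad = nn.AvgPool2d(2, stride=2, padding=1)  # count_include_pad=True (default)
without_pad = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=False)

avg_with_pad = with_pad(x)        # 1 / 4 = 0.25 everywhere
avg_without_pad = without_pad(x)  # 1 / 1 = 1.0 everywhere

# divisor_override=1 turns the average into a plain sum over the window.
summed = nn.AvgPool2d(2, stride=2, divisor_override=1)
window_sum = summed(x)            # a single value: 1 + 1 + 1 + 1 = 4.0
```

Setting count_include_pad=False avoids the darker border that zero padding would otherwise introduce into the averaged output.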
>>> # pool of square window of size=3, stride=2
>>> m = nn.AvgPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.AvgPool2d((3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
3. Code practice
3.1 Inception Module
Code:
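import torch
import torch.nn as nn
import torch.nn.functional as F


class Inception(nn.Module):
    def __init__(self, in_channels):
        super(Inception, self).__init__()
        self.cov1x1_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.cov1x1_2 = nn.Conv2d(in_channels, 24, kernel_size=1)
        # padding = (kernel_size - 1) // 2 keeps H and W unchanged, so that
        # all branch outputs can be concatenated along the channel dimension
        self.cov3x3_1 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.cov3x3_2 = nn.Conv2d(24, 24, kernel_size=3, padding=1)
        self.cov5x5 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

    def forward(self, x):
        # Branch 1: average pooling followed by a 1x1 convolution
        branch1 = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch1 = self.cov1x1_2(branch1)
        # Branch 2: 1x1 convolution
        # Note: cov1x1_1 is reused by branches 2, 3 and 4, so they share weights
        branch2 = self.cov1x1_1(x)
        # Branch 3: 1x1 convolution followed by a 5x5 convolution
        branch3 = self.cov1x1_1(x)
        branch3 = self.cov5x5(branch3)
        # Branch 4: 1x1 convolution followed by two 3x3 convolutions
        branch4 = self.cov1x1_1(x)
        branch4 = self.cov3x3_1(branch4)
        branch4 = self.cov3x3_2(branch4)
        # Concatenate along the channel dimension: 24 + 16 + 24 + 24 = 88 channels
        branches = [branch1, branch2, branch3, branch4]
        return torch.cat(branches, dim=1)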
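Concatenating branches with torch.cat along dim=1 only works when every branch produces the same batch size, height and width. The following is a minimal sketch of that constraint, assuming three stand-alone layers applied to a 3-channel 28x28 input (these layers and sizes are illustrative and separate from the module above):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 28, 28)

# padding = (kernel_size - 1) // 2 preserves H and W for stride 1.
conv1 = nn.Conv2d(3, 16, kernel_size=1)
conv3 = nn.Conv2d(3, 24, kernel_size=3, padding=1)
conv5 = nn.Conv2d(3, 24, kernel_size=5, padding=2)

branches = [conv1(x), conv3(x), conv5(x)]
out = torch.cat(branches, dim=1)
out_shape = tuple(out.shape)  # (2, 16 + 24 + 24, 28, 28)

# Without padding a 5x5 kernel shrinks each side by 4,
# and such an output could not be concatenated with the others.
unpadded_shape = tuple(nn.Conv2d(3, 24, kernel_size=5)(x).shape)  # (2, 24, 24, 24)
```

Only the channel counts add up; all other dimensions must already agree.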
3.2 Residual Block
Code:
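import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        # kernel_size=3 with padding=1 keeps the channel count and the
        # spatial size unchanged, which the skip connection requires
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Note: the same layer (and therefore the same weights) is applied twice;
        # a residual block more commonly uses two separate convolution layers
        block = F.relu(self.conv(x))
        block = self.conv(block)
        # Skip connection: add the input back before the final activation
        return F.relu(block + x)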
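The addition in the skip connection needs both operands to have the same shape, which is why the convolution uses padding=1. The following is a minimal sketch of what happens with and without that padding, assuming an 8-channel 16x16 input (the layers here are stand-alone and illustrative, not the block defined above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 8, 16, 16)

same = nn.Conv2d(8, 8, kernel_size=3, padding=1)  # output stays 16x16
shrinking = nn.Conv2d(8, 8, kernel_size=3)        # output shrinks to 14x14

# Shapes match, so the residual addition is valid.
y = F.relu(same(x) + x)
y_shape = tuple(y.shape)  # (2, 8, 16, 16)

# Shapes differ (14 vs 16), so the addition fails.
try:
    shrinking(x) + x
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
```

The same requirement applies to the channel dimension, which is why the block maps channels to channels.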