One-dimensional, two-dimensional, and three-dimensional convolution operations in PyTorch

A convolution operation uses a sliding-window mechanism to perform cross-correlation between a kernel and the data in order to extract features.
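For instance, here is a minimal sketch (the numbers are made up for illustration) showing the sliding-window cross-correlation that PyTorch's convolution layers actually compute, with the kernel slid over the input without being flipped:

import torch
import torch.nn.functional as F

# Slide a length-3 kernel over a length-5 sequence (stride=1, no padding).
x = torch.tensor([[[1., 2., 3., 4., 5.]]])   # [batch=1, channels=1, length=5]
w = torch.tensor([[[1., 0., -1.]]])          # [out_channels=1, in_channels=1, kernel=3]
print(F.conv1d(x, w))                        # tensor([[[-2., -2., -2.]]])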

1D convolution

One-dimensional convolution is used to process sequence data. Each sequence element is generally encoded (embedded) before being fed in, so the input sequence has the format [batch_size, seq_len, embedding_size], where embedding_size plays the same role as the number of channels. Therefore, before processing, permute(0, 2, 1) is generally applied to convert the input to [batch_size, embedding_size, seq_len], so that embedding_size sits in the middle dimension and serves as the channel count expected by the one-dimensional convolution.

e.g.:

# in_channels / out_channels are both set to the feature (embedding) dimension, used here as the channel count
self.conv1 = nn.Conv1d(in_channels=n_feature, out_channels=n_feature, kernel_size=1,
                       stride=1, padding=0, dilation=1, groups=1,
                       bias=True, padding_mode='zeros')
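A minimal usage sketch of the shape handling described above (the sizes batch_size=8, seq_len=20 and n_feature=16 are hypothetical):

import torch
import torch.nn as nn

batch_size, seq_len, n_feature = 8, 20, 16          # hypothetical sizes
x = torch.randn(batch_size, seq_len, n_feature)     # [batch_size, seq_len, embedding_size]
x = x.permute(0, 2, 1)                               # -> [batch_size, embedding_size, seq_len]

conv1 = nn.Conv1d(in_channels=n_feature, out_channels=n_feature, kernel_size=1)
print(conv1(x).shape)                                # torch.Size([8, 16, 20])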

2D convolution

Two-dimensional convolution is the earliest convolution operation, originally proposed for processing high-dimensional data such as images. The input of a two-dimensional convolution is [batch_size, channel_num, H, W].

PS: In Python, the cv2 (OpenCV) library is usually used to read images. The read format is [H, W, C], and torchvision.transforms.ToTensor() can be used to convert images read by cv2 into the [C, H, W] format used in PyTorch.

self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=2,
                       bias=False, dilation=1)  # the first two arguments are the in/out channel counts
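Putting the two points together, a rough sketch (the file name example.jpg is hypothetical and the layer sizes are arbitrary):

import cv2
import torch.nn as nn
import torchvision.transforms as transforms

img = cv2.imread('example.jpg')          # hypothetical path; numpy array of shape [H, W, C]
x = transforms.ToTensor()(img)           # -> [C, H, W], float values scaled to [0, 1]
x = x.unsqueeze(0)                       # add the batch dimension -> [1, C, H, W]

conv2 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
out = conv2(x)                           # [1, 16, H, W]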

PS: dilated (atrous) convolution

One of the parameters of the convolution operation is dilation. If it is set to 1, the result is an ordinary convolution; if it is set to a value greater than 1, the result is a dilated convolution. Dilated convolution obtains a larger receptive field with a small convolution kernel. Taking a 3*3 kernel with dilation=2 as an example: an ordinary convolution takes a 3*3 sub-region of the feature map and cross-correlates it with the kernel, whereas with dilated convolution the 3*3 kernel is expanded to 5*5 by filling the gaps between its elements with zeros, so each cross-correlation covers a 5*5 region of the feature map. dilation=2 means that, after this filling, the distance between adjacent elements of the convolution kernel is 2.
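As a quick check of the receptive-field claim (input and channel sizes are arbitrary): with kernel_size=3 a plain convolution shrinks each spatial dimension by 2, while the same kernel with dilation=2 behaves like a 5*5 kernel and shrinks it by 4:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                          # [batch, channels, H, W]
plain   = nn.Conv2d(3, 3, kernel_size=3, dilation=1)   # effective kernel 3x3
dilated = nn.Conv2d(3, 3, kernel_size=3, dilation=2)   # effective kernel 5x5

print(plain(x).shape)    # torch.Size([1, 3, 30, 30])
print(dilated(x).shape)  # torch.Size([1, 3, 28, 28])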

3D convolution

Three-dimensional convolution is used to extract features from video data; the input data format is [batch_size, channel_num, t_len, H, W].

self.conv3d = nn.Conv3d(in_channels=in_channels, out_channels=output_channels,
                        kernel_size=kernel_shape, stride=stride,
                        padding=0, bias=self._use_bias)
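A minimal sketch of feeding a (made-up) video clip through such a layer, assuming 3 input channels, 8 output channels and a 3x3x3 kernel:

import torch
import torch.nn as nn

# Hypothetical clip: batch of 2 videos, 3 channels, 16 frames of 64x64 pixels.
clip = torch.randn(2, 3, 16, 64, 64)     # [batch_size, channel_num, t_len, H, W]

conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
print(conv3d(clip).shape)                # torch.Size([2, 8, 16, 64, 64])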

Summary: Compared with a lower-dimensional convolution, a higher-dimensional convolution requires an input with one more dimension, and in every case the first two dimensions of the input are batch_size and channel_num.

Origin: blog.csdn.net/c_procomer/article/details/123919925