Deep Learning - 3D Convolutional Neural Network (3DCNN)

1. Understanding 3DCNN
2D convolution only considers the spatial information of a 2D image, so it is only suitable for visual understanding tasks on a single 2D image. When processing 3D images or videos, the network input has one more dimension: the input shape changes from $(c, height, width)$ to $(c, depth, height, width)$, where $c$ is the number of channels and $depth$ is the size of the input along the additional dimension (for video, the number of frames). The convolution therefore has to change accordingly, from 2D convolution to 3D convolution.
3D convolution is built on 2D convolution and has one more dimension in its structure: the size of a 2D convolution kernel can be written as $k_h \times k_w$, and the size of a 3D convolution kernel as $k_h \times k_w \times k_d$. The computation is similar to that of 2D convolution: at each sliding position, the kernel is multiplied element-wise with a block of the input that spans all $c$ channels and has the kernel's size along $(depth, height, width)$, and the products are summed to give one value in the output feature map, as shown in the figure.
[Figure: 3D convolution]
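
The multiply-and-add step described above can be written out explicitly. This is a sketch under assumptions the text does not state: stride 1, no padding, no bias term, and the cross-correlation form that deep learning frameworks usually call convolution. The symbols $X$ (input), $K$ (kernel), and $Y$ (output feature map) are names introduced here for illustration.

$$
Y(d, h, w) = \sum_{n=0}^{c-1} \sum_{i=0}^{k_d-1} \sum_{j=0}^{k_h-1} \sum_{l=0}^{k_w-1} K(n, i, j, l) \, X(n, d+i, h+j, w+l)
$$

Under these assumptions, an input of size $(c, depth, height, width)$ convolved with one kernel gives an output of size $(depth - k_d + 1, height - k_h + 1, width - k_w + 1)$.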

Dimensions of the video input: $input_C \times input_T \times input_W \times input_H$;
Dimensions of the 3D convolution kernel: a $kernel_T \times kernel_W \times kernel_H$ kernel, with $input_C$ as the parallel (channel) dimension;
The 3D convolution kernel moves in the three directions $T$, $W$, $H$.
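
To make the three points above concrete, here is a minimal pure-Python sketch of a single 3D kernel sliding over a video in the T, W, and H directions. It assumes stride 1 and no padding, which the text does not specify, and the function names `conv3d_single_kernel` and `full` are hypothetical helpers written for this illustration, not part of any library.

```python
def conv3d_single_kernel(video, kernel):
    """Naive 3D convolution (cross-correlation) with one kernel.

    Assumptions: stride 1, no padding, no bias.
    video:  nested lists indexed [c][t][w][h], shape (C, T, W, H)
    kernel: nested lists indexed [c][t][w][h], shape (C, kT, kW, kH)
    returns nested lists indexed [t][w][h],
            shape (T - kT + 1, W - kW + 1, H - kH + 1)
    """
    C, T, W, H = len(video), len(video[0]), len(video[0][0]), len(video[0][0][0])
    kT, kW, kH = len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    assert len(kernel) == C, "kernel must have one slice per input channel"

    out_T, out_W, out_H = T - kT + 1, W - kW + 1, H - kH + 1
    out = [[[0.0] * out_H for _ in range(out_W)] for _ in range(out_T)]

    # The kernel moves in three directions: T, W and H.
    for t in range(out_T):
        for w in range(out_W):
            for h in range(out_H):
                acc = 0.0
                # Multiply and add over all channels and the kernel volume.
                for c in range(C):
                    for dt in range(kT):
                        for dw in range(kW):
                            for dh in range(kH):
                                acc += (kernel[c][dt][dw][dh]
                                        * video[c][t + dt][w + dw][h + dh])
                out[t][w][h] = acc
    return out


def full(shape, value):
    """Build a nested list of the given shape filled with `value`."""
    if len(shape) == 1:
        return [value] * shape[0]
    return [full(shape[1:], value) for _ in range(shape[0])]


video = full((3, 8, 6, 6), 1.0)   # input_C=3, input_T=8, input_W=6, input_H=6
kernel = full((3, 3, 3, 3), 1.0)  # input_C=3, kernel_T=kernel_W=kernel_H=3
out = conv3d_single_kernel(video, kernel)
out_shape = (len(out), len(out[0]), len(out[0][0]))
print(out_shape)     # (6, 4, 4)
print(out[0][0][0])  # 81.0, i.e. 3 channels * 27 kernel elements
```

Note that the channel dimension is summed away rather than slid over, which is why one kernel yields a single-channel output volume. In practice a framework layer would be used instead of these nested loops; the loops are only meant to show the indexing.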

Reference:
3D CNN

Origin blog.csdn.net/weixin_40826634/article/details/128269149