Calculating tensor (image) dimensions and parameter counts in a convolutional neural network (CNN): an in-depth look

This post shares some formulas for calculating tensor (image) dimensions and the number of parameters in each layer of a convolutional neural network (CNN).

We use the AlexNet network as the example. (Figure: configuration diagram of the AlexNet network parameters.)

The layer structure of AlexNet is as follows:

1. Input: the image size is 227 * 227 * 3.

2. Conv-1: the first convolutional layer has 96 kernels of size 11 * 11, with stride 4 and padding 0.

3. MaxPool-1: pools the output of Conv-1 with pool size 3 * 3 and stride 2.

4. Conv-2: kernel size: 5 * 5, number of kernels: 256, stride: 1, padding: 2.

5. MaxPool-2: pool size: 3 * 3, stride: 2.

6. Conv-3: kernel size: 3 * 3, number of kernels: 384, stride: 1, padding: 1.

7. Conv-4: same structure as Conv-3.

8. Conv-5: kernel size: 3 * 3, number of kernels: 256, stride: 1, padding: 1.

9. MaxPool-3: pool size: 3 * 3, stride: 2.

10. FC-1: fully connected layer 1 has 4096 neurons.

11. FC-2: fully connected layer 2 has 4096 neurons.

12. FC-3: fully connected layer 3 has 1000 neurons.

Next, using the network structure above, we describe:

1. How to calculate the tensor (image) size;

2. How to calculate the total number of parameters in the network.

Output tensor (image) size of a convolutional layer (Conv Layer)

Define the following:

O = the size (width or height) of the output image.

I = the size (width or height) of the input image.

K = the kernel size of the convolutional layer.

N = the number of kernels.

S = the stride.

P = the padding.

The output image size is calculated as follows:

O = (I - K + 2P) / S + 1

The number of channels in the output image equals the number of kernels N.

Example: in AlexNet, the input image size is 227 * 227 * 3. The first convolutional layer has 96 kernels of size 11 * 11 * 3, with stride 4 and padding 0.

O = (227 - 11 + 2 * 0) / 4 + 1 = 55

The output image is 55 * 55 * 96 (each kernel produces one channel).
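The calculation can be checked with a few lines of Python. This is a minimal sketch, assuming the standard convolution size formula O = (I - K + 2P) / S + 1; the function name conv_output_size is made up for illustration, and using floor division when the stride does not divide evenly is an assumption (it matches what most frameworks do by default).

```python
def conv_output_size(i: int, k: int, s: int, p: int) -> int:
    """Spatial output size of a convolution: O = (I - K + 2P) / S + 1.

    Floor division is an assumption for inputs that the stride does not
    divide evenly; the AlexNet examples here all divide exactly.
    """
    return (i - k + 2 * p) // s + 1


# Conv-1 of AlexNet: input 227, kernel 11, stride 4, padding 0.
conv1_size = conv_output_size(227, 11, 4, 0)
num_kernels = 96  # the output channel count equals the number of kernels
print(f"{conv1_size} * {conv1_size} * {num_kernels}")  # 55 * 55 * 96
```

Conv-2 (kernel 5, stride 1, padding 2) keeps its 27 * 27 input size, because the padding exactly compensates for the kernel width.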

Output tensor (image) size of a pooling layer (MaxPool Layer)

Define the following:

O = the size of the output image.
I = the size of the input image.
S = the stride.
P_S = the pool size.

The output image size is calculated as follows:

O = (I - P_S) / S + 1

Unlike a convolutional layer, a pooling layer does not change the number of channels.

Example: each pooling layer in AlexNet has pool size 3 * 3 and stride 2. The preceding convolutional layer (Conv-1) outputs 55 * 55 * 96, so the output size of MaxPool-1 is:

O = (55 - 3) / 2 + 1 = 27

The output size is 27 * 27 * 96.
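The same check works for pooling. The sketch below assumes the usual unpadded pooling formula O = (I - P_S) / S + 1; pool_output_size is a hypothetical helper name, and the three input sizes are the ones that reach AlexNet's three pooling layers.

```python
def pool_output_size(i: int, pool: int, s: int) -> int:
    """Spatial output size of an unpadded pooling layer: O = (I - P_S) / S + 1."""
    return (i - pool) // s + 1


# All three AlexNet pooling layers use pool size 3 and stride 2.
for name, size in (("MaxPool-1", 55), ("MaxPool-2", 27), ("MaxPool-3", 13)):
    print(name, size, "->", pool_output_size(size, 3, 2))
```

The channel count is not an argument here, since pooling passes it through unchanged.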

Output tensor (image) size of a fully connected layer (Fully Connected Layer)

The length of a fully connected layer's output vector equals its number of neurons.

The tensor (image) size changes through AlexNet as follows:

In the AlexNet network, the input image size is 227 * 227 * 3.

After Conv-1, the size becomes 55 * 55 * 96, and after the pooling layer it becomes 27 * 27 * 96.

After Conv-2, the size becomes 27 * 27 * 256, and after the pooling layer it becomes 13 * 13 * 256.

After Conv-3, the size becomes 13 * 13 * 384; it stays 13 * 13 * 384 after Conv-4 and becomes 13 * 13 * 256 after Conv-5.

Finally, MaxPool-3 reduces the size to 6 * 6 * 256.

FC-1 converts the image into a 4096 * 1 vector, and FC-2 leaves the size unchanged. Finally, FC-3 outputs a tensor of size 1000 * 1.
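The whole progression can be traced in one loop. This is an illustrative sketch, not a framework implementation: the LAYERS table and the trace_shapes function are names invented here, the layer settings are copied from the layer list at the top of this post, and pooling is treated as the padding-0 case of the formula O = (I - K + 2P) / S + 1.

```python
# (name, kind, kernel or pool size, number of kernels, stride, padding)
LAYERS = [
    ("Conv-1", "conv", 11, 96, 4, 0),
    ("MaxPool-1", "pool", 3, None, 2, 0),
    ("Conv-2", "conv", 5, 256, 1, 2),
    ("MaxPool-2", "pool", 3, None, 2, 0),
    ("Conv-3", "conv", 3, 384, 1, 1),
    ("Conv-4", "conv", 3, 384, 1, 1),
    ("Conv-5", "conv", 3, 256, 1, 1),
    ("MaxPool-3", "pool", 3, None, 2, 0),
]


def trace_shapes(size, channels, layers):
    """Return (name, height, width, channels) after each conv/pool layer."""
    shapes = []
    for name, kind, k, n, s, p in layers:
        size = (size - k + 2 * p) // s + 1  # same formula for conv and pool
        if kind == "conv":
            channels = n  # a conv layer outputs one channel per kernel
        shapes.append((name, size, size, channels))
    return shapes


shapes = trace_shapes(227, 3, LAYERS)
for name, h, w, c in shapes:
    print(f"{name}: {h} * {w} * {c}")

# The fully connected layers then output vectors of length 4096, 4096, 1000.
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
print("Flattened input to FC-1:", flattened)  # 6 * 6 * 256 = 9216
```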

Next, we calculate the number of parameters in each layer.

Number of parameters in a Conv Layer

In a CNN, each layer has two types of parameters: weights and biases. The total number of parameters is the sum of all weights and biases.

Define the following:

W_C = the number of weights in the convolutional layer

B_C = the number of biases in the convolutional layer

P_C = the total number of parameters

K = the kernel size

N = the number of kernels

C = the number of channels of the input image

In a convolutional layer, the depth of each kernel equals the number of channels of the input image, so each kernel has K * K * C weights. There are N kernels, each with one bias. This gives the following formulas:

W_C = K * K * C * N
B_C = N
P_C = W_C + B_C

Example: in the first convolutional layer of AlexNet, the input image has 3 channels (C), the kernel size (K) is 11, and the number of kernels (N) is 96. The parameters of this layer are calculated as follows:

W_C = 11 * 11 * 3 * 96 = 34,848
B_C = 96
P_C = 34,848 + 96 = 34,944

Calculated the same way, Conv-2, Conv-3, Conv-4 and Conv-5 have 614,656, 885,120, 1,327,488 and 884,992 parameters respectively. The convolutional layers have 3,747,200 parameters in total.
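These counts can be reproduced with a short script. The sketch assumes W_C = K * K * C * N weights and B_C = N biases per layer, with every kernel spanning all C input channels as described above; conv_params and CONV_LAYERS are names chosen for this example.

```python
def conv_params(k: int, c: int, n: int):
    """Return (weights, biases, total) for a conv layer with N kernels of K * K * C."""
    weights = k * k * c * n
    biases = n  # one bias per kernel
    return weights, biases, weights + biases


# (name, K, C = input channels, N = number of kernels)
CONV_LAYERS = [
    ("Conv-1", 11, 3, 96),
    ("Conv-2", 5, 96, 256),
    ("Conv-3", 3, 256, 384),
    ("Conv-4", 3, 384, 384),
    ("Conv-5", 3, 384, 256),
]

conv_total = 0
for name, k, c, n in CONV_LAYERS:
    weights, biases, total = conv_params(k, c, n)
    conv_total += total
    print(f"{name}: weights={weights:,} biases={biases:,} total={total:,}")

print(f"All conv layers: {conv_total:,}")  # 3,747,200
```

Note that C for each layer is the channel count of the previous layer's output; pooling layers in between do not change it.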

Number of parameters in a MaxPool Layer

A MaxPool layer has no parameters. The pool size, stride and padding are all hyperparameters.

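Number of parameters in a Fully Connected (FC) Layer

A CNN has two types of fully connected layers. The first type is connected to the last convolutional layer; the second type is connected to another FC layer. We discuss the two cases separately.

Type 1: connected to a Conv Layer

Define the following:

W_cf = the number of weights

B_cf = the number of biases

P_cf = the total number of parameters

O = the output image size of the preceding convolutional layer

N = the number of kernels in the preceding convolutional layer

F = the number of neurons in the fully connected layer

Every neuron is connected to every element of the O * O * N input tensor and has one bias, which gives:

W_cf = O * O * N * F
B_cf = F
P_cf = W_cf + B_cf

Example: the first FC layer of AlexNet is connected to a Conv Layer. For this layer, O is 6, N is 256 and F is 4096.

W_cf = 6 * 6 * 256 * 4096 = 37,748,736
B_cf = 4096
P_cf = 37,748,736 + 4096 = 37,752,832

This is far more than the total number of parameters in all the Conv layers.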
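A quick numeric check of this case, as a sketch: it assumes W_cf = O * O * N * F weights and B_cf = F biases, and the function name fc_from_conv_params is hypothetical. The value 3,747,200 is the convolutional total quoted earlier in this post.

```python
def fc_from_conv_params(o: int, n: int, f: int):
    """Return (weights, biases, total) for an FC layer fed by an O * O * N tensor."""
    weights = o * o * n * f  # every neuron sees every element of the input tensor
    biases = f  # one bias per neuron
    return weights, biases, weights + biases


# FC-1 of AlexNet: input 6 * 6 * 256, 4096 neurons.
w_cf, b_cf, p_cf = fc_from_conv_params(6, 256, 4096)
print(f"weights={w_cf:,} biases={b_cf:,} total={p_cf:,}")

conv_total = 3_747_200  # total for all five conv layers, from the text
print(f"FC-1 has about {p_cf / conv_total:.1f}x the parameters of all conv layers")
```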

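Type 2: connected to an FC Layer

Define the following:

W_ff = the number of weights

B_ff = the number of biases

P_ff = the total number of parameters

F = the number of neurons in the current FC layer

F_-1 = the number of neurons in the previous FC layer

Every neuron in the current layer is connected to every neuron in the previous layer and has one bias, which gives:

W_ff = F_-1 * F
B_ff = F
P_ff = W_ff + B_ff

Example: for the last fully connected layer of AlexNet, F_-1 = 4096 and F = 1000.

W_ff = 4096 * 1000 = 4,096,000
B_ff = 1000
P_ff = 4,096,000 + 1000 = 4,097,000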
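The FC-to-FC case is the simplest to verify. The sketch below assumes W_ff = F_-1 * F weights and B_ff = F biases; fc_from_fc_params is a name invented for this example, with f_prev standing in for F_-1.

```python
def fc_from_fc_params(f_prev: int, f: int):
    """Return (weights, biases, total) for an FC layer fed by another FC layer."""
    weights = f_prev * f  # full connection between the two layers
    biases = f  # one bias per neuron in the current layer
    return weights, biases, weights + biases


# FC-2: 4096 -> 4096, FC-3: 4096 -> 1000.
for name, f_prev, f in (("FC-2", 4096, 4096), ("FC-3", 4096, 1000)):
    weights, biases, total = fc_from_fc_params(f_prev, f)
    print(f"{name}: weights={weights:,} biases={biases:,} total={total:,}")
```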

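Tensor (image) sizes and parameter counts in the AlexNet network

The AlexNet network has 5 convolutional layers and 3 fully connected layers, with 62,378,344 parameters in total. The summary table follows.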

Layer Name	Tensor Size	Weights	Biases	Parameters
Input Image	227x227x3	0	0	0
Conv-1	55x55x96	34,848	96	34,944
MaxPool-1	27x27x96	0	0	0
Conv-2	27x27x256	614,400	256	614,656
MaxPool-2	13x13x256	0	0	0
Conv-3	13x13x384	884,736	384	885,120
Conv-4	13x13x384	1,327,104	384	1,327,488
Conv-5	13x13x256	884,736	256	884,992
MaxPool-3	6x6x256	0	0	0
FC-1	4096×1	37,748,736	4,096	37,752,832
FC-2	4096×1	16,777,216	4,096	16,781,312
FC-3	1000×1	4,096,000	1,000	4,097,000
Output	1000×1	0	0	0
Total				62,378,344
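The total in the table can be recomputed from the layer settings alone. This is a compact sketch under the same assumptions as the formulas in this post (K * K * C * N weights plus N biases for a conv layer, inputs * outputs weights plus one bias per neuron for an FC layer); the variable names are illustrative.

```python
# Conv layers as (K, C, N); FC layers as (number of inputs, number of neurons).
conv_layers = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
fc_layers = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(k * k * c * n + n for k, c, n in conv_layers)
fc_total = sum(inputs * neurons + neurons for inputs, neurons in fc_layers)
grand_total = conv_total + fc_total

print(f"Conv layers: {conv_total:,}")
print(f"FC layers:   {fc_total:,}")
print(f"Total:       {grand_total:,}")
```

The fully connected layers account for the large majority of the parameters, which is consistent with the FC-1 row dominating the table.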
