Recently I was reading "Interleaved Group Convolutions for Deep Neural Networks" by Wang Jingdong of MSRA. The paper mentions group convolution many times, so I took some time to learn about it.
Group convolution first appeared in AlexNet. To work around limited GPU memory, the network was split across two GTX 580 graphics cards for training. Alex Krizhevsky argued that group convolution increases the diagonal correlation between filters and reduces the number of training parameters, making the network less prone to overfitting, an effect similar to regularization.
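The parameter saving is easy to quantify. A minimal sketch, using AlexNet's second convolutional layer as the example (5×5 kernels, 96 input channels, 256 output channels, split into two groups, one per GPU); the function name is my own:

```python
# Weight count of a convolutional layer (biases ignored):
#   ungrouped: k*k * C_in * C_out
#   grouped:   each of `groups` groups convolves C_in/groups channels with
#              C_out/groups filters, so the total shrinks by a factor of `groups`.
def conv_params(c_in, c_out, k, groups=1):
    return groups * (k * k * (c_in // groups) * (c_out // groups))

# AlexNet's conv2: 5x5 kernels, 96 -> 256 channels, 2 groups.
print(conv_params(96, 256, 5, groups=1))  # 614400
print(conv_params(96, 256, 5, groups=2))  # 307200
```

With M groups the weight count drops by exactly a factor of M, since each filter only connects to N/M input channels.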
Assume the output feature map of the previous layer has N channels (channel = N), i.e., the previous layer has N convolution kernels, and assume the group convolution uses M groups. The group convolutional layer first divides the channels into M groups; each group therefore contains N/M channels and is convolved independently. After each group's convolution is complete, the outputs are concatenated along the channel dimension to form this layer's output channels.
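The split-convolve-concatenate procedure above can be sketched directly in NumPy. This is a naive loop implementation written for clarity, not efficiency, and the function names are my own:

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2-D convolution (cross-correlation, as in deep learning).
    x: (C_in, H, W), w: (C_out, C_in, kH, kW) -> (C_out, H', W')."""
    c_out, c_in, kh, kw = w.shape
    assert x.shape[0] == c_in
    h_out = x.shape[1] - kh + 1
    w_out = x.shape[2] - kw + 1
    y = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                y[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[o])
    return y

def group_conv2d(x, w, groups):
    """Grouped convolution: split the N input channels into M groups,
    convolve each group with its own filters, then concatenate the
    group outputs along the channel axis."""
    c_in, c_out = x.shape[0], w.shape[0]
    assert c_in % groups == 0 and c_out % groups == 0
    cig, cog = c_in // groups, c_out // groups
    outs = []
    for g in range(groups):
        x_g = x[g * cig:(g + 1) * cig]   # this group's N/M input channels
        w_g = w[g * cog:(g + 1) * cog]   # this group's filters: (cog, cig, kh, kw)
        outs.append(conv2d(x_g, w_g))
    return np.concatenate(outs, axis=0)

# Example: N = 4 input channels, M = 2 groups, 6 output channels.
x = np.random.randn(4, 8, 8)
w = np.random.randn(6, 2, 3, 3)   # each filter only sees 4/2 = 2 channels
y = group_conv2d(x, w, groups=2)
print(y.shape)  # (6, 6, 6)
```

Note that each filter now has depth N/M rather than N, which is where the parameter saving comes from.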
The figure below shows the structure of AlexNet. It can be seen that the network is divided into upper and lower parts.
The following figure visualizes the convolution kernels of the upper and lower parts.
AlexNet conv1 filter separation: as noted by the authors, filter groups appear to structure learned filters into two distinct groups, black-and-white and colour filters.
The visualization of the first convolutional layer shows that after training, one group has learned black-and-white filters while the other has learned colour filters.
The figure below shows a normal, ungrouped convolutional layer. Viewed in three dimensions, each filter corresponds to one output channel. As the network deepens, the number of channels grows sharply while the spatial dimensions shrink: later convolutional layers have more and more kernels, while convolution and pooling make the feature maps smaller and smaller. In deep networks, therefore, the channel dimension becomes increasingly important.
The figure below shows a group-convolutional CNN structure. The filters are divided into two groups, and each group operates on only half of the original feature maps.