Submanifold Sparse Convolutional Network
Publication time: 2019
Authors: Benjamin Graham, Laurens van der Maaten (Facebook AI Research)
I see that many recent point cloud networks use this method, so let's read it.
Summary
The data processed by convolutions (such as images) is generally dense, but some data is sparse (such as point clouds, or pen strokes on paper). Applying a dense convolutional network directly to such sparse data is very inefficient. This paper introduces a sparse convolution operation customized for processing sparse data. Unlike previous work on sparse convolutional networks, it operates strictly on the submanifold instead of dilating the set of observations at every layer of the network.
(In short, it is a convolution that is very well suited to sparse data.)
Why can't we use normal convolution operations to process sparse data?
"Submanifold" dilation problem
Left: the original input is a hand-drawn circle, a one-dimensional curve embedded in a two-dimensional grid.
Middle: the result after a 3x3 convolution
Right: The result after two 3x3 convolutions
The input in the figure above is an example from handwritten digit recognition. The authors call the problem shown above the "submanifold" dilation problem. You can see that the original data is very sparse, but after a traditional convolution the sparsity quickly disappears.
What happens if, instead of convolving all pixels, we convolve only the white pixels on the curve?
A lot of information is lost, and the digit can no longer be classified.
Note: the authors call these meaningful white pixels active sites.
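The dilation shown in the figure is easy to reproduce. Below is a toy sketch (my own illustration, not the paper's code) that tracks only the set of active sites: a regular 3x3 convolution makes a site non-zero whenever its receptive field touches any active input.

```python
# Toy demonstration of "submanifold dilation": the set of active sites
# grows after every regular 3x3 convolution, destroying sparsity.
def dilate_active(active, size=5):
    """Sites a 3x3 convolution makes non-zero: every site whose 3x3
    receptive field contains at least one active input site."""
    out = set()
    for (r, c) in active:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < size and 0 <= cc < size:
                    out.add((rr, cc))
    return out

# A sparse "curve" (the diagonal) on a 5x5 grid: 5 active sites.
active = {(i, i) for i in range(5)}
after_one = dilate_active(active)       # one 3x3 conv
after_two = dilate_active(after_one)    # two 3x3 convs
print(len(active), len(after_one), len(after_two))  # prints: 5 19 25
```

After just two convolutions every one of the 25 grid sites is active, which is exactly the effect shown in the figure's right panel.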
What is a submanifold?
We use the term 'submanifold' to refer to input data that is sparse because it has a lower effective dimension than the space in which it lives, for example a one-dimensional curve in 2+ dimensional space, or a two-dimensional surface in 3+ dimensional space.
Sparse convolutions SC and VSC
The authors propose two convolution operations: SC and VSC.
(1) SC convolution operation
(2) VSC convolution operation
(3) Activation function and pooling function
(4) Calculation and memory overhead
The following figure shows the computation and memory overhead at active and non-active sites for traditional convolution C, SC, and VSC.
Context: the filter size is 3, convolving a single location in d dimensions.
Computation and memory overhead of traditional convolution C, SC, VSC
Here, a is the number of active sites, m is the number of input channels, and n is the number of output channels.
Constructing common networks with the proposed convolutions
Modules used for building submanifold sparse convolutional networks
The authors used the proposed VSC and SC to build modules from several very popular networks: VGG, ResNet, and DenseNet.
(a) VGG block: composed of two VSCs and a max pooling
(b) ResNet block that keeps the input and output resolution unchanged: the output of two VSCs is added to the input
(c) ResNet block that reduces the resolution
(d) DenseNet block that keeps the input and output resolution unchanged
(e) DenseNet block with reduced resolution
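The resolution-preserving ResNet block in (b) can be sketched as follows. This is my own structural reconstruction, not the authors' code: a toy single-channel VSC with per-offset scalar weights stands in for the real layer, and batch norm / ReLU are omitted. Features live only on active sites, stored as a `{coordinate: value}` dict.

```python
def vsc(feat, weights):
    """Toy valid sparse conv: outputs are computed ONLY at input-active
    sites, so the active set never grows."""
    out = {}
    for (r, c) in feat:                   # output sites = input sites
        s = 0.0
        for (dr, dc), w in weights.items():
            nb = (r + dr, c + dc)
            if nb in feat:                # gather active neighbours only
                s += w * feat[nb]
        out[(r, c)] = s
    return out

def resnet_block(feat, w1, w2):
    """Resolution-preserving ResNet block: input + VSC(VSC(input))."""
    h = vsc(vsc(feat, w1), w2)
    return {p: feat[p] + h[p] for p in feat}

# 3x3 "identity" weights: only the centre tap is non-zero.
w = {(dr, dc): (1.0 if (dr, dc) == (0, 0) else 0.0)
     for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
x = {(i, i): 1.0 for i in range(5)}       # sparse diagonal input
y = resnet_block(x, w, w)
print(sorted(y) == sorted(x), y[(0, 0)])  # active set unchanged; 1+1=2.0
```

The key property to notice is that the residual addition is well defined precisely because VSC preserves the active set: input and output live on exactly the same sites.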
Implementation process
(1) Several key parts are involved:
(2) The implementation of SC:
(3) The implementation of VSC:
The only difference from SC is the construction of the rule book.
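Here is a sketch of how I understand the rule-book construction (my guess at the mechanism; the companion CVPR paper spells it out). A hash table maps active coordinates to row indices, and the rule book maps each filter offset to (input row, output row) pairs; SC and VSC differ only in how the output site set is chosen.

```python
import itertools

def build_rulebook(active, filter_size=3, submanifold=True):
    """Build the hash table of active sites and the rule book for a
    2D sparse convolution. submanifold=True gives VSC, False gives SC."""
    in_idx = {p: i for i, p in enumerate(sorted(active))}   # hash table
    half = filter_size // 2
    offsets = list(itertools.product(range(-half, half + 1), repeat=2))
    if submanifold:   # VSC: outputs only at input-active sites
        out_sites = set(active)
    else:             # SC: outputs wherever any active input is reached
        out_sites = {(r + dr, c + dc) for (r, c) in active
                     for (dr, dc) in offsets}
    out_idx = {p: i for i, p in enumerate(sorted(out_sites))}
    # For each filter offset, list the (input row, output row) pairs
    # that a gather/GEMM/scatter step must process.
    rules = {off: [] for off in offsets}
    for (r, c) in out_sites:
        for (dr, dc) in offsets:
            src = (r - dr, c - dc)
            if src in in_idx:
                rules[(dr, dc)].append((in_idx[src], out_idx[(r, c)]))
    return rules, out_idx

active = {(0, 0), (1, 1), (2, 2)}
_, out_vsc = build_rulebook(active, submanifold=True)
_, out_sc = build_rulebook(active, submanifold=False)
print(len(out_vsc), len(out_sc))   # prints: 3 19
```

With the same three active inputs, VSC keeps exactly 3 output sites while SC dilates to 19, which matches the "rule book is the only difference" remark above.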
Experiments
Results on the ModelNet-40 dataset: the accuracy of VSC is not much worse than that of traditional convolution C, while speed and memory usage are greatly improved.
Final thoughts
If the result visualization included a figure like the following, it would show more convincingly that using SC/VSC does not reduce the data's sparsity.
There is no such figure, though, so try visualizing it yourself.
This paper was not published on its own; it later appeared as part of "3D Semantic Segmentation with Submanifold Sparse Convolutional Networks".
If some details of this paper are unclear to you, that's fine: read the clearer and more detailed "3D Semantic Segmentation with Submanifold Sparse Convolutional Networks" instead. I nearly coughed up blood over this: I only read that paper after writing these notes, and many of the details I had to guess are stated explicitly there. Go read it; it is much clearer.