Table of Contents
Dilated convolution (atrous convolution)
Transposed convolution (deconvolution)
What are Convolutions?
- Mathematically, a convolution is an integral that expresses the amount of overlap of one function g as it is shifted over another function f.
- Intuitively, a convolution acts as a blender that mixes one function with another, reducing the data space while preserving the information.
In terms of neural networks and deep learning:
- Convolutions are filters (matrices/vectors) with learnable parameters, used to extract low-dimensional features from the input data.
- They have the property of preserving the spatial or positional relationships between input data points.
- Convolutional neural networks exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers.
- Convolution applies the idea of a sliding window (a filter with learnable weights) over the input and produces a weighted sum (of the weights and the input) as the output. This weighted sum is the feature space that serves as the input to the next layer.
For example, in a face recognition problem, the first few convolution layers learn key points in the input image, the next convolution layers learn edges and shapes, and the last convolution layers learn faces. In other words, the input space is first reduced to a lower-dimensional space (representing dots/pixel information), then that space is reduced to another space containing (edges/shapes), and finally the image is classified as containing a face or not. Convolution can be applied in N dimensions.
Types of convolution
Next is a survey of the kinds of convolution commonly studied and used in network architectures, illustrated as much as possible with animated figures. A quiet tip: this is essential interview material, so read on; it takes about 5 minutes.
One-dimensional convolution
The simplest convolution is the one-dimensional convolution, typically used on sequential data (though it can also be used in other settings). It extracts local 1D subsequences from the input sequence and identifies local patterns within the convolution window. The figure below shows how a one-dimensional convolution filter is applied to a sequence to obtain new features. Another common use of 1D convolution is in NLP, where each sentence is represented as a sequence of words.
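The sliding-window idea above can be sketched in a few lines of NumPy. This is a minimal illustration (valid padding, stride 1, and computing cross-correlation, as deep-learning frameworks do); the signal and kernel values are made up for the example.

```python
import numpy as np

def conv1d(sequence, kernel):
    """Slide a 1D kernel over a sequence (valid padding, stride 1)."""
    k = len(kernel)
    return np.array([np.dot(sequence[i:i + k], kernel)
                     for i in range(len(sequence) - k + 1)])

signal = np.array([1, 2, 3, 4, 5], dtype=float)
kernel = np.array([1, 0, -1], dtype=float)  # a simple difference-style filter
features = conv1d(signal, kernel)
print(features)  # each output is a weighted sum over a length-3 window
```

Each output element is the weighted sum of one window of the input, exactly the "sliding window plus weighted sum" described above.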
Two-dimensional convolution
On image datasets, CNN architectures mostly use two-dimensional convolution filters. The main idea of 2D convolution is to move the convolution filter in two directions (x, y) to compute low-dimensional features from the image data. The output has the shape of a two-dimensional matrix.
1. Single-channel convolution
In deep learning, convolution is elementwise multiplication and addition. For an image with a single channel, convolution works as shown in the figure. Here the filter is a 3 x 3 matrix with elements [[0,1,2], [2,2,0], [0,1,2]]. The filter slides over the input; at each position we perform elementwise multiplication and addition, and each position yields a single number. The final output is a 3 x 3 matrix.
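A minimal sketch of this single-channel 2D convolution, using the 3 x 3 filter from the text. The 5 x 5 input is a hypothetical example (the original figure's input values are not given here), chosen so that a 3 x 3 output results.

```python
import numpy as np

def conv2d(image, kernel):
    """Single-channel 2D convolution (cross-correlation, valid padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # elementwise multiply the window by the kernel, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]], dtype=float)       # the 3 x 3 filter from the text
image = np.arange(25, dtype=float).reshape(5, 5)  # a hypothetical 5 x 5 input
out = conv2d(image, kernel)
print(out.shape)  # (3, 3), as described above
```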
2. Multi-channel convolution
In many applications, we are dealing with images that have multiple channels. A typical example is an RGB image. Each RGB channel emphasizes different aspects of the original image.
The figure makes the multi-channel convolution process clearer. The input layer is a 5 x 5 x 3 matrix with three channels. The filter is a 3 x 3 x 3 matrix. First, each kernel of the filter is applied to its corresponding channel of the input layer; performing the three convolutions generates three channels of size 3 x 3, which are then added.
Step 1 of multi-channel 2D convolution: each kernel of the filter is applied to one of the three channels of the input layer.
Step 2 of multi-channel 2D convolution: the three resulting channels are added together (elementwise) to form a single output channel.
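The two steps above can be sketched as one operation: multiply the 3 x 3 x 3 window by the 3 x 3 x 3 kernel and sum over all three channels at once, which is exactly "per-channel convolution, then add". The random input values are just for illustration.

```python
import numpy as np

def conv2d_multichannel(image, kernel):
    """Multi-channel 2D convolution: convolve each channel with its kernel
    slice and sum across channels, producing a single output channel."""
    kh, kw, kc = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # the sum runs over height, width AND channels
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((5, 5, 3))   # 5 x 5 x 3 input, three channels
kernel = rng.standard_normal((3, 3, 3))  # 3 x 3 x 3 filter
out = conv2d_multichannel(image, kernel)
print(out.shape)  # (3, 3): one channel, because the per-channel results are summed
```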
Three-dimensional convolution
Three-dimensional convolution moves the filter in three directions (x, y, z), applying filters to a three-dimensional dataset to compute low-level features. The output has the shape of a three-dimensional volume, such as a cube or cuboid. 3D convolutions are valuable in event detection in video, three-dimensional medical images, and the like. They are not limited to three-dimensional space and can also be applied to two-dimensional image inputs.
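A minimal 3D-convolution sketch: the same window-multiply-and-sum as before, but the window now also slides along a third axis. The all-ones volume and averaging kernel are placeholder data.

```python
import numpy as np

def conv3d(volume, kernel):
    """3D convolution: the kernel slides along x, y and z (valid, stride 1)."""
    kd, kh, kw = kernel.shape
    od = volume.shape[0] - kd + 1
    oh = volume.shape[1] - kh + 1
    ow = volume.shape[2] - kw + 1
    out = np.zeros((od, oh, ow))
    for d in range(od):
        for i in range(oh):
            for j in range(ow):
                out[d, i, j] = np.sum(volume[d:d + kd, i:i + kh, j:j + kw] * kernel)
    return out

volume = np.ones((4, 4, 4))       # e.g. a small video clip or medical volume
kernel = np.ones((2, 2, 2)) / 8.0  # a simple 2 x 2 x 2 averaging kernel
out = conv3d(volume, kernel)
print(out.shape)  # (3, 3, 3): a cubic output volume
```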
Dilated convolution (atrous convolution)
Dilated convolution introduces a spacing between the values in the convolution kernel. Because of this spacing, the receptive field of the kernel grows: for example, a 3 * 3 kernel with a dilation rate of 2 has the same field of view as a 5 * 5 kernel. The complexity stays the same, but different features are generated (a larger receptive field is observed, at no additional cost).
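A sketch of this: with dilation d, the kernel taps are sampled d apart, so a k x k kernel covers an effective region of d*(k-1)+1 per side while still doing only k*k multiplications per position. The all-ones input is placeholder data.

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation):
    """2D convolution with gaps of (dilation - 1) between kernel taps."""
    kh, kw = kernel.shape
    eff_h = dilation * (kh - 1) + 1   # effective receptive field height
    eff_w = dilation * (kw - 1) + 1
    oh = image.shape[0] - eff_h + 1
    ow = image.shape[1] - eff_w + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # step through the window with the dilation as the stride
            patch = image[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

kernel = np.ones((3, 3))
print(2 * (3 - 1) + 1)  # 5: a 3 x 3 kernel with dilation 2 sees a 5 x 5 region
image = np.ones((7, 7))
out = dilated_conv2d(image, kernel, dilation=2)
print(out.shape)  # (3, 3): still only 9 multiplications per output position
```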
Transposed convolution (deconvolution)
For many applications, and in many network architectures, we often want to perform a transformation in the direction opposite to normal convolution, i.e. we want to upsample. Some examples include generating high-resolution images and mapping low-dimensional feature maps to a high-dimensional space, as in autoencoders or semantic segmentation.
Traditionally, upsampling was achieved by applying interpolation schemes or manually created rules. Modern architectures such as neural networks, however, let the network itself learn the proper transformation automatically, without human intervention.
For example, in the figure we apply a transposed convolution with a 3 x 3 kernel to a 2 x 2 input padded with a 2 x 2 border of zeros using unit strides; the upsampled output has size 4 x 4.
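One way to sketch a transposed convolution is as a scatter operation: each input value stamps a kernel-weighted copy of itself into the output grid. The input values below are made up; with a 2 x 2 input, a 3 x 3 kernel, and stride 1 the output is 4 x 4, matching the example above.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=1):
    """Transposed convolution: scatter each input value, weighted by the
    kernel, into the (larger) output grid; overlaps are summed."""
    kh, kw = kernel.shape
    ih, iw = x.shape
    oh = (ih - 1) * stride + kh
    ow = (iw - 1) * stride + kw
    out = np.zeros((oh, ow))
    for i in range(ih):
        for j in range(iw):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    return out

x = np.array([[1., 2.],
              [3., 4.]])   # 2 x 2 input
kernel = np.ones((3, 3))   # 3 x 3 kernel
out = conv_transpose2d(x, kernel, stride=1)
print(out.shape)  # (4, 4): the input has been upsampled
```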
Depthwise separable convolution
First, we apply depthwise convolution to the input layer. Instead of using a single filter of size 3 x 3 x 3 as in 2D convolution, we use 3 kernels separately. Each filter has size 3 x 3 x 1. Each kernel convolves with one channel of the input layer (only one channel, not all channels!). Each such convolution provides a map of size 5 x 5 x 1. We then stack these maps together to create a 5 x 5 x 3 image. After that, the output has size 5 x 5 x 3. We have now shrunk the spatial size, but the depth remains the same as before.
Depthwise separable convolution, step 1: depthwise convolution with 3 kernels of size 3 x 3 x 1, each convolving one input channel; stacking the three 5 x 5 x 1 maps gives a 5 x 5 x 3 output.
In the second step of depthwise separable convolution, to extend the depth, we apply a 1x1 convolution with kernel size 1x1x3. Convolving the 5 x 5 x 3 input image with each 1 x 1 x 3 kernel provides a map of size 5 x 5 x 1.
Therefore, after applying 128 such 1x1 convolutions, we obtain a layer of size 5 x 5 x 128.
Through these two steps, depthwise separable convolution transforms the input layer (7 x 7 x 3) into an output layer (5 x 5 x 128). The whole process of depthwise separable convolution is shown in the figure.
So, what is the advantage of depthwise separable convolution? Efficiency! Compared with 2D convolution, depthwise separable convolution requires far fewer operations.
Let us recall the computational cost of the 2D convolution example. There are 128 3x3x3 kernels, each moving 5x5 times. That is 128 x 3 x 3 x 3 x 5 x 5 = 86,400 multiplications.
How about the separable convolution? In the first step, depthwise convolution, the 3 kernels of size 3x3x1 each move 5x5 times. That is 3 x 3 x 3 x 1 x 5 x 5 = 675 multiplications. In the second step, the 128 kernels of size 1x1x3 each move 5x5 times. That is 128 x 1 x 1 x 3 x 5 x 5 = 9,600 multiplications. Overall, the depthwise separable convolution requires 675 + 9,600 = 10,275 multiplications. That is only about 12% of the cost of the 2D convolution!
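The arithmetic above, written out directly (same shapes as the running example: 7 x 7 x 3 input to 5 x 5 x 128 output):

```python
# Multiplication counts for the running example (7x7x3 input -> 5x5x128 output).
standard = 128 * (3 * 3 * 3) * (5 * 5)   # 128 kernels of 3x3x3, 5x5 positions each
depthwise = 3 * (3 * 3 * 1) * (5 * 5)    # step 1: 3 kernels of 3x3x1
pointwise = 128 * (1 * 1 * 3) * (5 * 5)  # step 2: 128 kernels of 1x1x3
separable = depthwise + pointwise

print(standard)              # 86400
print(separable)             # 10275
print(separable / standard)  # about 0.12, i.e. roughly 12% of the cost
```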
1 x 1 convolution
A 1 x 1 convolution multiplies each number in the input layer by a single number. If the input layer has multiple channels, this convolution produces interesting effects. The figure below illustrates how a 1 x 1 convolution is applied to an input layer of dimensions H x W x D. After a 1 x 1 convolution with filter size 1 x 1 x D, the output channel has size H x W x 1. If we apply N such 1 x 1 convolutions and then concatenate the results, we get an output layer of dimensions H x W x N.
The 1 x 1 convolution was initially proposed in the Network in Network paper. Thanks to a number of advantages, 1 x 1 convolutions were then heavily used in Google's Inception:
- dimensionality reduction for efficient computation;
- efficient low-dimensional embedding, or feature pooling;
- applying nonlinearity again after convolution.
The first two advantages can be observed in the figure above. After the 1 x 1 convolution, we have significantly reduced the size in depth. Suppose the original input has 200 channels; the 1 x 1 convolution embeds these channels (features) into a single channel. The third advantage is that after the 1 x 1 convolution, nonlinear activations such as ReLU can be added, allowing the network to learn more complex functions.
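A 1 x 1 convolution is just a per-pixel linear map across channels, so it can be sketched as one tensor contraction. The 200-channel input below mirrors the example above; the random values are placeholders.

```python
import numpy as np

def conv1x1(x, weights):
    """1 x 1 convolution: a per-pixel linear map across channels.
    x: H x W x D input; weights: D x N; output: H x W x N."""
    return np.tensordot(x, weights, axes=([2], [0]))

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 200))     # H x W x D with D = 200 channels
weights = rng.standard_normal((200, 1))  # N = 1 filter of size 1 x 1 x 200
out = conv1x1(x, weights)
print(out.shape)  # (5, 5, 1): 200 channels embedded into a single channel
```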
Packet convolution
Grouped convolution was introduced in the AlexNet paper in 2012. It was implemented mainly to allow the network to be trained across two GPUs with limited memory (1.5 GB of memory per GPU). The AlexNet below shows two separate convolution paths on most of its layers. It performs model parallelization across two GPUs (of course, with more GPUs, multi-GPU parallelization is possible).
Here is how grouped convolution works. First, recall how traditional 2D convolution proceeds. In that example, applying 128 filters (each of size 3 x 3 x 3) converts an input layer of size 7 x 7 x 3 into an output layer of size 5 x 5 x 128. Or, in general: applying Dout kernels (each of size h x w x Din) converts an input layer of size Hin x Win x Din into an output layer of size Hout x Wout x Dout.
In grouped convolution, the filters are split into different groups. Each group is responsible for a traditional 2D convolution at a certain depth, as shown below.
The above describes grouped convolution with two filter groups. In each filter group, the depth of each filter is only half the depth of the nominal 2D convolution: Din / 2. Each filter group contains Dout / 2 filters. The first filter group (red) convolves with the first half of the input layer ([:, :, 0:Din/2]), and the second filter group (blue) convolves with the second half of the input layer ([:, :, Din/2:Din]). Thus, each filter group creates Dout / 2 channels. Overall, the two groups create 2 x Dout / 2 = Dout channels. We then stack these Dout channels in the output layer.
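The grouped scheme above can be sketched as follows: split the input channels into groups, run an ordinary multi-channel convolution inside each group, then stack the group outputs. The small sizes (Din = 4, Dout = 6, 2 groups) are toy values chosen for illustration.

```python
import numpy as np

def grouped_conv2d(x, kernels, groups):
    """Grouped convolution: split the input channels into `groups` groups,
    convolve each group with its own filter bank, then stack the outputs.
    x: H x W x Din; kernels: list of (h, w, Din/groups, Dout/groups) arrays."""
    d_in = x.shape[2] // groups
    outputs = []
    for g in range(groups):
        xg = x[:, :, g * d_in:(g + 1) * d_in]  # this group's input slice
        kg = kernels[g]
        kh, kw, _, d_out = kg.shape
        oh = x.shape[0] - kh + 1
        ow = x.shape[1] - kw + 1
        og = np.zeros((oh, ow, d_out))
        for n in range(d_out):
            for i in range(oh):
                for j in range(ow):
                    og[i, j, n] = np.sum(xg[i:i + kh, j:j + kw, :] * kg[..., n])
        outputs.append(og)
    # each group contributes Dout/groups channels; stacking gives Dout total
    return np.concatenate(outputs, axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 4))  # Din = 4, split into 2 groups of depth 2
kernels = [rng.standard_normal((3, 3, 2, 3)) for _ in range(2)]  # Dout/2 = 3 each
out = grouped_conv2d(x, kernels, groups=2)
print(out.shape)  # (5, 5, 6): 2 groups x 3 channels = Dout = 6
```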
PS / Appendix:
For a visualization of the convolution process, see this open-source GitHub project: https://github.com/vdumoulin/conv_arithmetic
https://baijiahao.baidu.com/s?id=1625255860317955368&wfr=spider&for=pc
https://www.kaggle.com/shivamb/3d-convolutions-understanding-use-case/data
White CV: a public account focused on CV (computer vision) and AI (artificial intelligence) technology. The articles mainly cover C++ and Python programming techniques, machine learning (ML), deep learning (DL), OpenCV image processing, and related topics, exploring technical points in depth and recording common operations and problems from study and work, as a learning and work assistant. Focused only on technology, a professional knowledge-sharing platform in the CV field.
For friends on Toutiao (Today's Headlines), you are welcome to follow my account there: loose money first Sen. On the Toutiao platform I share my own life, study, work, finance, and other content. Thank you, and best wishes.
----------------