Conv1d: how the one-dimensional convolution operation works (example: n rows × 3 columns ➡ n rows × 6 columns)

Author: CSDN @ _Yakult_


Background

Many explanations of one-dimensional convolution found online are unclear, and their schematic diagrams are hard to follow. I therefore drew out the calculation process of one-dimensional convolution step by step, and explain, as best I understand it, how the Conv1d() function in PyTorch works.


Conv1d() calculation process

Suppose we have data with n rows and 3 columns. The n rows can be n points or n samples; the 3 columns can be viewed as 3 columns of features, i.e. a feature vector per row. If we want to lift each row from 3 columns to 6 through an MLP-style shared layer, we can use the Conv1d() function. Concretely, each row of data is dot-multiplied with a convolution kernel to produce one number; 6 convolution kernels give 6 numbers, so the 3 columns of a point become 6 columns. Traversing every point row by row yields the new feature matrix.

Note: to go from 6 columns to 12 columns, just use 12 convolution kernels; to go from 12 columns back to 6 columns, use 6 kernels.
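
A minimal sketch of this 3-column ➡ 6-column step (the choice of n = 5 and the random data are just for illustration); note that Conv1d() expects the features as channels, so the n × 3 matrix is fed in as shape (batch, 3, n):

import torch
import torch.nn as nn

n = 5                                  # n points, each with 3 features
x = torch.rand(1, 3, n)                # Conv1d input: (batch, channels, length)
conv = nn.Conv1d(in_channels=3, out_channels=6, kernel_size=1)
y = conv(x)                            # each point's 3 features -> 6 features
print(y.shape)                         # torch.Size([1, 6, 5])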


Conv1d() calculation process diagram

①. The first row of data participates in the convolution (here a is the sample data, W is the convolution kernel, and f is the result)

②. The second row of data participates in the convolution

③. The nth row of data participates in the convolution
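
To back up the diagrams, a small numerical check (the names a, W, f follow the captions above; bias is turned off so each output really is a bare dot product of one row with one kernel):

import torch
import torch.nn as nn

conv = nn.Conv1d(3, 6, 1, bias=False)
a = torch.rand(1, 3, 4)                    # 4 points with 3 features each
f = conv(a)                                # result, shape (1, 6, 4)

W = conv.weight.squeeze(-1)                # kernel shape (6, 3, 1) -> (6, 3)
f_manual = W @ a.squeeze(0)                # row-by-row dot products, (6, 4)
print(torch.allclose(f, f_manual.unsqueeze(0)))   # True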


Conv1d() code example

Take the backbone of PointNet's classification network (a multi-layer perceptron, MLP) as an example. Conv1d(64, 128, 1) actually takes 128 convolution kernels of 64 rows and 1 column and dot-multiplies them, row by row, with the n × 64 matrix, lifting it to 128 columns.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class STNkd(nn.Module):
    def __init__(self, k=64):
        super(STNkd, self).__init__()
        # Pointwise (kernel size 1) convolutions: each one re-maps the
        # per-point feature dimension, e.g. conv2 lifts n x 64 to n x 128.
        self.conv1 = torch.nn.Conv1d(k, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k * k)
        self.relu = nn.ReLU()

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

        self.k = k

    def forward(self, x):
        # x has shape (batch, k, n_points)
        batchsize = x.size()[0]
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        # Max-pool over the point dimension -> one 1024-d global feature
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)

        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        x = self.fc3(x)

        # Add the flattened k x k identity so the predicted transform
        # starts out close to the identity matrix
        iden = Variable(torch.from_numpy(np.eye(self.k).flatten().astype(np.float32)))
        iden = iden.view(1, self.k * self.k).repeat(batchsize, 1)
        if x.is_cuda:
            iden = iden.cuda()
        x = x + iden
        x = x.view(-1, self.k, self.k)
        return x
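
A quick shape check of this module (a minimal sketch; the batch size of 8 and the 2500 points are arbitrary choices):

stn = STNkd(k=64)
x = torch.rand(8, 64, 2500)    # (batch, channels, points)
out = stn(x)
print(out.shape)               # torch.Size([8, 64, 64])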

Principle of Linear()

Linear() computes Y = X · Aᵀ + b, where Aᵀ is the transpose of the weight matrix and b is the bias. nn.Linear(1024, 512) reduces an X matrix of n rows and 1024 columns to a matrix of n rows and 512 columns: as long as Aᵀ is a matrix with 1024 rows and 512 columns, multiplying X by it yields an n × 512 matrix, achieving the dimensionality reduction.
①. Linear() calculation, ignoring the bias matrix (diagram not reproduced here)

②. Linear() animation (animation not reproduced here)
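
As a sanity check on the formula above, a minimal sketch showing that nn.Linear(1024, 512) turns n × 1024 into n × 512, and that it really computes Y = X · Aᵀ + b (n = 8 is arbitrary):

import torch
import torch.nn as nn

fc = nn.Linear(1024, 512)
X = torch.rand(8, 1024)                # n = 8 rows, 1024 columns
Y = fc(X)                              # n rows, 512 columns
print(Y.shape)                         # torch.Size([8, 512])

# The same result written out as Y = X @ A^T + b
Y_manual = X @ fc.weight.t() + fc.bias
print(torch.allclose(Y, Y_manual))     # True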


The difference between Conv1d() and Linear()

Others have compared the two on the same input data: (1) Linear() runs faster than Conv1d(); (2) Conv1d() reaches higher accuracy than Linear(); (3) during backpropagation, the gradient updates differ.
So why is it designed this way? After checking a lot of material, I find this answer the most convincing: use Conv1d() when you must preserve spatial information, as in semantic segmentation; when you do not need spatial information, as in basic classification (MNIST, a cat-vs-dog classifier), use the linear layer Linear().

                      Conv1d               Linear
Prior knowledge       yes                  no
Shared parameters     yes                  no
Running speed         slow                 fast
Spatial information   kept                 discarded
Role                  feature engineering  classifier
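
One way to see how close the two layers are for kernel size 1 (a sketch, with the weights copied by hand so the comparison is exact): given identical weights they produce identical numbers and differ only in layout, channels-first for Conv1d() versus features-last for Linear():

import torch
import torch.nn as nn

conv = nn.Conv1d(64, 128, 1)
fc = nn.Linear(64, 128)
with torch.no_grad():                       # copy weights so both layers match
    fc.weight.copy_(conv.weight.squeeze(-1))
    fc.bias.copy_(conv.bias)

x = torch.rand(2, 64, 100)                  # (batch, channels, points)
out_conv = conv(x)                          # (2, 128, 100)
out_fc = fc(x.transpose(1, 2))              # (2, 100, 128)
print(torch.allclose(out_conv, out_fc.transpose(1, 2), atol=1e-6))  # True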

Convolution kernel

Taking PointNet as an example, you can print the convolution kernels to inspect their shapes; see my other article:
https://blog.csdn.net/qq_35591253/article/details/127671790
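
For instance (shapes only; any Conv1d(64, 128, 1) layer will do):

import torch.nn as nn

conv = nn.Conv1d(64, 128, 1)
print(conv.weight.shape)   # torch.Size([128, 64, 1]) -> (out_channels, in_channels, kernel_size)
print(conv.bias.shape)     # torch.Size([128])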

Source: blog.csdn.net/qq_35591253/article/details/126668130