A simple understanding of nn.Conv1d

1. Definition from the official documentation

In the simplest case, the output of the layer with input size $(N, C_{\text{in}}, L)$ and output $(N, C_{\text{out}}, L_{\text{out}})$ can be precisely described as:

$$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$$

where $\star$ is the valid cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, and $L$ is the length of the signal sequence.
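
To make the formula concrete, here is a minimal sketch (shapes chosen arbitrarily for illustration) that computes the cross-correlation by hand and checks it against nn.Conv1d:

import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(in_channels=2, out_channels=3, kernel_size=3)
x = torch.rand(1, 2, 5)  # N=1, C_in=2, L=5

# with the default stride=1 and padding=0, L_out = 5 - 3 + 1 = 3
out = torch.zeros(1, 3, 3)
with torch.no_grad():
    for j in range(3):        # output channel C_out_j
        for t in range(3):    # output position
            # bias(C_out_j) + sum over input channels k of weight(C_out_j, k) * input window
            out[0, j, t] = conv.bias[j] + (conv.weight[j] * x[0, :, t:t+3]).sum()

print(torch.allclose(out, conv(x), atol=1e-6))  # True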

This module supports TensorFloat32.

* stride controls the stride for the cross-correlation: a single
  number or a one-element tuple.

* padding controls the amount of implicit zero-padding applied to both
  sides of the input, adding padding points at each end.

* dilation controls the spacing between the kernel points; this is also
  known as the à trous algorithm. It is harder to describe in words, but
  the visualization linked from the official documentation shows nicely
  what dilation does.

* groups controls the connections between inputs and outputs.
  in_channels and out_channels must both be divisible by
  groups. For example (see the sketch after this list):

    * At groups=1, all inputs are convolved to all outputs.
    * At groups=2, the operation becomes equivalent to two conv
      layers side by side, each seeing half the input channels
      and producing half the output channels, with the two outputs
      then concatenated.
    * At groups=in_channels, each input channel is convolved with
      its own set of filters, of size

$$\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor$$
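
The groups=2 equivalence can be checked directly. Here is a minimal sketch (layer sizes chosen arbitrarily) that copies the weights of one grouped convolution into two half-sized layers and confirms the outputs match:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(1, 4, 10)  # N=1, C_in=4, L=10

# one grouped convolution: 4 in, 6 out, groups=2
grouped = nn.Conv1d(4, 6, kernel_size=3, groups=2, bias=False)
# grouped.weight has shape (6, 2, 3): each filter sees only 4/2 = 2 channels

# two side-by-side layers, each seeing half the inputs, producing half the outputs
half_a = nn.Conv1d(2, 3, kernel_size=3, bias=False)
half_b = nn.Conv1d(2, 3, kernel_size=3, bias=False)
with torch.no_grad():
    half_a.weight.copy_(grouped.weight[:3])  # filters for the first group
    half_b.weight.copy_(grouped.weight[3:])  # filters for the second group

y_grouped = grouped(x)
y_split = torch.cat([half_a(x[:, :2]), half_b(x[:, 2:])], dim=1)
print(torch.allclose(y_grouped, y_split, atol=1e-6))  # True

With groups=in_channels (and out_channels a multiple of it), the same mechanism yields a depthwise convolution, where each input channel is filtered independently.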

1.1 Parameter explanation

  • Input: $(N, C_{in}, L_{in})$
  • Output: $(N, C_{out}, L_{out})$

Here, as mentioned above:

  • N represents the batch size.

  • C represents the number of channels; for a sequence, it is the dimension of each column vector.
    $C_{in}$ is the encoded dimension of each column vector in the input sequence.
    $C_{out}$ is the encoded dimension of each column vector expected in the output sequence.

  • L represents the length of the sequence, that is, how many column vectors the sequence contains.
    $L_{in}$ is the number of column vectors in the input sequence.
    $L_{out}$ is the number of column vectors in the output sequence.
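
As a hypothetical NLP-style illustration (all names and sizes invented here), a batch of token-embedding sequences usually comes out as (N, L, C), so the channel and length axes must be swapped before Conv1d:

import torch
import torch.nn as nn

# a batch of 8 sequences, each 30 tokens long, each token a 128-dim vector
emb = torch.rand(8, 30, 128)      # (N, L_in, C_in), as an embedding layer would produce
x = emb.transpose(1, 2)           # Conv1d expects (N, C_in, L_in) -> (8, 128, 30)

conv = nn.Conv1d(in_channels=128, out_channels=64, kernel_size=3, padding=1)
y = conv(x)
print(y.shape)  # torch.Size([8, 64, 30]): C_out=64, L_out=30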

The output length $L_{out}$ is computed as:

$$L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor$$
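
As a quick check, the formula can be transcribed directly into code; conv1d_out_len below is just an illustrative helper name, and the layer hyperparameters are chosen to exercise the non-default padding and dilation terms:

import math
import torch
import torch.nn as nn

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # direct transcription of the L_out formula above
    return math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

m = nn.Conv1d(16, 33, kernel_size=3, stride=2, padding=1, dilation=2)
x = torch.rand(20, 16, 50)
print(m(x).shape[-1])                                          # 24
print(conv1d_out_len(50, 3, stride=2, padding=1, dilation=2))  # 24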

1.2 A worked example

Here padding defaults to 0, dilation defaults to 1, and groups defaults to 1. For the example below (kernel_size=3, stride=2, L_in=50), the formula above gives

$$L_{out} = \left\lfloor\frac{50 + 0 - 1 \times (3 - 1) - 1}{2} + 1\right\rfloor = \lfloor 24.5 \rfloor = 24$$

import torch
import torch.nn as nn

# 16 input channels, 33 output channels, kernel size 3, stride 2
m = nn.Conv1d(16, 33, 3, stride=2)
# a batch of 20 sequences, each with 16 channels and length 50
input = torch.rand(20, 16, 50)

output = m(input)

print(output.shape)
# torch.Size([20, 33, 24])

Origin blog.csdn.net/chumingqian/article/details/130125900