nn.Conv1d, nn.Conv2d, nn.Linear

nn.Linear

Args:
in_features: size of each input sample
out_features: size of each output sample
bias: If set to False, the layer will not learn an additive bias.
Default: True

Shape:
- Input: :math:`(*, H_{in})` where :math:`*` means any number of
dimensions including none and :math:`H_{in} = \text{in\_features}`.
- Output: :math:`(*, H_{out})` where all but the last dimension
are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

Examples::
        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20) 
# The last dimension of the tensor fed into nn.Linear(in_features, out_features) must equal in_features;
# e.g. a [B, C, H, W] feature map flattened to [B, C, H*W] needs H*W == in_features (see the sketch after this example)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])

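To make the comment above concrete, here is a minimal sketch (shapes and variable names are my own, for illustration only) showing that nn.Linear only constrains the last dimension, so a [B, C, H, W] feature map has to be flattened first:

import torch
import torch.nn as nn

m = nn.Linear(20, 30)            # in_features=20, out_features=30
x = torch.randn(8, 5, 4, 5)      # [B, C, H, W] with H*W = 20
x_flat = x.flatten(start_dim=2)  # [B, C, H*W] = [8, 5, 20]; last dim == in_features
print(m(x_flat).shape)           # torch.Size([8, 5, 30])
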
nn.Conv1d

In the simplest case, the output value of the layer with input size :math:`(N, C_{\text{in}}, L)` and output :math:`(N, C_{\text{out}}, L_{\text{out}})` can be precisely described as:

.. math::
    \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)

Shape:
- Input: :math:`(N, C_{in}, L_{in})` or :math:`(C_{in}, L_{in})`
- Output: :math:`(N, C_{out}, L_{out})` or :math:`(C_{out}, L_{out})`, where

.. math::
    L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor

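As a quick sanity check on the formula, a small helper of my own (not part of torch) reproduces the output length used in the Conv1d example below:

import math

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # L_out formula from above
    return math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

print(conv1d_out_len(50, kernel_size=3, stride=2))  # 24, matching nn.Conv1d(16, 33, 3, stride=2) on length 50
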
Attributes:
weight (Tensor): the learnable weights of the module of shape
:math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{\text{groups}}{C_\text{in} * \text{kernel\_size}}`
bias (Tensor): the learnable bias of the module of shape
(out_channels). If :attr:`bias` is True, then the values of these weights are
sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{\text{groups}}{C_\text{in} * \text{kernel\_size}}`

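The initialization bound can be checked empirically; a short sketch assuming the default groups=1:

import math
import torch.nn as nn

conv = nn.Conv1d(16, 33, kernel_size=3)  # C_in=16, kernel_size=3, groups=1
bound = math.sqrt(1 / (16 * 3))          # sqrt(k), with k = groups / (C_in * kernel_size)
print(bool(conv.weight.abs().max() <= bound), bool(conv.bias.abs().max() <= bound))  # True True
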
Examples::
# A [B, C, H, W] tensor fed into nn.Conv1d(in_channels, out_channels) must satisfy H*W == in_channels,
# and the input has to be rearranged first, e.g. input.permute(0, 2, 1).clone().contiguous() for a 3-D tensor
# or input = einops.rearrange(input, 'b c h w -> b (h w) c') for a 4-D one (see the sketch after this example)
        >>> m = nn.Conv1d(16, 33, 3, stride=2)
        >>> input = torch.randn(20, 16, 50)
        >>> output = m(input)  # torch.Size([20, 33, 24])
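
Following the comment above, a minimal sketch (sizes made up for illustration) of feeding a [B, C, H, W] feature map to nn.Conv1d by flattening the spatial dimensions into the channel axis:

import torch
import torch.nn as nn
import einops

x = torch.randn(2, 7, 4, 4)                        # [B, C, H, W] with H*W = 16
m = nn.Conv1d(16, 33, kernel_size=1)               # in_channels must equal H*W here
x1d = einops.rearrange(x, 'b c h w -> b (h w) c')  # [2, 16, 7]
# equivalently: x1d = x.flatten(2).permute(0, 2, 1).contiguous()
print(m(x1d).shape)                                # torch.Size([2, 33, 7])
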
  • nn.Conv1d with kernel_size=1 vs. nn.Linear
    Both can act as a fully connected layer and implement the same MLP structure, but the form of the computation differs:

    nn.Conv1d takes a 3-D tensor of shape [batch, channel, length], whereas nn.Linear takes a tensor of shape
    [batch, *, in_features] with a variable number of dimensions; when comparing the two, make sure the nn.Linear input is also 3-D.
    nn.Conv1d acts on the second dimension (channel) while nn.Linear acts on the last dimension (in_features), so to
    compute the same result with both on a given tensor you must reorder its axes with tensor.permute.
    nn.Linear is faster than nn.Conv1d with kernel_size=1.

MLP implementation:

import torch
import torch.nn as nn
import einops


class MLPWithConv(nn.Module):
    def __init__(self, channels, expansion, drop):
        super().__init__()

        self.dim1 = channels
        self.dim2 = channels * expansion
        self.mlp = nn.Sequential(
            nn.Conv2d(self.dim1, self.dim2, 1, 1, 0),                    # 1x1 conv: channel expansion
            nn.Conv2d(self.dim2, self.dim2, 3, 1, 1, groups=self.dim2),  # 3x3 depthwise conv
            nn.GELU(),
            nn.Dropout(drop, inplace=True),
            nn.Conv2d(self.dim2, self.dim1, 1, 1, 0),                    # 1x1 conv: project back
            nn.Dropout(drop, inplace=True)
        )
    
    def forward(self, x):

        x = self.mlp(x)
        
        return x

class MLPWithLinear(nn.Module):
    def __init__(self, channels, expansion, drop):
        super().__init__()

        self.dim1 = channels
        self.dim2 = channels * expansion
        self.mlp_chunk = nn.Sequential(
            nn.Linear(self.dim1, self.dim2),  # channel expansion
            nn.GELU(),
            nn.Dropout(drop, inplace=True),
            nn.Linear(self.dim2, self.dim1),  # project back
            nn.Dropout(drop, inplace=True)
        )
  
    def forward(self, x):
        _, _, H, W = x.size()
        # flatten the spatial dims so that channels sit in the last position, as nn.Linear expects
        x = einops.rearrange(x, 'b c h w -> b (h w) c')
        x = self.mlp_chunk(x)
        # restore the original [B, C, H, W] layout
        x = einops.rearrange(x, 'b (h w) c -> b c h w', h=H, w=W)
        
        return x

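A quick usage sketch (sizes arbitrary) showing that both variants accept a [B, C, H, W] tensor and preserve its shape:

x = torch.randn(2, 64, 14, 14)
mlp_conv = MLPWithConv(channels=64, expansion=4, drop=0.1)
mlp_linear = MLPWithLinear(channels=64, expansion=4, drop=0.1)
print(mlp_conv(x).shape)    # torch.Size([2, 64, 14, 14])
print(mlp_linear(x).shape)  # torch.Size([2, 64, 14, 14])
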
Conv1d vs. Linear: equivalence test and speed comparison:

import torch

def count_parameters(model):
    """Count the number of parameters in a model."""
    return sum(p.numel() for p in model.parameters())

conv = torch.nn.Conv1d(8,32,1)
print(count_parameters(conv))
# 288

linear = torch.nn.Linear(8,32)
print(count_parameters(linear))
# 288

print(conv.weight.shape)
# torch.Size([32, 8, 1])
print(linear.weight.shape)
# torch.Size([32, 8])

# use same initialization
linear.weight = torch.nn.Parameter(conv.weight.squeeze(2))
linear.bias = torch.nn.Parameter(conv.bias)

tensor = torch.randn(128,256,8)
permuted_tensor = tensor.permute(0,2,1).clone().contiguous()

out_linear = linear(tensor)
print(out_linear.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)

out_conv = conv(permuted_tensor)
print(out_conv.mean())
# tensor(0.0067, grad_fn=<MeanBackward0>)

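Since the two layers share the same weights, the outputs should agree up to the layout difference; a quick check (my addition, not in the original post):

# conv output is [B, C_out, L]; permute back to [B, L, C_out] before comparing
print(torch.allclose(out_linear, out_conv.permute(0, 2, 1), atol=1e-6))
# expected: True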

Speed test:

%%timeit
_ = linear(tensor)
# 151 µs ± 297 ns per loop

%%timeit
_ = conv(permuted_tensor)
# 1.43 ms ± 6.33 µs per loop

nn.Conv2d

Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

.. math::
    H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

.. math::
    W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

Examples:
# A [B, C, H, W] tensor fed into nn.Conv2d(in_channels, out_channels) must satisfy C == in_channels
        >>> # With square kernels and equal stride
        >>> m = nn.Conv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> # non-square kernels and unequal stride and with padding and dilation
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)

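Plugging the last example above (kernel (3, 5), stride (2, 1), padding (4, 2), dilation (3, 1), input 50×100) into the H_out/W_out formulas gives a quick consistency check; the helper below is my own:

import math

def conv2d_out_size(size, kernel, stride, padding, dilation):
    # one spatial dimension of the H_out / W_out formula
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

print(conv2d_out_size(50, 3, 2, 4, 3), conv2d_out_size(100, 5, 1, 2, 1))
# 26 100, i.e. the output has shape [20, 33, 26, 100]
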
Counting the multiplications and additions in a convolution

The input has shape [10, 3, 224, 224], the kernel is (3, 3), there are 64 output channels, stride=1 and padding=1. How many multiplications and additions does one such convolution require?

Hint:
First work out how many multiplications and additions are needed for a single output pixel, then multiply by the total number of output pixels.

Steps:

1. First consider a 2-D convolution with a single input channel:
Let the input be A and the output be B. To compute one pixel of B with a 3×3 kernel, the 3×3 patch :math:`A_{(3 \times 3)}` is multiplied element-wise with the kernel :math:`K_{(3 \times 3)}` and the nine products are summed, i.e. 9 multiplications and 8 additions.

2. In practice the input is usually an RGB image with three channels. Repeating the above for each channel costs :math:`3 \times (9 + 8)` operations, and combining the channels plus the bias parameter :math:`b` gives :math:`B_{00} = B^{(c=0)}_{00} + B^{(c=1)}_{00} + B^{(c=2)}_{00} + b`,

which introduces 3 more additions, so one output pixel takes 3 × (9 + 8) + 3 = 54 operations in total: 27 multiplications and 27 additions.

The output feature map has shape [10, 64, 224, 224], so the total number of multiplications is:
27 × 10 × 64 × 224 × 224 = 867,041,280

The number of additions is the same as the number of multiplications: 867,041,280.
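
The same count can be reproduced programmatically; a small sketch of the arithmetic above (a hand count, not a profiler):

N, C_in, H, W = 10, 3, 224, 224   # input shape
C_out, K = 64, 3                  # output channels, kernel size
# stride=1, padding=1 with a 3x3 kernel keeps the spatial size at 224x224

mults_per_pixel = C_in * K * K                        # 27
adds_per_pixel = C_in * (K * K - 1) + (C_in - 1) + 1  # 24 within-channel + 2 cross-channel + 1 bias = 27
out_pixels = N * C_out * H * W

print(mults_per_pixel * out_pixels, adds_per_pixel * out_pixels)  # 867041280 867041280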
Original article: https://blog.csdn.net/m0_63007797/article/details/128714136


Reposted from blog.csdn.net/qq_39506862/article/details/128493185