Pose Estimation — Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose

Abstract

    In this work, we adapt a multi-person pose estimation architecture for use on edge devices. We follow the bottom-up approach of OpenPose, the winner of the 2016 COCO keypoints challenge. With the proposed network design and optimized post-processing code, the full solution runs at 28 frames per second (FPS) on an Intel® NUC6i7KYB mini PC and at 26 FPS on a Core i7-6850K CPU. The network model has 4.1M parameters and a complexity of 9 billion floating-point operations (GFLOPs), only ~15% of the baseline 2-stage OpenPose, with almost the same quality. The code and model are available as part of the Intel® OpenVINO™ Toolkit.

1. Introduction

   Multi-person pose estimation is an important task that can be used in different domains, such as action recognition, motion capture, and sports. The task is to predict a pose skeleton for every person in an image. The skeleton consists of keypoints (or joints): ankles, knees, hips, elbows, and so on. Convolutional neural networks (CNNs) have greatly improved human pose estimation accuracy. However, there has been little research on compact, efficient pose estimation methods. In [9] the authors showed a demo of a simplified Mask R-CNN keypoint detector running at 10 fps on a mobile phone, but provided no implementation details or accuracy characteristics. We also found an open-source repository with a human pose estimation network [10]; its authors reported inference speeds of 4.2 fps on a 2.8 GHz quad-core CPU and 10 fps on a Jetson TX2 board. In our work, we optimize the popular OpenPose approach and show how modern CNN design techniques can be applied to the pose estimation task. As a result, our solution runs at:

  • 28 FPS on an Intel® NUC mini PC, which consumes little power and has a 45 W CPU TDP.
  • 26 FPS on a usual CPU, without the need for a graphics card.

The accuracy of the optimized version almost matches the baseline: the drop in average precision (AP) is less than 1%.

2. Related Work

     The multi-person pose estimation problem is usually solved in one of two ways. The first, called top-down, applies a person detector and then runs a single-person pose estimation algorithm on each detected person. The pose estimation problem is thus decoupled into two sub-problems, and state-of-the-art results from both areas can be leveraged. The inference speed of this approach depends heavily on the number of detected people in the image. The second, called bottom-up, is more robust to the number of people. First, all keypoints are detected in the given image, then they are grouped by person instance. Such an approach is usually faster than the former, since it finds keypoints once and does not rerun pose estimation for each person. In [11] the authors propose the fastest method with state-of-the-art quality among bottom-up approaches, achieving 23 fps on a GTX 1080 Ti graphics card for an image with three people; they note that performance drops to 15 fps for an image with 20 people. Our work is based on the popular bottom-up approach OpenPose, whose inference time is unaffected by the number of people in the image.

   

3. Analysis of the Original OpenPose


3.1 Inference Pipeline

Like all bottom-up approaches, the OpenPose pipeline consists of two parts:

  • Inference of a neural network that provides two tensors: keypoint heatmaps and their pairwise relations (part affinity fields, PAFs). This output is downsampled 8 times.
  • Grouping keypoints by person instance. This includes upsampling the tensors to the original image size, keypoint extraction at heatmap peaks, and grouping the keypoints by instance.

The network first extracts features, then produces an initial estimation of heatmaps and PAFs, followed by 5 refinement stages. It can locate 18 types of keypoints. The grouping procedure then searches, for each keypoint, the best pair (by affinity) from a predefined list of keypoint pairs: left elbow and left wrist, right hip and right knee, left eye and left ear, and so on, 19 pairs in total. During inference, the input image is resized so that its height matches the network input height, the width is scaled to preserve the aspect ratio, and the result is then padded to a multiple of 8; a minimal preprocessing sketch follows.
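The exact preprocessing parameters are not given beyond the description above; the following is a rough sketch of that resize-and-pad step, where the 368-pixel input height and the zero padding value are assumptions for illustration.

import cv2

def preprocess(img, net_input_height=368, stride=8):
    # Scale so the image height matches the network input height,
    # keeping the aspect ratio for the width.
    scale = net_input_height / img.shape[0]
    scaled = cv2.resize(img, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    # Pad the right/bottom borders so both sides become multiples of the
    # network stride (8), since the output is downsampled 8 times.
    h, w = scaled.shape[:2]
    pad_h = (stride - h % stride) % stride
    pad_w = (stride - w % stride) % stride
    padded = cv2.copyMakeBorder(scaled, 0, pad_h, 0, pad_w,
                                cv2.BORDER_CONSTANT, value=(0, 0, 0))
    return padded, scale, (pad_h, pad_w)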

3.2 Complexity Analysis

     The original implementation uses a VGG-19 backbone [14] cut at the conv4_2 layer as the feature extractor. Two extra convolutional layers, conv4_3 and conv4_4, are then added. After that, the initial stage and 5 refinement stages are performed. Each stage consists of two parallel branches: one for heatmap estimation and one for PAFs. Both branches have the same design, see Table 1. For the comparison we set the network input resolution to 368x368 and use the same COCO validation subset as the original paper, with single-scale testing. The test CPU is an Intel® Core™ i7-6850K, 3.6 GHz. Table 2 shows the trade-off between the number of refinement stages and accuracy.

     It can be seen that the later stages give less improvement per GFLOP, so for the optimized version we keep only the first two stages: the initial stage and a single refinement stage. A profile of the post-processing part is given in Table 3. It was obtained by running the code, which is written in C with OpenCV. Although the grouping itself is lightweight, the other parts need optimization.

4. Optimization

4.1 Network Design

All experiments were run with the default training parameters from the original paper, and we train on the COCO dataset [12]. As noted above, we keep only the initial stage and the first refinement stage. However, the remaining stages can provide a regularization effect, so the final network was retrained with extra stages added, of which only the first two are used at inference. This procedure gives ~1% AP improvement; a hedged sketch of this setup follows below.
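As a hedged sketch (not the authors' actual training script), this train-wide/deploy-narrow procedure can be imitated with the PoseEstimationWithMobileNet class listed later in this post; the choice of 3 refinement stages during training and the strict=False weight loading are assumptions for illustration.

# Train with extra refinement stages for their regularization effect
# (the exact number used by the authors is not stated here; 3 is an assumption).
train_model = PoseEstimationWithMobileNet(num_refinement_stages=3)
# ... train on COCO with the default parameters of the original paper ...

# Keep only the initial stage plus the first refinement stage for deployment.
deploy_model = PoseEstimationWithMobileNet(num_refinement_stages=1)
# The first two stages share parameter names with the training model,
# so strict=False simply drops the weights of the extra stages.
deploy_model.load_state_dict(train_model.state_dict(), strict=False)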

4.1.1 Lightweight Backbone

Since the VGG networks were proposed, quite a few lightweight network topologies with similar or even better classification accuracy have been designed [7], [8], [13]. We evaluated networks from the MobileNet family as replacements for the VGG feature extractor, starting with MobileNet v1. In a naive way, keeping all layers up to the deepest one that matches the output tensor resolution leads to a significant accuracy drop, possibly due to the shallowness and weak feature representation. To preserve spatial resolution and reuse the backbone weights, we use dilated convolutions [17]: the stride of the conv4_2/dw layer is removed, and the dilation value of the subsequent conv5_1/dw layer is set to 2 to preserve the receptive field. We therefore use all layers up to the conv5_5 block. Adding the conv5_6 block improves accuracy, but at the cost of performance. We also tried the even lighter MobileNet v2 backbone, but it did not show good results, see Table 4.

4.1.2 Lightweight Refinement Stage

To produce new estimations of keypoint heatmaps and PAFs, a refinement stage takes features from the backbone concatenated with the previous estimations of keypoint heatmaps and PAFs. Motivated by this, we decided to share most of the computation between heatmaps and PAFs and use a single prediction branch in the initial and refinement stages. We share all layers except the last two, which directly produce the keypoint heatmaps and PAFs, see Figure 2. Each convolution with a 7x7 kernel is then replaced by a convolutional block with the same receptive field to capture long-range spatial dependencies. After a series of experiments with this block design, we settled on three consecutive convolutions with 1x1, 3x3 and 3x3 kernel sizes, the last with a dilation of 2 to preserve the initial receptive field. Since the network becomes deeper, we add a residual connection [5] to each such block.

4.2 Fast Post-processing

We profiled the code, removed extra memory allocations, and parallelized keypoint extraction with OpenCV. This made the code significantly faster, and the last bottleneck was resizing the feature maps to the input image size. We tried skipping the resize step and performing grouping directly on the network output, but accuracy dropped noticeably. So the feature-map upsampling step cannot be avoided, although upsampling to the full input image size is not necessary. Our experiments showed that with an upsampling factor of 8, accuracy is the same as when resizing to the input image size. We use an upsampling factor of 4; a rough sketch of this step is given below.
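The sketch below is in Python rather than the optimized C/OpenCV code: the heatmaps are upsampled by a factor of 4 and peaks above a confidence threshold are extracted. The threshold value and the simple 4-neighbour peak test are assumptions for illustration.

import cv2
import numpy as np

def extract_keypoints(heatmap, upsample_ratio=4, threshold=0.1):
    # Upsample the stride-8 heatmap by a fixed factor instead of
    # resizing it to the full input image size.
    hm = cv2.resize(heatmap, (0, 0), fx=upsample_ratio, fy=upsample_ratio,
                    interpolation=cv2.INTER_CUBIC)
    # A pixel is a peak if it exceeds the threshold and its 4 direct neighbours.
    center = hm[1:-1, 1:-1]
    peaks = (center > threshold) \
        & (center >= hm[:-2, 1:-1]) & (center >= hm[2:, 1:-1]) \
        & (center >= hm[1:-1, :-2]) & (center >= hm[1:-1, 2:])
    ys, xs = np.nonzero(peaks)
    # +1 compensates for the one-pixel border cropped by the neighbour comparison.
    return [(x + 1, y + 1, hm[y + 1, x + 1]) for x, y in zip(xs, ys)]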

4.3 Inference

For the network inference we use the Intel® OpenVINO™ Toolkit R4 [1], which provides optimized inference across different hardware such as CPU, GPU and FPGA. The final performance numbers are shown in Table 6; they were measured on a challenging video with more than 20 estimated poses. An illustrative inference sketch follows.
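For reference, here is a minimal inference sketch with OpenVINO's Python API. It uses the later IECore interface rather than the R4 API from the paper, and assumes the model has already been converted to the IR files of human-pose-estimation-0001 mentioned in the conclusion.

from openvino.inference_engine import IECore
import numpy as np

ie = IECore()
net = ie.read_network(model="human-pose-estimation-0001.xml",
                      weights="human-pose-estimation-0001.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
n, c, h, w = net.input_info[input_name].input_data.shape
frame = np.zeros((n, c, h, w), dtype=np.float32)  # placeholder NCHW input

outputs = exec_net.infer(inputs={input_name: frame})
# outputs holds the keypoint heatmaps and PAFs tensors for post-processing.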

5. Conclusion

In this work, we addressed the problem of human pose estimation networks suitable for real-time performance on edge devices. We proposed a solution based on the OpenPose approach, with substantial optimizations of the network design and the post-processing code. Thanks to the design with a dilated MobileNet v1 feature extractor using depthwise separable convolutions and a lightweight refinement stage with residual connections, the network complexity is reduced by more than 6.5 times compared to the baseline. The network can be downloaded as part of the OpenVINO Toolkit under the name human-pose-estimation-0001; the network description is available in the Open Model Zoo repository. The whole solution runs in real time on a usual CPU as well as on a NUC mini PC, and its accuracy closely matches that of the baseline 2-stage network. Some techniques could further improve performance and accuracy, such as quantization, pruning, and knowledge distillation. We leave them for future research.

GitHub (PyTorch implementation): https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch

import torch
from torch import nn

#from modules.conv import conv, conv_dw, conv_dw_no_bn
def conv(in_channels, out_channels, kernel_size=3, padding=1, bn=True, dilation=1, stride=1, relu=True, bias=True):
    modules = [nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, bias=bias)]
    if bn:
        modules.append(nn.BatchNorm2d(out_channels))
    if relu:
        modules.append(nn.ReLU(inplace=True))
    return nn.Sequential(*modules)


def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),

        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )


def conv_dw_no_bn(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.ELU(inplace=True),

        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.ELU(inplace=True),
    )


class Cpm(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.align = conv(in_channels, out_channels, kernel_size=1, padding=0, bn=False)
        self.trunk = nn.Sequential(
            conv_dw_no_bn(out_channels, out_channels),
            conv_dw_no_bn(out_channels, out_channels),
            conv_dw_no_bn(out_channels, out_channels)
        )
        self.conv = conv(out_channels, out_channels, bn=False)

    def forward(self, x):
        x = self.align(x)
        x = self.conv(x + self.trunk(x))
        return x


class InitialStage(nn.Module):
    def __init__(self, num_channels, num_heatmaps, num_pafs):
        super().__init__()
        self.trunk = nn.Sequential(
            conv(num_channels, num_channels, bn=False),
            conv(num_channels, num_channels, bn=False),
            conv(num_channels, num_channels, bn=False)
        )
        self.heatmaps = nn.Sequential(
            conv(num_channels, 512, kernel_size=1, padding=0, bn=False),
            conv(512, num_heatmaps, kernel_size=1, padding=0, bn=False, relu=False)
        )
        self.pafs = nn.Sequential(
            conv(num_channels, 512, kernel_size=1, padding=0, bn=False),
            conv(512, num_pafs, kernel_size=1, padding=0, bn=False, relu=False)
        )

    def forward(self, x):
        trunk_features = self.trunk(x)
        heatmaps = self.heatmaps(trunk_features)
        pafs = self.pafs(trunk_features)
        return [heatmaps, pafs]


# Design of the convolutional block that replaces the 7x7-kernel convolutions
# in the refinement stage (Figure 3 in the paper)
class RefinementStageBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        #conv 1x1,128
        self.initial = conv(in_channels, out_channels, kernel_size=1, padding=0, bn=False)
        self.trunk = nn.Sequential(
            #conv 3x3 128
            conv(out_channels, out_channels),
            #conv 3x3 ,dil=2,128
            conv(out_channels, out_channels, dilation=2, padding=2)
        )

    def forward(self, x):
        initial_features = self.initial(x)
        trunk_features = self.trunk(initial_features)
        return initial_features + trunk_features


class RefinementStage(nn.Module):
    def __init__(self, in_channels, out_channels, num_heatmaps, num_pafs):
        super().__init__()
        self.trunk = nn.Sequential(
            RefinementStageBlock(in_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels),
            RefinementStageBlock(out_channels, out_channels)
        )
        self.heatmaps = nn.Sequential(
            conv(out_channels, out_channels, kernel_size=1, padding=0, bn=False),
            conv(out_channels, num_heatmaps, kernel_size=1, padding=0, bn=False, relu=False)
        )
        self.pafs = nn.Sequential(
            conv(out_channels, out_channels, kernel_size=1, padding=0, bn=False),
            conv(out_channels, num_pafs, kernel_size=1, padding=0, bn=False, relu=False)
        )

    def forward(self, x):
        trunk_features = self.trunk(x)
        heatmaps = self.heatmaps(trunk_features)
        pafs = self.pafs(trunk_features)
        return [heatmaps, pafs]


class PoseEstimationWithMobileNet(nn.Module):
    def __init__(self, num_refinement_stages=1, num_channels=128, num_heatmaps=19, num_pafs=38):
        super().__init__()
        self.model = nn.Sequential(
            conv(     3,  32, stride=2, bias=False),
            conv_dw( 32,  64),
            conv_dw( 64, 128, stride=2),
            conv_dw(128, 128),
            conv_dw(128, 256, stride=2),
            conv_dw(256, 256),
            conv_dw(256, 512),  # conv4_2: stride of conv4_2/dw layer was removed
            conv_dw(512, 512, dilation=2, padding=2),  # conv5_1: dilation set to 2 to preserve the receptive field
            conv_dw(512, 512),
            conv_dw(512, 512),
            conv_dw(512, 512),
            conv_dw(512, 512)   # conv5_5
        )
        self.cpm = Cpm(512, num_channels)

        self.initial_stage = InitialStage(num_channels, num_heatmaps, num_pafs)
        self.refinement_stages = nn.ModuleList()
        for idx in range(num_refinement_stages):
            self.refinement_stages.append(RefinementStage(num_channels + num_heatmaps + num_pafs, num_channels,
                                                          num_heatmaps, num_pafs))

    def forward(self, x):
        backbone_features = self.model(x)
        backbone_features = self.cpm(backbone_features)

        stages_output = self.initial_stage(backbone_features)
        for refinement_stage in self.refinement_stages:
            stages_output.extend(
                refinement_stage(torch.cat([backbone_features, stages_output[-2], stages_output[-1]], dim=1)))

        return stages_output

if __name__ == "__main__":
    print(PoseEstimationWithMobileNet())
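As a quick sanity check (not part of the original code), one can instantiate the model, count its parameters, and verify the stride-8 output shapes for a 368x368 input; the expected numbers in the comments follow from the architecture and the figures quoted in the abstract.

model = PoseEstimationWithMobileNet()
model.eval()

num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e6:.1f}M")  # roughly 4.1M, as stated in the abstract

x = torch.randn(1, 3, 368, 368)
with torch.no_grad():
    heatmaps, pafs, ref_heatmaps, ref_pafs = model(x)
print(heatmaps.shape, pafs.shape)          # [1, 19, 46, 46] and [1, 38, 46, 46]
print(ref_heatmaps.shape, ref_pafs.shape)  # refined outputs at the same stride-8 resolution (368 / 8 = 46)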

Network structure (the output printed by the script above):

PoseEstimationWithMobileNet(
  (model): Sequential(
    (0): Sequential(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): Sequential(
      (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (2): Sequential(
      (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=64, bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (3): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (4): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=128, bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (5): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (6): Sequential(
      (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (7): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (8): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (9): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (10): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (11): Sequential(
      (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
  )
  (cpm): Cpm(
    (align): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): ReLU(inplace=True)
    )
    (trunk): Sequential(
      (0): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
        (1): ELU(alpha=1.0, inplace=True)
        (2): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): ELU(alpha=1.0, inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
        (1): ELU(alpha=1.0, inplace=True)
        (2): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): ELU(alpha=1.0, inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
        (1): ELU(alpha=1.0, inplace=True)
        (2): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): ELU(alpha=1.0, inplace=True)
      )
    )
    (conv): Sequential(
      (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU(inplace=True)
    )
  )
  (initial_stage): InitialStage(
    (trunk): Sequential(
      (0): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
    (heatmaps): Sequential(
      (0): Sequential(
        (0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(512, 19, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (pafs): Sequential(
      (0): Sequential(
        (0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(512, 38, kernel_size=(1, 1), stride=(1, 1))
      )
    )
  )
  (refinement_stages): ModuleList(
    (0): RefinementStage(
      (trunk): Sequential(
        (0): RefinementStageBlock(
          (initial): Sequential(
            (0): Conv2d(185, 128, kernel_size=(1, 1), stride=(1, 1))
            (1): ReLU(inplace=True)
          )
          (trunk): Sequential(
            (0): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
            (1): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
          )
        )
        (1): RefinementStageBlock(
          (initial): Sequential(
            (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (1): ReLU(inplace=True)
          )
          (trunk): Sequential(
            (0): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
            (1): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
          )
        )
        (2): RefinementStageBlock(
          (initial): Sequential(
            (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (1): ReLU(inplace=True)
          )
          (trunk): Sequential(
            (0): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
            (1): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
          )
        )
        (3): RefinementStageBlock(
          (initial): Sequential(
            (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (1): ReLU(inplace=True)
          )
          (trunk): Sequential(
            (0): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
            (1): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
          )
        )
        (4): RefinementStageBlock(
          (initial): Sequential(
            (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (1): ReLU(inplace=True)
          )
          (trunk): Sequential(
            (0): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
            (1): Sequential(
              (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
              (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (2): ReLU(inplace=True)
            )
          )
        )
      )
      (heatmaps): Sequential(
        (0): Sequential(
          (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          (1): ReLU(inplace=True)
        )
        (1): Sequential(
          (0): Conv2d(128, 19, kernel_size=(1, 1), stride=(1, 1))
        )
      )
      (pafs): Sequential(
        (0): Sequential(
          (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          (1): ReLU(inplace=True)
        )
        (1): Sequential(
          (0): Conv2d(128, 38, kernel_size=(1, 1), stride=(1, 1))
        )
      )
    )
  )
)

MobileNet v1 structure

The core idea is to replace ordinary convolutions with depthwise separable convolutions.

A depthwise separable convolution corresponds to conv_dw in the code above (a parameter-count comparison follows the snippet):

def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, dilation=dilation, groups=in_channels, bias=False),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),

        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )
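To see the saving, one can compare the parameter counts of a standard 3x3 convolution and its depthwise separable counterpart (a small check using the conv and conv_dw helpers above; the 256-to-256 channel sizes are an arbitrary example):

standard = conv(256, 256, kernel_size=3)       # one dense 3x3 convolution (+ BN)
separable = conv_dw(256, 256, kernel_size=3)   # 3x3 depthwise + 1x1 pointwise (+ BN)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))
# roughly 590k vs 69k parameters: about an 8-9x reduction for 3x3 kernels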

Model structure:

The MobileNet classification model is structured as follows (the first layer is an ordinary convolution; all subsequent layers are depthwise separable convolutions):

class MobileNetV1(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.model = nn.Sequential(
            conv(      3,   32, stride=2, bias=False),
            conv_dw(  32,   64),
            conv_dw(  64,  128, stride=2),
            conv_dw( 128,  128),
            conv_dw( 128,  256, stride=2),
            conv_dw( 256,  256),
            conv_dw( 256,  512, stride=2),
            conv_dw( 512,  512),
            conv_dw( 512,  512),
            conv_dw( 512,  512),
            conv_dw( 512,  512),
            conv_dw( 512,  512),
            conv_dw( 512, 1024, stride=2),
            conv_dw(1024, 1024),
            nn.AvgPool2d(7)  # global pooling for a 224x224 input (7x7 feature map)
        )
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.model(x)
        x = x.view(x.size(0), -1)
        return self.fc(x)


Reposted from: blog.csdn.net/qq_41251963/article/details/110366047