特征金字塔:FPN
出现问题:
为了增强语义性,传统的物体检测模型通常只在深度卷积网络的最后一个特征图上进行后续操作,而这一层对应的下采样率(图像缩小的倍数)通常又比较大,如16、32,造成小物体在特征图上的有效信息较少,小物体的检测性能会急剧下降,这个问题也被称为多尺度问题。
解决方案:
解决多尺度问题的关键在于如何提取多尺度的特征。传统的方法有图像金字塔(Image Pyramid)主要思路是将输入图片做成多个尺度,不同尺度的图像生成不同尺度的特征
缺点:非常耗时,计算量也很大
FPN的总体结构
图1
主要包含自下而上网络,自上而下网络,横向连接与卷积融合4个部分
- 自下而上:最左侧为普通的卷积网络,默认使用ResNet结构,用作提取语义信息。C1代表了ResNet的前几个卷积与池化层,而C2至C5分别为不同的ResNet卷积组,这些卷积组包含了多个Bottleneck结构,组内的特征图大小相同,组间大小递减。
- 自上而下:首先对C5进行1×1卷积降低通道数得到P5,然后依次进行上采样得到P4、P3和P2,目的是得到与C4、C3与C2长宽相同的特征,以方便下一步进行逐元素相加。这里采用2倍最邻近上采样,即直接对临近元素进行复制,而非线性插值。
- 横向连接(Lateral Connection):目的是为了将上采样后的高语义特征与浅层的定位细节特征进行融合。高语义特征经过上采样后,其长宽与对应的浅层特征相同,而通道数固定为256,因此需要对底层特征C2至C4进行11卷积使得其通道数变为256,然后两者进行逐元素相加得到P4、P3与P2。由于C1的特征图尺寸较大且语义信息不足,因此没有把C1放到横向连接中。
- 卷积融合:在得到相加后的特征后,利用3×3卷积对生成的P2至P4再进行融合,目的是消除上采样过程带来的重叠效应,以生成最终的特征图。
代码:
Pytorch利用 F.interpolate 进行上采样操作
import torch.nn as nn
import torch.nn.functional as F
import math
import torch
#ResNet的基本Bottleneck类
class Bottleneck(nn.Module):
expansion = 4 #通道倍增数
def __init__(self,in_planes,planes,stride=1,downsample=None):
super(Bottleneck,self).__init__()
self.bottleneck = nn.Sequential(
nn.Conv2d(in_planes,planes,1,bias=False),
nn.BatchNorm2d(planes),
nn.ReLU(inplace=True),
nn.Conv2d(planes,planes,3,stride,1,bias=False),
#stride = 1 padding = 1 kernal_size = 3
nn.BatchNorm2d(planes),
nn.ReLU(inplace=True),
nn.Conv2d(planes,self.expansion*planes,1,bias=False),
nn.BatchNorm2d(self.expansion*planes)
)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
def forward(self,x):
identity = x
out = self.bottleneck(x)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class FPN(nn.Module):
def __init__(self,layers):
super(FPN,self).__init__()
self.inplaces = 64
#处理输入的C1模块
self.conv1 = nn.Conv2d(3,64,7,2,3,bias=False)
self.bn1 = nn.BatchNorm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(3,2,1)
#自下而上的C2,C3,C4,C5
self.layer1 = self._make_layer(64,layers[0])
self.layer2 = self._make_layer(128,layers[1],2)
self.layer3 = self._make_layer(256,layers[2],2)
self.layer4 = self._make_layer(512,layers[3],2)
#对C5减少通道数得到P5
self.toplayer = nn.Conv2d(2048,256,1,1,0)
# 3x3卷积融合特征
self.smooth1 = nn.Conv2d(256,256,3,1,1)
self.smooth2 = nn.Conv2d(256,256,3,1,1)
self.smooth3 = nn.Conv2d(256,256,3,1,1)
#横向连接,保证通道数相同
self.latlayer1 = nn.Conv2d(1024,256,1,1,0)
self.latlayer2 = nn.Conv2d(512,256,1,1,0)
self.latlayer3 = nn.Conv2d(256,256,1,1,0)
#构建C2-C5 注意区分stride值为1和2的情况
def _make_layer(self,planes,blocks,stride=1):
downsample = None
if stride != 1 or self.inplaces!=Bottleneck.expansion*planes:
downsample = nn.Sequential(
nn.Conv2d(self.inplaces,Bottleneck.expansion*planes,1,stride,bias=False),
nn.BatchNorm2d(Bottleneck.expansion*planes)
)
layers = []
layers.append(Bottleneck(self.inplaces,planes,stride,downsample))
self.inplaces = planes * Bottleneck.expansion
for i in range(1,blocks):
layers.append(Bottleneck(self.inplaces,planes))
return nn.Sequential(*layers)
def _upsample_add(slef,x,y):
_,_,H,W = y.shape
return F.interpolate(x,size=(H,W),mode='bilinear',align_corners=True)+y
def forward(self,x):
#自下而上
c1 = self.maxpool(self.relu(self.bn1(self.conv1(x))))
c2 = self.layer1(c1)
c3 = self.layer2(c2)
c4 = self.layer3(c3)
c5 = self.layer4(c4)
#喜上而下
p5 = self.toplayer(c5)
p4 = self._upsample_add(p5,self.latlayer1(c4))
p3 = self._upsample_add(p4,self.latlayer2(c3))
p2 = self._upsample_add(p3,self.latlayer3(c2))
p4 = self.smooth1(p4)
p3 = self.smooth2(p3)
p2 = self.smooth3(p2)
return p2,p3,p4,p5
if __name__ == "__main__":
#利用list来初始化FPN网络
net = FPN([3,4,6,3])
print(net.conv1)
print(net.maxpool)
print(net.layer1)
print(net.layer2)
print(net.toplayer)
print(net.smooth1)
print(net.latlayer1)
inputs = torch.randn(1,3,224,224)
output = net(inputs)
print(output[0].size())
print(output[1].size())
print(output[2].size())
print(output[3].size())
结果:
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
Sequential(
(0): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
)
Sequential(
(0): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(bottleneck): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(relu): ReLU(inplace=True)
)
)
Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
torch.Size([1, 256, 56, 56])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 14, 14])
torch.Size([1, 256, 7, 7])