Preface: The "Learning YOLOv3 from scratch" series keeps growing; the content originally planned was fairly small, but while reading the code I kept discovering new highlights, so I continue to extend the series. In previous posts we covered the cfg file, model construction, and so on. Building on the model code discussed earlier, this article integrates the SE module and the CBAM module from the previous Attention-mechanism articles into YOLOv3.
1. Specifying the cfg format
Just like [convolutional], [maxpool], [net], [route] and the other layers defined in the cfg file, newly added modules need an agreed-upon cfg format. We adopt the following conventions:
The SE module (explained in detail in: [Attention mechanisms in CV] The simplest and easiest-to-implement SE module) has a single parameter, reduction, which defaults to 16, so we write this module's parameters as:
[se]
reduction=16
The CBAM module (explained in detail in: [Attention mechanisms in CV] ECCV 2018 Convolutional Block Attention Module) consists of a spatial attention mechanism and a channel attention mechanism, with two parameters in total: ratio and kernel_size. So CBAM is written in the cfg file as:
[cbam]
ratio=16
kernelsize=7
2. Modifying the parsing code
Since the parameters we added are customizable, the cfg parsing function has to be modified. As discussed before, we need to modify the parse_model_cfg function in parse_config.py:
import os

import numpy as np


def parse_model_cfg(path):
    # path is e.g. cfg/yolov3-tiny.cfg
    if not path.endswith('.cfg'):
        path += '.cfg'
    if not os.path.exists(path) and \
            os.path.exists('cfg' + os.sep + path):
        path = 'cfg' + os.sep + path

    with open(path, 'r') as f:
        lines = f.read().split('\n')
    # drop empty lines and '#'-prefixed comment lines
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.rstrip().lstrip() for x in lines]
    mdefs = []  # module definitions
    for line in lines:
        if line.startswith('['):  # marks the start of a new module
            '''
            eg:
            [shortcut]
            from=-3
            activation=linear
            '''
            mdefs.append({})
            mdefs[-1]['type'] = line[1:-1].rstrip()
            if mdefs[-1]['type'] == 'convolutional':
                mdefs[-1]['batch_normalize'] = 0
        else:
            key, val = line.split("=")
            key = key.rstrip()

            if 'anchors' in key:
                mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2))
            else:
                mdefs[-1][key] = val.strip()

    # Check all fields are supported
    supported = ['type', 'batch_normalize', 'filters', 'size',
                 'stride', 'pad', 'activation', 'layers',
                 'groups', 'from', 'mask', 'anchors',
                 'classes', 'num', 'jitter', 'ignore_thresh',
                 'truth_thresh', 'random',
                 'stride_x', 'stride_y']

    f = []  # fields
    for x in mdefs[1:]:
        [f.append(k) for k in x if k not in f]
    u = [x for x in f if x not in supported]  # unsupported fields
    assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)

    return mdefs
In the code above, what needs to change is the supported field list; we append our new fields to it:
supported = ['type', 'batch_normalize', 'filters', 'size',
             'stride', 'pad', 'activation', 'layers',
             'groups', 'from', 'mask', 'anchors',
             'classes', 'num', 'jitter', 'ignore_thresh',
             'truth_thresh', 'random',
             'stride_x', 'stride_y',
             'ratio', 'reduction', 'kernelsize']
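As a quick sanity check of the conventions from section 1, here is a stripped-down, dependency-free version of the parsing logic (anchor handling and file I/O omitted; the name parse_cfg_text is made up for this sketch). It shows that [se] and [cbam] blocks come out as ordinary module dicts whose keys pass the supported-fields check:

```python
def parse_cfg_text(text):
    """Minimal version of parse_model_cfg that works on a string."""
    lines = [x.strip() for x in text.split('\n')]
    lines = [x for x in lines if x and not x.startswith('#')]
    mdefs = []
    for line in lines:
        if line.startswith('['):  # start of a new module
            mdefs.append({'type': line[1:-1].rstrip()})
        else:
            key, val = line.split('=')
            mdefs[-1][key.rstrip()] = val.strip()
    return mdefs

# only the fields this sketch needs; the real list is far longer
supported = ['type', 'reduction', 'ratio', 'kernelsize']

cfg = """
[se]
reduction=16

[cbam]
ratio=16
kernelsize=7
"""

mdefs = parse_cfg_text(cfg)
assert mdefs[0] == {'type': 'se', 'reduction': '16'}
assert mdefs[1] == {'type': 'cbam', 'ratio': '16', 'kernelsize': '7'}
assert all(k in supported for m in mdefs for k in m)
```

Note that every value comes out as a string; that is why the model-construction code later converts them with int().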
3. Implementing SE and CBAM
For the underlying principles, see the two articles [Attention mechanisms in CV] The simplest and easiest-to-implement SE module and [Attention mechanisms in CV] ECCV 2018 Convolutional Block Attention Module; the code below is taken directly from them:
SE:
import torch
import torch.nn as nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)
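To make the squeeze-excite computation concrete, here is a tiny, dependency-free numeric walk-through of the same three steps (global average pool, FC-ReLU-FC-sigmoid, channel rescale). The 2-channel feature map and the weights are toy values chosen for this sketch, not anything from the article:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def se_forward(fmap, w1, w2):
    """Squeeze-and-Excitation on a tiny feature map.

    fmap: list of C channels, each a 2D list (H x W).
    w1:   (C // reduction) x C weights of the first FC layer.
    w2:   C x (C // reduction) weights of the second FC layer.
    Returns the rescaled feature map and the per-channel gates.
    """
    c = len(fmap)
    # squeeze: global average pool -> one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # excitation: FC -> ReLU -> FC -> sigmoid
    h = [max(0.0, sum(w1[i][j] * z[j] for j in range(c))) for i in range(len(w1))]
    s = [sigmoid(sum(w2[i][j] * h[j] for j in range(len(h)))) for i in range(c)]
    # scale: reweight every value in a channel by that channel's gate
    scaled = [[[v * s[i] for v in row] for row in ch] for i, ch in enumerate(fmap)]
    return scaled, s

# 2 channels of 2x2, reduction collapses 2 -> 1 hidden unit
fmap = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0, mean 1.0
        [[4.0, 4.0], [4.0, 4.0]]]   # channel 1, mean 4.0
w1 = [[0.5, 0.5]]                   # 1 x 2
w2 = [[1.0], [2.0]]                 # 2 x 1
scaled, s = se_forward(fmap, w1, w2)
# with these toy weights, channel 1 receives the larger gate value
```

This is exactly what SELayer does with nn.AdaptiveAvgPool2d and the two nn.Linear layers, just written out by hand.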
CBAM
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1

        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avgout, maxout], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)


class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False), nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
        return self.sigmoid(avgout + maxout)
Add the code for the two modules above to the models.py file.
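Note that the article only provides the two CBAM sub-modules. To drive them from a single [cbam] entry in the cfg, you also need a small wrapper that applies channel attention and then spatial attention, in that order, as in the CBAM paper. A possible wrapper follows (the class name CBAM and its wiring are my own sketch, not code taken from models.py; the sub-modules are repeated so the snippet is self-contained):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):  # same as above
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False), nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.sharedMLP(self.avg_pool(x)) +
                            self.sharedMLP(self.max_pool(x)))

class SpatialAttention(nn.Module):  # same as above
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        padding = 3 if kernel_size == 7 else 1
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        return self.sigmoid(self.conv(torch.cat([avgout, maxout], dim=1)))

class CBAM(nn.Module):
    """Channel attention first, then spatial attention (order from the paper)."""
    def __init__(self, in_planes, ratio=16, kernel_size=7):
        super(CBAM, self).__init__()
        self.channel_attention = ChannelAttention(in_planes, ratio)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.channel_attention(x)  # reweight each channel
        x = x * self.spatial_attention(x)  # reweight each spatial position
        return x

# output shape equals input shape, so CBAM can sit between any two layers
y = CBAM(16, ratio=8, kernel_size=7)(torch.randn(2, 16, 8, 8))
```

Because the output shape equals the input shape, the wrapper drops into the cfg exactly like the SE module does.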
4. Designing the cfg file
Here we take yolov3-tiny.cfg as the baseline and add the attention module to it. CBAM and SE are handled similarly; taking SE as the example, we add it right after the backbone, where it can refine the backbone's feature information.
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[se]
reduction=16
# add the SE module where the backbone ends
#####backbone######
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear
[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 8
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear
[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
5. Model construction
With the preparation above done, and again taking SE as the example, we modify the model-loading part of models.py and its forward function so everything runs properly:
In the create_modules function of models.py, add:
elif mdef['type'] == 'se':
    modules.add_module(
        'se_module',
        SELayer(output_filters[-1], reduction=int(mdef['reduction'])))
Then modify the forward function of the Darknet class:
def forward(self, x, var=None):
    img_size = x.shape[-2:]
    layer_outputs = []
    output = []

    for i, (mdef, module) in enumerate(zip(self.module_defs, self.module_list)):
        mtype = mdef['type']
        if mtype in ['convolutional', 'upsample', 'maxpool']:
            x = module(x)
        elif mtype == 'route':
            layers = [int(x) for x in mdef['layers'].split(',')]
            if len(layers) == 1:
                x = layer_outputs[layers[0]]
            else:
                try:
                    x = torch.cat([layer_outputs[i] for i in layers], 1)
                except:  # apply stride 2 for darknet reorg layer
                    layer_outputs[layers[1]] = F.interpolate(
                        layer_outputs[layers[1]], scale_factor=[0.5, 0.5])
                    x = torch.cat([layer_outputs[i] for i in layers], 1)
        elif mtype == 'shortcut':
            x = x + layer_outputs[int(mdef['from'])]
        elif mtype == 'yolo':
            output.append(module(x, img_size))
        layer_outputs.append(x if i in self.routs else [])
Adding the SE module to forward is actually very simple: the SE module occupies the same kind of position as a convolutional, upsample, or maxpool layer and needs no extra handling, so only the dispatch above has to change:
    for i, (mdef, module) in enumerate(zip(self.module_defs, self.module_list)):
        mtype = mdef['type']
        if mtype in ['convolutional', 'upsample', 'maxpool', 'se']:
            x = module(x)
The overall process for CBAM is similar; you can try it yourself and use it as a way to get familiar with the overall workflow of YOLOv3.
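To see why extending the forward dispatch is all that is needed, here is a toy, framework-free model of that loop: layers that simply map x to a new x (conv, upsample, maxpool, se, cbam) all go through module(x), so supporting another attention layer is just one more type name in the list. The lambdas below are placeholder transforms, not real layers:

```python
# stand-ins for self.module_defs / self.module_list in Darknet.forward
module_defs = [{'type': 'convolutional'}, {'type': 'se'}, {'type': 'cbam'}]
module_list = [lambda x: x + 1,   # placeholder "conv"
               lambda x: x * 2,   # placeholder "SE"
               lambda x: x * 3]   # placeholder "CBAM"

x = 1
for mdef, module in zip(module_defs, module_list):
    # one branch handles every layer that transforms x in place
    if mdef['type'] in ['convolutional', 'upsample', 'maxpool', 'se', 'cbam']:
        x = module(x)
# x is now (1 + 1) * 2 * 3 = 12: all three layers ran through the same branch
```

Only layers with special wiring (route, shortcut, yolo) need their own branches; attention layers do not.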
Postscript: The content of this article is very simple: it only adds attention modules, which is easy to implement. But where exactly to place the attention modules, and how many to insert, needs to be verified by experiments. Attention mechanisms are not a silver bullet; you have to experiment with the hyperparameters to get satisfactory results. Feel free to contact me or join the group chat to share how it works on your own datasets.
P.S.: Take care of your health these days, and wear a mask when you go out.