Faster-RCNN Code Interpretation 4: Interpretation of Auxiliary Files

Foreword

Since I plan to try reproducing Faster-RCNN recently, and I am frankly not good enough to write all the code from scratch myself, my approach is to read someone else's code and write up my own interpretation of it.

The code comes from a Bilibili uploader (who is excellent), and he has put the code on GitHub. The links are below (this should not count as infringement, after all, the code is open source):

Bilibili link: https://www.bilibili.com/video/BV1of4y1m7nj/?vd_source=afeab8b555e5eb1bfa1e7f267262cbf2

GitHub link: https://github.com/WZMIAOMIAO/deep-learning-for-image-processing

Purpose

In fact, the uploader has made a very good video explaining his code, but I still prefer reading blogs to learn, and the video is very long (about 6 hours), so I tend to fall asleep watching it. That is why I am writing these blog posts as study notes.

What's done so far

Part 1: Detailed introduction of the VOC dataset

Part 2: Faster-RCNN Code Interpretation 2: Getting Started Quickly

Part 3: Faster-RCNN Code Interpretation 3: Making Your Own Data Loader

Part 4: Faster-RCNN Code Interpretation 4: Interpretation of Auxiliary Files (this article)

Directory Structure

1. Introduction:

The main files introduced in this article are:

split_data.py
plot_curve.py
draw_box_utils.py
train_mobilenetv2.py
files under the backbone folder

Except for train_mobilenetv2.py, these are all auxiliary files; reading them helps us understand the Faster-RCNN code covered later.

2. split_data.py file:

The main function of this file: **generate your own train.txt and val.txt files.** These correspond to the train.txt and val.txt files under ImageSets\Main\ in the VOC dataset, and their contents are the image names of the training set and the validation set respectively.

In fact, this file is not strictly necessary, because the VOC dataset already provides the training/validation split. However, if you use your own dataset, it comes in handy and gives you a reference approach.

Next, let's go through it:

​ First, specify the path to the dataset and verify that the path exists:

import os
import random

# specify the path of the dataset annotations
files_path = "./VOCdevkit/VOC2012/Annotations"
assert os.path.exists(files_path), "path: '{}' does not exist.".format(files_path)

Then, set the proportion of the validation set:

# proportion of data used for validation
val_rate = 0.5

Next, split each file name into prefix and suffix. This works because, in the VOC dataset, the annotation file and the image file of a picture share the same prefix (stem):

# split the file name: 2007_000027.xml ---- [2007_000027, xml], i.e. keep the stem 2007_000027
files_name = sorted([file.split(".")[0] for file in os.listdir(files_path)])
# total number of files
files_num = len(files_name)

Then, use random.sample to randomly draw the required number of validation images; it returns the indices of the validation images. We then iterate over all names and store the training and validation image names separately:

# randomly sample the specified proportion of the data and collect the indices
val_index = random.sample(range(0, files_num), k=int(files_num*val_rate))
train_files = []
val_files = []
# put each name into the corresponding list
for index, file_name in enumerate(files_name):
    # if the index is in the validation index set
    if index in val_index:
        # add it to the validation list
        val_files.append(file_name)
    else:
        # otherwise, add it to the training list
        train_files.append(file_name)

Finally, write the two lists to train.txt and val.txt:

# write the lists to files ("x" mode fails if the file already exists)
try:
    train_f = open("train.txt", "x")
    eval_f = open("val.txt", "x")
    train_f.write("\n".join(train_files))
    eval_f.write("\n".join(val_files))
except FileExistsError as e:
    print(e)
    exit(1)
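
After the script runs, each file simply contains one image-name stem per line (e.g. 2007_000027), which matches both Annotations/2007_000027.xml and JPEGImages/2007_000027.jpg. A quick sanity check, assuming the script has been executed in the repository root:

# read back the generated split and print a few entries (illustrative only, not part of the repo)
with open("train.txt") as f:
    train_names = f.read().splitlines()
print(len(train_names), train_names[:3])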

3. plot_curve.py file:

This file is a plotting file; it draws the loss, learning-rate and mAP curves.

In fact, this file is very simple; the code is mainly about using the matplotlib library, so there is not much to explain. You can just read the comments I wrote in the code:

import datetime
import matplotlib.pyplot as plt


def plot_loss_and_lr(train_loss, learning_rate):
    try:
        # x-axis values based on the number of recorded steps
        x = list(range(len(train_loss)))
        fig, ax1 = plt.subplots(1, 1)   # create the figure; note there is only one subplot
        ax1.plot(x, train_loss, 'r', label='loss')  # plot the loss curve
        # tidy up the plot
        ax1.set_xlabel("step")
        ax1.set_ylabel("loss")
        ax1.set_title("Train Loss and lr")
        plt.legend(loc='best')

        ax2 = ax1.twinx()  # enable the right-hand axis
        ax2.plot(x, learning_rate, label='lr')  # plot the learning-rate curve
        ax2.set_ylabel("learning rate")
        ax2.set_xlim(0, len(train_loss))  # set the x-axis range
        plt.legend(loc='best')

        handles1, labels1 = ax1.get_legend_handles_labels()  # return the legend handles and labels, e.g. for the 'loss' entry
        handles2, labels2 = ax2.get_legend_handles_labels()
        plt.legend(handles1 + handles2, labels1 + labels2, loc='upper right')

        fig.subplots_adjust(right=0.8)  # prevent the saved image from being cut off
        # save the figure
        fig.savefig('./loss_and_lr{}.png'.format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S")))
        plt.close()
        print("successful save loss curve! ")
    except Exception as e:
        print(e)


def plot_map(mAP):
    try:
        # x-axis values based on the number of recorded epochs
        x = list(range(len(mAP)))
        plt.plot(x, mAP, label='mAp')   # plot the mAP curve
        # tidy up the plot
        plt.xlabel('epoch')
        plt.ylabel('mAP')
        plt.title('Eval mAP')
        plt.xlim(0, len(mAP))
        plt.legend(loc='best')
        # save the figure
        plt.savefig('./mAP.png')
        plt.close()
        print("successful save mAP curve!")
    except Exception as e:
        print(e)

Let me briefly explain the two calls used above that you may not see very often:

  • ax1.twinx(): The function is to enable the right axis.

If you have used Origin or know a little about plotting, you should know what a right-hand axis is (see the figure below), but you may not have seen this function before: it enables the right-hand axis and returns an axes object you can operate on.

[Figure: an example plot with a left and a right y-axis; its upper-right corner shows a legend entry labeled "the first curve".]

  • ax1.get_legend_handles_labels(): returns the legend handles (i.e. the artist objects) and labels (i.e. the legend text) of this axes.

This sounds abstract, but an example makes it clear. In the figure above, the text "the first curve" in the upper-right corner is a legend entry: its content/label value is "the first curve", and the handle is the artist object that can be manipulated. If the returned lists are empty, the axes has no legend.
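
To make this concrete, here is a minimal standalone sketch (not code from the repository) showing how twinx() and get_legend_handles_labels() work together to merge the legends of both y-axes into one box:

import matplotlib.pyplot as plt

x = list(range(10))
fig, ax1 = plt.subplots()
ax1.plot(x, [i ** 2 for i in x], 'r', label='loss')        # left y-axis
ax2 = ax1.twinx()                                          # right y-axis sharing the same x-axis
ax2.plot(x, [0.01 * (10 - i) for i in x], label='lr')      # right y-axis

# merge the legend entries of both axes into a single legend box
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend(h1 + h2, l1 + l2, loc='upper right')
plt.show()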

4. draw_box_utils.py file:

The main function of this file is to draw the bounding boxes, category information and mask information on an image.

First, look at the function at the bottom of the file, draw_objs:

draw_objs function

Function: draw the bounding boxes and masks of all objects.

​ Input parameters:

| parameter | meaning |
| --- | --- |
| image | the image to draw on |
| boxes | target bounding-box information |
| classes | target category information |
| scores | target probability (confidence) information |
| masks | target mask information |
| category_index | dictionary mapping class indices to class names |
| box_thresh | probability threshold for filtering boxes, default 0.1 |
| mask_thresh | same as above, but for filtering masks, default 0.5 |
| line_thickness | bounding-box line width |
| font | font type |
| font_size | font size |
| draw_boxes_on_image | whether to draw the bounding boxes on the image, default True |
| draw_masks_on_image | whether to draw the masks on the image, default False |

First, filter out those bounding boxes with low probability values:

# filter out low-probability targets
idxs = np.greater(scores, box_thresh)
# boxes, classes, scores and masks must be filtered together
boxes = boxes[idxs]
classes = classes[idxs]
scores = scores[idxs]
if masks is not None:
    masks = masks[idxs]

Next, check whether everything has been filtered out; if so, there is nothing left to draw:

# if boxes has length 0, all boxes were filtered out, so nothing needs to be drawn
if len(boxes) == 0:
    return image

Then, pick a color for each target from the predefined color list, indexed by class id, to build the list of colors to use:

# pick colors from the predefined color list
# ImageColor.getrgb returns the RGB value of a color name
colors = [ImageColor.getrgb(STANDARD_COLORS[cls % len(STANDARD_COLORS)]) for cls in classes]

Next, start drawing the bounding boxes; just read the comments:

# if bounding boxes should be drawn
if draw_boxes_on_image:
    # create the drawing object
    draw = ImageDraw.Draw(image)
    # iterate, because an image may contain more than one object, so every box must be drawn
    for box, cls, score, color in zip(boxes, classes, scores, colors):
        # bounding-box coordinates
        left, top, right, bottom = box
        # draw the target bounding box, tracing the corners clockwise
        draw.line([(left, top), (left, bottom), (right, bottom),
                   (right, top), (left, top)], width=line_thickness, fill=color)
        # draw the category and probability text
        draw_text(draw, box.tolist(), int(cls), float(score), category_index, color, font, font_size)

​ Finally, draw the mask:

if draw_masks_on_image and (masks is not None):
    # draw all masks
    image = draw_masks(image, masks, colors, mask_thresh)

The draw_text and draw_masks used above are the other two functions in this file; they are explained below.

draw_text function

Function: draw the target's category and probability information on the image; it is an auxiliary function of draw_objs.

​ Input parameters:

| parameter | meaning |
| --- | --- |
| draw | the drawing object, which provides methods such as drawing lines |
| box | a bounding box containing the coordinate information |
| cls | the object's category, an int value that is converted to a class name via category_index |
| score | the object's class probability value |
| category_index | category information corresponding to each index |
| color | the color to use |
| font | the font |
| font_size | the font size |

First, since text needs to be drawn, create a font object:

# create a font object; if that fails (e.g. you do not have the font the author used), fall back to the default font
try:
    font = ImageFont.truetype(font, font_size)
except IOError:
    font = ImageFont.load_default()

Next, get the coordinates of the bounding box and decide where the text should be placed relative to the box:

# get the coordinates
left, top, right, bottom = box
# convert the numeric class to its real name and append the probability, giving a string like "person: 99%"
display_str = f"{category_index[str(cls)]}: {int(100 * score)}%"
# compute the text height
display_str_heights = [font.getsize(ds)[1] for ds in display_str]
display_str_height = (1 + 2 * 0.05) * max(display_str_heights)

# if the text fits above the box without going past the top of the image
if top > display_str_height:
    # place the text just above the box
    text_top = top - display_str_height
    text_bottom = top
else:
    # otherwise, place the text below the box
    text_top = bottom
    text_bottom = bottom + display_str_height

Finally, draw the text background rectangle and the text on the image:

# start drawing
for ds in display_str:
    # get the width and height of the text
    text_width, text_height = font.getsize(ds)
    margin = np.ceil(0.05 * text_width)
    # draw a filled rectangle as the text background
    draw.rectangle([(left, text_top),
                    (left + text_width + 2 * margin, text_bottom)], fill=color)
    # draw the text
    draw.text((left + margin, text_top),
              ds,
              fill='black',
              font=font)
    left += text_width

draw_masks function:

​ This function is relatively simple, just read the comments:

def draw_masks(image, masks, colors, thresh: float = 0.7, alpha: float = 0.5):
    # convert the image to a numpy array
    np_image = np.array(image)
    # threshold the masks into booleans
    masks = np.where(masks > thresh, True, False)

    # colors = np.array(colors)
    img_to_draw = np.copy(np_image)
    # TODO: There might be a way to vectorize this
    # recolor the mask regions
    for mask, color in zip(masks, colors):
        img_to_draw[mask] = color

    # alpha-blend the recolored image with the original
    out = np_image * (1 - alpha) + img_to_draw * alpha
    # finally, convert the array back to a PIL image
    return Image.fromarray(out.astype(np.uint8))

5. Files under the backbone folder:

The contents of this folder are the backbone CNN architectures. There are four main files under it:

resnet50+fpn
vgg
mobilenetv2
feature-pyramid-network

There is actually not much to say about these four files, because they simply implement the corresponding network architectures, which is not the focus for us. Of course, if you are interested, you can find the corresponding architecture diagrams online and implement them yourself with this code as a reference.

6. train_mobilenetv2.py file:

This file has the same structure as train_res50_fpn.py, and the code is largely similar. Its purpose is to train Faster-RCNN with one of the CNN architectures under the backbone folder. Below I explain the contents of train_mobilenetv2.py.

main function

First of all, it specifies some parameters and variables, such as the GPU device, whether the weight-saving folder exists, the preprocessing (data augmentation) method, the dataset to use, the batch size, and so on:

# specify the GPU device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Using {} device training.".format(device.type))

# file used to save coco_info (the evaluation results)
results_file = "results{}.txt".format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

# check whether the weight-saving folder exists; create it if not
if not os.path.exists("save_weights"):
    os.makedirs("save_weights")

# specify the data augmentation, i.e. random horizontal flip (both boxes and images are flipped)
data_transform = {
    "train": transforms.Compose([transforms.ToTensor(),
                                 transforms.RandomHorizontalFlip(0.5)]),
    "val": transforms.Compose([transforms.ToTensor()])
}

# specify the VOC dataset root ---- needs to be modified
VOC_root = "./"  # VOCdevkit
aspect_ratio_group_factor = 3
batch_size = 8  # batch size
amp = False  # whether to use mixed-precision training (requires GPU support)

# check that the VOC dataset exists, otherwise raise an error
if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False:
    raise FileNotFoundError("VOCdevkit dose not in path:'{}'.".format(VOC_root))

Next, define the dataset and the loader of the dataset (see comments):

# load the dataset, using the data loader we defined ourselves
# VOCdevkit -> VOC2012 -> ImageSets -> Main -> train.txt
train_dataset = VOCDataSet(VOC_root, "2012", data_transform["train"], "train.txt")
train_sampler = None

# whether to group images with similar aspect ratios into the same batch
# doing so reduces the GPU memory needed during training; it is used by default
if aspect_ratio_group_factor >= 0:
    train_sampler = torch.utils.data.RandomSampler(train_dataset)
    # compute, for every image, which aspect-ratio bin it falls into
    group_ids = create_aspect_ratio_groups(train_dataset, k=aspect_ratio_group_factor)
    # each batch takes its images from the same aspect-ratio bin
    train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, batch_size)

# number of worker processes used to load images
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])
print('Using %g dataloader workers' % nw)

# note that collate_fn is custom, because each sample contains an image and its targets,
# so the default batching method cannot be used
if train_sampler:
    # when sampling images by aspect ratio, the dataloader must use batch_sampler
    train_data_loader = torch.utils.data.DataLoader(train_dataset,
                                                    batch_sampler=train_batch_sampler,  # like sampler, but returns the indices of one whole batch at a time; once specified, batch_size, shuffle, sampler and drop_last can no longer be set (mutually exclusive)
                                                    pin_memory=True,  # if True, the data loader copies tensors into CUDA pinned memory before returning them
                                                    num_workers=nw,  # multi-process data loading
                                                    collate_fn=train_dataset.collate_fn)  # function that combines a list of samples into a mini-batch
else:
    train_data_loader = torch.utils.data.DataLoader(train_dataset,
                                                    batch_size=batch_size,
                                                    shuffle=True,
                                                    pin_memory=True,
                                                    num_workers=nw,
                                                    collate_fn=train_dataset.collate_fn)

# load the validation dataset
# VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt
val_dataset = VOCDataSet(VOC_root, "2012", data_transform["val"], "val.txt")
val_data_loader = torch.utils.data.DataLoader(val_dataset,
                                              batch_size=1,
                                              shuffle=False,
                                              pin_memory=True,
                                              num_workers=nw,
                                              collate_fn=val_dataset.collate_fn)
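
Since the comments point out that the default batching cannot handle (image, targets) pairs of varying sizes, here is a minimal sketch of what such a custom collate_fn typically looks like (illustrative only; the actual implementation lives in the dataset class of this repository):

def collate_fn(batch):
    # batch is a list of (image, target) tuples; the images can have different sizes,
    # so instead of stacking them into one tensor we simply regroup them into
    # ([img1, img2, ...], [target1, target2, ...])
    return tuple(zip(*batch))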

Then, define the model, move it to the GPU, and define a few bookkeeping variables along the way (note that the model here is the full Faster-RCNN model, with mobilenetv2 as the CNN backbone inside):

# create the model; the number of classes is the fixed 20 plus one background class
model = create_model(num_classes=21)
# print(model)

# move the model to the GPU
model.to(device)

# gradient scaling: some gradients are so small that they underflow and cannot be stored,
# so they are scaled up before being stored
scaler = torch.cuda.amp.GradScaler() if amp else None

# define some variables, mainly used later for plotting
train_loss = []      # training loss
learning_rate = []   # learning rate
val_map = []         # mAP on the validation set
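
The scaler is only created here; it is consumed inside utils.train_one_epoch. For reference, a standard torch.cuda.amp training step looks roughly like the sketch below (a generic illustration assuming the usual torchvision-style training loop, not the exact code of this repository; optimizer is defined a little further down):

for images, targets in train_data_loader:
    images = list(img.to(device) for img in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=scaler is not None):
        loss_dict = model(images, targets)   # in training mode Faster-RCNN returns a dict of losses
        losses = sum(loss for loss in loss_dict.values())
    if scaler is not None:
        scaler.scale(losses).backward()      # scale the loss so small gradients do not underflow
        scaler.step(optimizer)               # unscales the gradients, then runs the optimizer step
        scaler.update()                      # adjust the scale factor for the next iteration
    else:
        losses.backward()
        optimizer.step()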

The next step is the highlight: training the network. The author uses the following training strategy: first, freeze the backbone weights (i.e. stop computing gradients for that part) and train the RPN and the final prediction heads; only 5 epochs are trained in this stage. Then, unfreeze the backbone and train the whole network. Note that even in the second stage the author keeps the first few backbone layers frozen, on the grounds that the low-level features are generic and the dataset is small, so those layers do not participate in training.

With this strategy in mind, the code is easy to follow; see the comments for details:

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
#  first frozen backbone and train 5 epochs                    #
#  first, freeze the feature-extraction network (backbone)     #
#  and train the RPN and the final prediction network          #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

# do not compute gradients for the backbone, i.e. do not update it
for param in model.backbone.parameters():
    param.requires_grad = False

# define optimizer
# collect the parameters that will be optimized
params = [p for p in model.parameters() if p.requires_grad]
# define the optimizer
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

# train for the first 5 epochs; only the non-frozen parameters are tuned
init_epochs = 5
for epoch in range(init_epochs):
    # train for one epoch, printing every 50 iterations
    mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader,
                                          device, epoch, print_freq=50,
                                          warmup=True, scaler=scaler)
    train_loss.append(mean_loss.item())
    learning_rate.append(lr)

    # evaluate on the validation set
    coco_info = utils.evaluate(model, val_data_loader, device=device)

    # write the training information to a file
    with open(results_file, "a") as f:
        # the data written include the coco metrics plus the loss and learning rate
        result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"]
        txt = "epoch:{} {}".format(epoch, '  '.join(result_info))
        f.write(txt + "\n")

    val_map.append(coco_info[1])  # pascal mAP

# save the weights
torch.save(model.state_dict(), "./save_weights/pretrain.pth")

# # # # # # # # # # # # # # # # # # # # # # # # # # # #
#  second unfrozen backbone and train all network      #
#  unfreeze the backbone weights and train the whole   #
#  network                                             #
# # # # # # # # # # # # # # # # # # # # # # # # # # # #

# freeze the low-level layers of the backbone: the first few layers learn generic features,
# and the data is too small to train the whole network, so part of the layers stay frozen
# (this follows the official implementation)
for name, parameter in model.backbone.named_parameters():
    split_name = name.split(".")[0]
    if split_name in ["0", "1", "2", "3"]:
        parameter.requires_grad = False  # freeze
    else:
        parameter.requires_grad = True   # unfreeze

# collect the parameters that need training
params = [p for p in model.parameters() if p.requires_grad]
# define the optimizer
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)
# learning-rate schedule: every 3 epochs, multiply the learning rate by 0.33
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.33)
# start training and tuning the parameters
num_epochs = 20
for epoch in range(init_epochs, num_epochs+init_epochs, 1):
    # train for one epoch, printing the loss every 50 iterations
    mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader,
                                          device, epoch, print_freq=50,
                                          warmup=True, scaler=scaler)
    # record the mean loss and the current learning rate
    train_loss.append(mean_loss.item())
    learning_rate.append(lr)

    # update the learning rate
    lr_scheduler.step()

    # evaluate on the validation set
    coco_info = utils.evaluate(model, val_data_loader, device=device)

    # write the training information to a file
    with open(results_file, "a") as f:
        # the data written include the coco metrics plus the loss and learning rate
        result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"]
        txt = "epoch:{} {}".format(epoch, '  '.join(result_info))
        f.write(txt + "\n")

    val_map.append(coco_info[1])  # pascal mAP

    # only save the weights of the last 5 epochs
    # the optimizer and learning-rate scheduler states are saved as well
    if epoch in range(num_epochs+init_epochs)[-5:]:
        save_files = {
            'model': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'lr_scheduler': lr_scheduler.state_dict(),
            'epoch': epoch}
        torch.save(save_files, "./save_weights/mobile-model-{}.pth".format(epoch))
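
Because the checkpoint stores the optimizer and lr_scheduler states in addition to the model weights, such a checkpoint could later be loaded to resume training roughly like this (an illustrative sketch; the file name and epoch number are just examples):

# resume from a saved checkpoint (illustrative)
checkpoint = torch.load("./save_weights/mobile-model-24.pth", map_location=device)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
start_epoch = checkpoint['epoch'] + 1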

Finally, plot the loss, learning-rate and mAP curves:

# plot the loss and learning-rate curves
if len(train_loss) != 0 and len(learning_rate) != 0:
    from plot_curve import plot_loss_and_lr
    plot_loss_and_lr(train_loss, learning_rate)

# plot the mAP curve
if len(val_map) != 0:
    from plot_curve import plot_map
    plot_map(val_map)

create_model function:

This function is the model-creation function called in the main function above.

​ This function involves many other functions, which I will explain in the next article. Here is just an overview:

def create_model(num_classes):
    # https://download.pytorch.org/models/vgg16-397923af.pth
    # to use vgg16, download the corresponding pretrained weights, uncomment the lines below,
    # and comment out the two lines for the mobilenetv2 model
    # vgg_feature = vgg(model_name="vgg16", weights_path="./backbone/vgg16.pth").features
    # backbone = torch.nn.Sequential(*list(vgg_feature._modules.values())[:-1])  # remove the last Maxpool layer from features
    # backbone.out_channels = 512

    # pretrained weights are available at
    # https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
    # the backbone is mobilenetv2
    backbone = MobileNetV2(weights_path="./backbone/mobilenet_v2.pth").features
    # set the number of output channels
    backbone.out_channels = 1280  # the number of channels of the backbone's output feature map

    # anchor generation
    # sizes are the anchor scales
    # aspect_ratios are the scaling factors (aspect ratios)
    anchor_generator = AnchorsGenerator(sizes=((32, 64, 128, 256, 512),),
                                        aspect_ratios=((0.5, 1.0, 2.0),))

    # RoI pooling
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],  # on which feature maps to apply RoI pooling
                                                    output_size=[7, 7],   # output size of the RoI pooling feature map
                                                    sampling_ratio=2)     # sampling ratio
    # create the Faster-RCNN model with the relevant parameters
    model = FasterRCNN(backbone=backbone,
                       num_classes=num_classes,
                       rpn_anchor_generator=anchor_generator,
                       box_roi_pool=roi_pooler)

    return model

Note that the sizes parameter of AnchorsGenerator has five values here, while the original paper uses only three (128, 256, 512); combined with the three aspect ratios that gives 3*3=9 anchors in the paper. With five values there are 5*3=15 anchors, a change made by the author, probably to make small targets easier to detect.
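
To make the difference concrete, here is a small standalone sketch (not from the repository) that lists the anchor shapes this configuration implies, using the width/height formula that torchvision's anchor generator uses as far as I know (w = size / sqrt(ratio), h = size * sqrt(ratio)):

import math

sizes = (32, 64, 128, 256, 512)
aspect_ratios = (0.5, 1.0, 2.0)

anchors = []
for size in sizes:
    for ratio in aspect_ratios:
        w = size / math.sqrt(ratio)   # wider than tall when ratio < 1
        h = size * math.sqrt(ratio)   # taller than wide when ratio > 1
        anchors.append((round(w, 1), round(h, 1)))

print(len(anchors))   # 15 anchors per feature-map location (5 sizes x 3 ratios), vs 3 x 3 = 9 in the paper
print(anchors[:3])    # the smallest ones: (45.3, 22.6), (32.0, 32.0), (22.6, 45.3)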

7. Summary:

Among the files interpreted above, the most important one is train_mobilenetv2.py. It is the file that trains Faster-RCNN, with mobilenetV2 as the CNN backbone.

In addition, it is worth mentioning that if you later want to debug the code and inspect variable values, this is the file you need to run.

Original article: blog.csdn.net/weixin_46676835/article/details/130168575