Building and Using Faster R-CNN Step by Step with Your Own Annotated Data under the PyTorch Framework

  • Usage guide (the working principles of the Faster R-CNN network are explained in detail elsewhere on this blog)

  • 1. Environment setup: in addition to PyTorch 1.5+ (the training script itself checks for 1.6.0+), the COCO evaluation metrics are used, so pycocotools must be installed. A local install works best: after downloading the pycocotools source package, run the following in cmd:

cd c:\users\INNYI\cocoapi-master\pythonAPI
python setup.py build_ext install   (first delete 'Wno-cpp' and 'Wno-unused-function' from setup.py)

If this errors out, the machine lacks a C++ build environment. Also, be sure to train on a GPU: object detection networks are in an entirely different league from classification networks and are extremely time-consuming to train.
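
A quick way to confirm the installation worked is to import the COCO API (a minimal sanity check, not part of the original steps):

from pycocotools.coco import COCO          # core COCO annotation API
from pycocotools.cocoeval import COCOeval  # the evaluation metrics used later during validation

print("pycocotools imported successfully")  # no ImportError means the build succeeded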

  • 2. File structure: since official pretrained weights are provided for resnet50+fpn, using it as the backbone works much better than MobileNet. pascal_voc_classes.json holds the class information of the PASCAL dataset; in object detection, index 0 is conventionally reserved for the background, so the class indices start from 1 (a sketch of the file's contents follows below).
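
As a sketch of its contents (the exact file ships with the repo), pascal_voc_classes.json simply maps the 20 PASCAL VOC class names to indices starting at 1; the snippet below would regenerate such a file:

import json

# class name -> index, starting at 1 because index 0 is reserved for the background
pascal_voc_classes = {"aeroplane": 1, "bicycle": 2, "bird": 3, "boat": 4,
                      "bottle": 5, "bus": 6, "car": 7, "cat": 8, "chair": 9,
                      "cow": 10, "diningtable": 11, "dog": 12, "horse": 13,
                      "motorbike": 14, "person": 15, "pottedplant": 16,
                      "sheep": 17, "sofa": 18, "train": 19, "tvmonitor": 20}

with open("pascal_voc_classes.json", "w") as f:
    json.dump(pascal_voc_classes, f, indent=4)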

[Figure: project file structure]

  • 3. Make sure the pretrained weights are downloaded and placed in the backbone folder. Download links:

MobileNetV2 backbone: https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
ResNet50+FPN backbone: https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
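
A hedged sketch of fetching the ResNet50+FPN weights (the training script below loads ./backbone/fasterrcnn_resnet50_fpn_coco.pth, so the downloaded file is renamed accordingly; the renaming is an assumption based on that code):

import os
import urllib.request

url = "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth"
os.makedirs("backbone", exist_ok=True)
# save under the file name that create_model() in the training script expects
urllib.request.urlretrieve(url, "./backbone/fasterrcnn_resnet50_fpn_coco.pth")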

  • 4. Don't worry about how the model is built or the fine details of the code yet; first read the training and prediction scripts below until you understand them. The comments are key (the backbone is res50+fpn):

import os
import torch
import transforms
from network_files.faster_rcnn_framework import FasterRCNN, FastRCNNPredictor
from backbone.resnet50_fpn_model import resnet50_fpn_backbone
from my_dataset import VOC2012DataSet
from train_utils import train_eval_utils as utils

def create_model(num_classes, device):  # build the complete Faster R-CNN network
    backbone = resnet50_fpn_backbone()  # this call already freezes some of the bottom layers, so no extra freezing is needed
    # note: freezing some layers speeds up training and, judging by the results, works better than
    # training all the weights (the ~5000 images of the PASCAL VOC training set are too few)
    # when training on your own dataset, do not change the 91 here; change the num_classes argument passed in instead
    model = FasterRCNN(backbone=backbone, num_classes=91)
    # load the pretrained weights
    weights_dict = torch.load("./backbone/fasterrcnn_resnet50_fpn_coco.pth", map_location=device)
    missing_keys, unexpected_keys = model.load_state_dict(weights_dict, strict=False)
    if len(missing_keys) != 0 or len(unexpected_keys) != 0:
        print("missing_keys: ", missing_keys)
        print("unexpected_keys: ", unexpected_keys)
    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def main(parser_data):
    device = torch.device(parser_data.device if torch.cuda.is_available() else "cpu")
    print("Using {} device training.".format(device.type))
    # define the image preprocessing; unlike a classification network, a horizontal flip here must also flip the corresponding GT boxes
    data_transform = {
        "train": transforms.Compose([transforms.ToTensor(),
                                     transforms.RandomHorizontalFlip(0.5)]),
        "val": transforms.Compose([transforms.ToTensor()])
    }
    VOC_root = parser_data.data_path
    # check voc root
    if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False:
        raise FileNotFoundError("VOCdevkit does not exist in path:'{}'.".format(VOC_root))
    # load the training dataset
    train_data_set = VOC2012DataSet(VOC_root, data_transform["train"], "train.txt")  # VOC2012DataSet is the Dataset class defined in my_dataset.py
    batch_size = parser_data.batch_size
    train_data_loader = torch.utils.data.DataLoader(train_data_set,
                                                    batch_size=batch_size,
                                                    shuffle=True,
                                                    num_workers=0,
                                                    collate_fn=train_data_set.collate_fn)
    # If no collate_fn is defined, the DataLoader stacks samples with torch.stack by default. For the earlier
    # classification networks each train_data_set element was a single image tensor, so torch.stack produced the
    # batches directly. Here each element is a tuple (image, target), so torch.stack would fail; we therefore
    # define our own collate method, which is very simple.
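    # A minimal sketch of that collate_fn (as assumed to be implemented in my_dataset.py):
    #     @staticmethod
    #     def collate_fn(batch):
    #         return tuple(zip(*batch))  # ((img1, img2, ...), (target1, target2, ...))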

    # load the validation dataset
    val_data_set = VOC2012DataSet(VOC_root, data_transform["val"], "val.txt")
    val_data_set_loader = torch.utils.data.DataLoader(val_data_set,
                                                      batch_size=batch_size,
                                                      shuffle=False,
                                                      num_workers=0,
                                                      collate_fn=train_data_set.collate_fn)

    # instantiate the model: num_classes equals background + 20 classes
    model = create_model(num_classes=21, device=device)  # background + number of object classes
    # print(model)
    model.to(device)
    # since the res50+fpn weights cover the whole Faster R-CNN, there is no need to freeze the backbone here; just train directly
    # collect all parameters that need training via their requires_grad attribute
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,  # pass the parameters into the optimizer
                                momentum=0.9, weight_decay=0.0005)

    # learning-rate schedule: lower the LR at a fixed interval (every 5 epochs, multiply it by 0.33) -- many other schedules exist, of course
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=5,
                                                   gamma=0.33)
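    # e.g. starting from lr=0.005: epochs 0-4 use 0.005, epochs 5-9 use 0.005*0.33 = 0.00165, epochs 10-14 use 0.00165*0.33 = 0.0005445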
    # if the resume argument was given, continue from the last checkpoint; otherwise train from scratch
    if parser_data.resume != "":
        checkpoint = torch.load(parser_data.resume, map_location=device)
        model.load_state_dict(checkpoint['model'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
        parser_data.start_epoch = checkpoint['epoch'] + 1
        print("the training process from epoch{}...".format(parser_data.start_epoch))
    train_loss = []
    learning_rate = []
    val_mAP = []
    for epoch in range(parser_data.start_epoch, parser_data.epochs):
        # train for one epoch, printing every 10 iterations
        utils.train_one_epoch(model, optimizer, train_data_loader,
                              device, epoch, train_loss=train_loss, train_lr=learning_rate,
                              print_freq=50, warmup=True)
        # step the LR scheduler once per epoch; it counts the steps taken and lowers the LR every 5 of them
        lr_scheduler.step()
        # evaluate on the test dataset
        utils.evaluate(model, val_data_set_loader, device=device, mAP_list=val_mAP)
        # save a checkpoint: not only the model weights but also the optimizer state, the LR-scheduler state and the current epoch, so a later run can resume simply by loading this file
        save_files = {
            'model': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'lr_scheduler': lr_scheduler.state_dict(),
            'epoch': epoch}
        torch.save(save_files, "./save_weights/resNetFpn-model-{}.pth".format(epoch))
    # plot loss and lr curve
    if len(train_loss) != 0 and len(learning_rate) != 0:
        from plot_curve import plot_loss_and_lr
        plot_loss_and_lr(train_loss, learning_rate)
    # plot mAP curve
    if len(val_mAP) != 0:
        from plot_curve import plot_map
        plot_map(val_mAP)
if __name__ == "__main__":
    version = torch.version.__version__[:5]
    # the official automatic mixed-precision training is only supported from 1.6.0 onwards, so require >= 1.6.0
    if version < "1.6.0":
        raise EnvironmentError("pytorch version must be 1.6.0 or above")

    import argparse
    # argparse lets the script be called from the command line with the device, dataset path, etc.
    parser = argparse.ArgumentParser(
        description=__doc__)
    # training device
    parser.add_argument('--device', default='cuda:0', help='device')
    # root directory of the training dataset
    parser.add_argument('--data-path', default='./', help='dataset')
    # where to save output files
    parser.add_argument('--output-dir', default='./save_weights', help='path where to save')
    # to resume training, give the path of the checkpoint saved last time
    parser.add_argument('--resume', default='', type=str, help='resume from checkpoint')
    # epoch to start training from
    parser.add_argument('--start_epoch', default=0, type=int, help='start epoch')
    # total number of training epochs
    parser.add_argument('--epochs', default=15, type=int, metavar='N',
                        help='number of total epochs to run')
    # training batch size
    parser.add_argument('--batch_size', default=2, type=int, metavar='N',
                        help='batch size when training.')

    args = parser.parse_args()
    print(args)

    # create the folder for saving weights if it does not exist
    if not os.path.exists(args.output_dir):
        os.makedirs(args.output_dir)

    main(args)

import os
import time
import json
import torch
import torchvision
from PIL import Image
import matplotlib.pyplot as plt
from torchvision import transforms
from network_files.faster_rcnn_framework import FasterRCNN
from backbone.resnet50_fpn_model import resnet50_fpn_backbone
from network_files.rpn_function import AnchorsGenerator
from draw_box_utils import draw_box

def create_model(num_classes):
    backbone = resnet50_fpn_backbone()
    model = FasterRCNN(backbone=backbone, num_classes=num_classes)
    return model

def main():
    # get devices
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))
    # create model
    model = create_model(num_classes=21)
    # load train weights
    train_weights = "./save_weights/model.pth"
    assert os.path.exists(train_weights), "{} file does not exist.".format(train_weights)
    model.load_state_dict(torch.load(train_weights, map_location=device)["model"])  # training saved the optimizer state etc. alongside the weights, so index the "model" field here
    model.to(device)
    label_json_path = './pascal_voc_classes.json'  # read the PASCAL VOC class json file
    assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path)
    json_file = open(label_json_path, 'r')
    class_dict = json.load(json_file)
    category_index = {v: k for k, v in class_dict.items()}
    # load an image
    original_img = Image.open("./test.jpg")
    # preprocessing here only needs ToTensor, because Faster R-CNN performs its own preprocessing internally
    data_transform = transforms.Compose([transforms.ToTensor()])
    img = data_transform(original_img)
    # add a batch dimension
    img = torch.unsqueeze(img, dim=0)
    model.eval()  # switch to evaluation mode
    with torch.no_grad():
        img_height, img_width = img.shape[-2:]
        init_img = torch.zeros((1, 3, img_height, img_width), device=device)
        model(init_img)
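        # the dummy forward pass above warms up the model on the device so that the timed inference below is meaningful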
        t_start = time.time()
        predictions = model(img.to(device))[0]
        print("inference+NMS time: {}".format(time.time() - t_start))
        predict_boxes = predictions["boxes"].to("cpu").numpy()
        predict_classes = predictions["labels"].to("cpu").numpy()
        predict_scores = predictions["scores"].to("cpu").numpy()
        if len(predict_boxes) == 0:
            print("No objects detected!")
        draw_box(original_img,  # draw the results onto the image; implemented in draw_box_utils
                 predict_boxes,
                 predict_classes,
                 predict_scores,
                 category_index,
                 thresh=0.5,
                 line_thickness=3)
        plt.imshow(original_img)
        plt.show()
        # save the predicted result image
        original_img.save("test_result.jpg")
        
if __name__ == '__main__':
    main()
  • Custom Dataset: reading the data of your own dataset

  • 1. How to generate the train.txt and val.txt files needed for training (very basic and very important -- the first step of training your own model; every statement is commented):

split_data.py:
import os
import random
def main():
    random.seed(0)  # fix the random seed so the split is reproducible
    files_path = "<root directory of the annotation xml files, or of all images>"
    assert os.path.exists(files_path), "path: '{}' does not exist.".format(files_path)
    val_rate = 0.5  # fraction of data used for validation; 50% here
    # file.split(".")[0] strips off the extension and keeps only the file name; sorted() sorts the names
    files_name = sorted([file.split(".")[0] for file in os.listdir(files_path)])
    files_num = len(files_name)  # total number of files
    val_index = random.sample(range(0, files_num), k=int(files_num*val_rate))  # randomly sample validation indices: the range is the first argument, the (integer) sample size the second
    train_files = []
    val_files = []
    for index, file_name in enumerate(files_name):
        if index in val_index:  # if the index was sampled, the file goes into the validation set
            val_files.append(file_name)
        else:
            train_files.append(file_name)  # otherwise into the training set
    try:
        train_f = open("train.txt", "x")  # finally create the two txt files ("x" mode fails if they already exist)
        eval_f = open("val.txt", "x")
        train_f.write("\n".join(train_files))  # join the names with newlines and write them to the file
        eval_f.write("\n".join(val_files))
    except FileExistsError as e:
        print(e)
        exit(1)

if __name__ == '__main__':
    main()
  • 2. To build your own Dataset, follow PyTorch's official example: inherit from the Dataset class and implement the __len__ and __getitem__ methods; the former returns the number of samples in the dataset, the latter returns an image together with its annotation. One more point: the Dataset class also has a get_height_and_width method. Without it, the training utilities would load every image and compute its height and width, which is time-consuming and memory-hungry, so we implement this method ourselves in advance (reading the sizes from the xml files) and training no longer has to traverse the whole dataset.

  • 3. Source implementation (with detailed comments): how to build your own dataset and retrieve the xml data of each image:

my_dataset.py:
from torch.utils.data import Dataset
import os
import torch
import json
from PIL import Image
from lxml import etree


class VOC2012DataSet(Dataset):
    """Read and parse a custom VOC-style dataset"""
    # voc_root: root directory of the training data; transforms: preprocessing pipeline
    def __init__(self, voc_root, transforms, txt_name: str = "train.txt"):
        self.root = os.path.join(voc_root, "VOCdevkit", "VOC2012")
        self.img_root = os.path.join(self.root, "JPEGImages")  # image directory
        self.annotations_root = os.path.join(self.root, "Annotations")  # annotation directory

        # read train.txt or val.txt file
        txt_path = os.path.join(self.root, "ImageSets", "Main", txt_name)
        assert os.path.exists(txt_path), "not found {} file.".format(txt_name)  # acts like an if-check with an error

        with open(txt_path) as read:  # open and read the file
            self.xml_list = [os.path.join(self.annotations_root, line.strip() + ".xml")  # strip() removes the newline
                             for line in read.readlines()]  # iterate over every line
        # xml_list now holds the xml path corresponding to every image

        # read class_indict
        try:
            json_file = open('./pascal_voc_classes.json', 'r')  # this json maps class names to their indices
            self.class_dict = json.load(json_file)  # load it into class_dict
        except Exception as e:
            print(e)
            exit(-1)

        self.transforms = transforms

    def __len__(self):
        return len(self.xml_list)  # number of samples in the dataset

    def __getitem__(self, idx):
        # read xml
        xml_path = self.xml_list[idx]  # path of the xml file
        with open(xml_path) as fid:
            xml_str = fid.read()  # read the xml file
        xml = etree.fromstring(xml_str)  # parse the xml string
        data = self.parse_xml_to_dict(xml)["annotation"]
        img_path = os.path.join(self.img_root, data["filename"])  # path of the current image
        image = Image.open(img_path)  # open the image file
        if image.format != "JPEG":
            raise ValueError("Image format not JPEG")  # raise an error for non-jpg files
        boxes = []  # bndbox (coordinate) info of every object
        labels = []  # not the label strings themselves but their index values
        iscrowd = []  # a COCO-only field: 0 means a single, easy-to-detect object; here it is tied to the xml "difficult" field
        for obj in data["object"]:
            xmin = float(obj["bndbox"]["xmin"])  # convert the string to float, matching the float coordinates returned at prediction time
            xmax = float(obj["bndbox"]["xmax"])
            ymin = float(obj["bndbox"]["ymin"])
            ymax = float(obj["bndbox"]["ymax"])
            boxes.append([xmin, ymin, xmax, ymax])  # collect the coordinates in a list and append it to boxes
            labels.append(self.class_dict[obj["name"]])  # look up the object's name in the json dict to get its index
            iscrowd.append(int(obj["difficult"]))  # whether the object is hard or easy to detect

        # convert the lists above to tensors
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)
        image_id = torch.tensor([idx])  # index of the current sample
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])  # area of each annotated box

        target = {}  # gather all the fields into one dict
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:  # apply the preprocessing if any was passed in, otherwise skip
            image, target = self.transforms(image, target)

        return image, target  # return the image and its annotation

    def get_height_and_width(self, idx):
        # read xml
        xml_path = self.xml_list[idx]  # find the xml file via the index
        with open(xml_path) as fid:
            xml_str = fid.read()  # read it
        xml = etree.fromstring(xml_str)
        data = self.parse_xml_to_dict(xml)["annotation"]  # parse the size field
        data_height = int(data["size"]["height"])  # convert the strings to numbers
        data_width = int(data["size"]["width"])
        return data_height, data_width

    def parse_xml_to_dict(self, xml):
        """
        Parse xml content into a dict; modeled on tensorflow's recursive_parse_xml_to_dict
        Args:
            xml: xml tree obtained by parsing XML file contents using lxml.etree
        Returns:
            Python dictionary holding XML contents.
        """
        if len(xml) == 0:  # check whether this node is a leaf; the first call starts at annotation, and len() tells us whether it still has children
            return {xml.tag: xml.text}

        result = {}
        for child in xml:  # iterate over all children of annotation
            child_result = self.parse_xml_to_dict(child)  # recursively parse the next level
            if child.tag != 'object':  # is this child an object?
                result[child.tag] = child_result[child.tag]
            else:
                if child.tag not in result:  # there may be several objects, so store them in a list
                    result[child.tag] = []  # create an empty list keyed by 'object' (unlike the other fields we cannot assign directly, because there can be many objects)
                result[child.tag].append(child_result[child.tag])  # append each parsed object to the list
        return {xml.tag: result}
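
To make the recursion concrete, here is a small standalone demo (the toy XML below is illustrative, not from the repo) of what parse_xml_to_dict returns:

from lxml import etree

xml_str = """<annotation>
    <filename>000001.jpg</filename>
    <object><name>dog</name><difficult>0</difficult></object>
    <object><name>person</name><difficult>0</difficult></object>
</annotation>"""
xml = etree.fromstring(xml_str)
# the dataset's parse_xml_to_dict(xml)["annotation"] would yield:
# {'filename': '000001.jpg',
#  'object': [{'name': 'dog', 'difficult': '0'},
#             {'name': 'person', 'difficult': '0'}]}
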
transforms.py:
import random
import torch
from torchvision.transforms import functional as F

class Compose(object):
    """组合多个transform函数"""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

class ToTensor(object):
    """将PIL图像转为Tensor"""
    def __call__(self, image, target):
        image = F.to_tensor(image)
        return image, target

class RandomHorizontalFlip(object):
    """随机水平翻转图像以及bboxes"""
    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:  # flip when the random draw falls below prob
            height, width = image.shape[-2:]
            image = image.flip(-1)  # flip the image horizontally
            bbox = target["boxes"]
            # bbox: xmin, ymin, xmax, ymax
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]  # flip the bbox coordinates accordingly; a horizontal flip only changes x, y stays the same
            target["boxes"] = bbox  # replace the previous bbox info with the flipped one
        return image, target
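
A quick numeric check of the flip arithmetic (a standalone sketch, not part of the repo): for an image of width 100, a box [10, 20, 40, 60] becomes [60, 20, 90, 60], since the new xmin is width minus the old xmax and the new xmax is width minus the old xmin:

import torch

width = 100
bbox = torch.tensor([[10., 20., 40., 60.]])   # xmin, ymin, xmax, ymax
bbox[:, [0, 2]] = width - bbox[:, [2, 0]]     # new xmin = W - old xmax, new xmax = W - old xmin
print(bbox)  # tensor([[60., 20., 90., 60.]])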


  • With a working understanding of Faster R-CNN in place, we now start building the network model:

  • 1. The network flow diagram below is the blueprint for the framework we build (the comments are important):

[Figure: Faster R-CNN network flow diagram]

import warnings
from collections import OrderedDict
from typing import Tuple, List, Dict, Optional, Union

import torch
from torch import nn, Tensor
import torch.nn.functional as F
from torchvision.ops import MultiScaleRoIAlign

from network_files.roi_head import RoIHeads
from network_files.transform import GeneralizedRCNNTransform
from network_files.rpn_function import AnchorsGenerator, RPNHead, RegionProposalNetwork


class FasterRCNNBase(nn.Module):

    def __init__(self, backbone, rpn, roi_heads, transform):  # args: feature-extraction backbone / region-proposal network / RoI pooling plus the two prediction heads
        super(FasterRCNNBase, self).__init__()
        self.transform = transform
        self.backbone = backbone
        self.rpn = rpn
        self.roi_heads = roi_heads
        # used only in torchscript mode
        self._has_warned = False

    @torch.jit.unused
    def eager_outputs(self, losses, detections):
        # type: (Dict[str, Tensor], List[Dict[str, Tensor]]) -> Union[Dict[str, Tensor], List[Dict[str, Tensor]]]
        if self.training:
            return losses

        return detections

    def forward(self, images, targets=None):  # args: the images to predict on / the annotation info of each image
        # the input images come in different sizes; preprocessing later pads them into equally sized tensors packed into one batch
        # type: (List[Tensor], Optional[List[Dict[str, Tensor]]]) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]
        # the line above is a torchscript type annotation, not an ordinary comment: images is a list of Tensors; targets is a list of dicts, each holding one image's annotations

        # the if statements below check the inputs for errors
        if self.training and targets is None:  # in training mode targets must be provided
            raise ValueError("In training mode, targets should be passed")

        if self.training:  # training mode
            assert targets is not None
            for target in targets:         # further check that the boxes field of each target is well-formed
                boxes = target["boxes"]
                if isinstance(boxes, torch.Tensor):  # boxes must be a torch.Tensor
                    if len(boxes.shape) != 2 or boxes.shape[-1] != 4:  # and must have shape [N, 4]
                        raise ValueError("Expected target boxes to be a tensor"
                                         "of shape [N, 4], got {:}.".format(
                                          boxes.shape))
                else:
                    raise ValueError("Expected target boxes to be of type "
                                     "Tensor, got {:}.".format(type(boxes)))

        original_image_sizes = torch.jit.annotate(List[Tuple[int, int]], [])  # List[Tuple[int, int]] declares the type of original_image_sizes, which stores each image's original size
        for img in images:  # iterate over the images
            val = img.shape[-2:]  # take the last two dims of each image; the dataset returns tensors laid out as [channel, height, width]
            assert len(val) == 2  # guard against 1-D inputs
            original_image_sizes.append((val[0], val[1]))  # record the original height and width of each image
        images, targets = self.transform(images, targets)  # preprocess the images (the second box in the flow diagram), yielding new images and targets (only now do we have a real batch)
        features = self.backbone(images.tensors)  # run the batch through the backbone to get the feature maps
        if isinstance(features, torch.Tensor):  # if prediction uses a single feature map, wrap it in an ordered dict under key '0'; this unifies single- and multi-scale prediction
            features = OrderedDict([('0', features)])  # with multi-scale prediction an ordered dict is passed in already; resnet+fpn yields five feature maps, mobilenet only one

        # pass the feature maps and the annotation targets into the rpn
        # proposals: List[Tensor], Tensor_shape: [num_proposals, 4],
        # each proposal is in absolute (x1, y1, x2, y2) coordinates
        proposals, proposal_losses = self.rpn(images, features, targets)  # the rpn returns the region proposals and the rpn losses

        # pass the rpn output and the annotation targets into the second half of fast rcnn
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)

        # post-process the predictions (mainly map the bboxes back onto the original image scale)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)  # images.image_sizes are the sizes after preprocessing, not the original ones

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        if torch.jit.is_scripting():  # in scripting mode the dynamic graph is compiled into a static one, which can speed up execution
            if not self._has_warned:
                warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")
                self._has_warned = True
            return losses, detections
        else:
            return self.eager_outputs(losses, detections)

class TwoMLPHead(nn.Module):

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x


class FastRCNNPredictor(nn.Module):

    def __init__(self, in_channels, num_classes):
        super(FastRCNNPredictor, self).__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)

    def forward(self, x):
        if x.dim() == 4:
            assert list(x.shape[2:]) == [1, 1]
        x = x.flatten(start_dim=1)
        scores = self.cls_score(x)
        bbox_deltas = self.bbox_pred(x)
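        # shapes: x is [num_proposals, in_channels]; scores is [num_proposals, num_classes]; bbox_deltas is [num_proposals, num_classes * 4]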

        return scores, bbox_deltas


class FasterRCNN(FasterRCNNBase):

    def __init__(self, backbone, num_classes=None,  # number of target classes to detect, including the background
                 # transform parameters
                 min_size=800, max_size=1000,      # min/max sizes enforced by the preprocessing resize, i.e. every input image is rescaled into this range
                 image_mean=None, image_std=None,  # mean and std used for normalization during preprocessing
                 # RPN parameters
                 rpn_anchor_generator=None, rpn_head=None,  # rpn_anchor_generator generates the anchors; rpn_head holds the 3x3 conv plus the classification and box-regression layers
                 rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000,    # number of proposals kept before nms in the rpn (ranked by score)
                 rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000,  # number of proposals kept after nms in the rpn
                 rpn_nms_thresh=0.7,  # iou threshold used by nms inside the rpn
                 rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3,  # thresholds for sampling positives/negatives in the rpn loss: anchors with iou > 0.7 against GT are positive, < 0.3 negative
                 rpn_batch_size_per_image=256, rpn_positive_fraction=0.5,  # number of samples per image for the rpn loss, and the fraction of positives among them
                 # Box parameters
                 box_roi_pool=None, box_head=None, box_predictor=None,  # the RoI pooling layer / TwoMLPHead / FastRCNNPredictor respectively
                 # remove low-probability targets; nms threshold inside fast rcnn; keep at most the top-100 detections by score (rarely even a hundred are needed)
                 box_score_thresh=0.05, box_nms_thresh=0.5, box_detections_per_img=100,  # box_score_thresh filters out low-probability detections; box_nms_thresh is the nms threshold
                 box_fg_iou_thresh=0.5, box_bg_iou_thresh=0.5,   # thresholds for sampling positives/negatives in the fast rcnn loss
                 box_batch_size_per_image=512, box_positive_fraction=0.25,  # number of samples per image for the fast rcnn loss (512) and the positive fraction (1/4)
                 bbox_reg_weights=None):
        if not hasattr(backbone, "out_channels"):  # the depth of the backbone's output feature maps; raise an error if the attribute is missing
            raise ValueError(
                "backbone should contain an attribute out_channels"
                "specifying the number of output channels  (assumed to be the"
                "same for all the levels"
            )

        assert isinstance(rpn_anchor_generator, (AnchorsGenerator, type(None)))
        assert isinstance(box_roi_pool, (MultiScaleRoIAlign, type(None)))

        if num_classes is not None:  # if num_classes is specified, box_predictor must be left for us to define ourselves
            if box_predictor is not None:
                raise ValueError("num_classes should be None when box_predictor "
                                 "is specified")
        else:
            if box_predictor is None:
                raise ValueError("num_classes should not be None when box_predictor "
                                 "is not specified")

        # channels of the prediction feature maps
        out_channels = backbone.out_channels

        # if no anchor generator was given, build the default one for resnet50_fpn
        if rpn_anchor_generator is None:
            anchor_sizes = ((32,), (64,), (128,), (256,), (512,))  # resnet50+fpn has five feature maps, each predicting at a different scale: the largest (finest-grained) map predicts the smallest objects, the smallest map the largest objects
            aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)  # *5 repeats the tuple five times; each element corresponds to one scale on one feature map
            rpn_anchor_generator = AnchorsGenerator(
                anchor_sizes, aspect_ratios
            )

        # build the part of the rpn that predicts via a sliding window
        if rpn_head is None:  # normally nothing is passed in
            rpn_head = RPNHead(  # one 3x3 conv layer plus two 1x1 conv layers
                out_channels, rpn_anchor_generator.num_anchors_per_location()[0]
            )

        # defaults: rpn_pre_nms_top_n_train = 2000, rpn_pre_nms_top_n_test = 1000,
        # defaults: rpn_post_nms_top_n_train = 2000, rpn_post_nms_top_n_test = 1000,
        rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test)  # gather the two values into one dict
        rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test)

        # assemble the complete RPN
        rpn = RegionProposalNetwork(
            rpn_anchor_generator, rpn_head,
            rpn_fg_iou_thresh, rpn_bg_iou_thresh,
            rpn_batch_size_per_image, rpn_positive_fraction,
            rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh)

        #  Multi-scale RoIAlign pooling
        if box_roi_pool is None:
            box_roi_pool = MultiScaleRoIAlign(
                featmap_names=['0', '1', '2', '3'],  # which feature maps roi pooling runs on
                output_size=[7, 7],
                sampling_ratio=2)

        # the flatten operation and two fully connected layers that follow roi pooling in fast rcnn
        if box_head is None:
            resolution = box_roi_pool.output_size[0]  # 7 by default
            representation_size = 1024
            box_head = TwoMLPHead(
                out_channels * resolution ** 2,
                representation_size
            )

        # the prediction layers on top of box_head's output
        if box_predictor is None:
            representation_size = 1024
            box_predictor = FastRCNNPredictor(
                representation_size,
                num_classes)

        # combine roi pooling, box_head and box_predictor
        roi_heads = RoIHeads(
            # box
            box_roi_pool, box_head, box_predictor,
            box_fg_iou_thresh, box_bg_iou_thresh,  # 0.5  0.5
            box_batch_size_per_image, box_positive_fraction,  # 512  0.25
            bbox_reg_weights,
            box_score_thresh, box_nms_thresh, box_detections_per_img)  # 0.05  0.5  100

        if image_mean is None:  # image mean and std for preprocessing; the ImageNet statistics are used here
            image_mean = [0.485, 0.456, 0.406]
        if image_std is None:
            image_std = [0.229, 0.224, 0.225]

        # the module that normalizes, rescales and batches the data
        transform = GeneralizedRCNNTransform(min_size, max_size, image_mean, image_std)

        super(FasterRCNN, self).__init__(backbone, rpn, roi_heads, transform)
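
To close the loop, here is a hedged usage sketch (modeled on the torchvision tutorial, not from the original post) showing that this FasterRCNN class accepts any backbone exposing an out_channels attribute; the MobileNetV2 feature extractor and its 1280 output channels are illustrative assumptions:

import torchvision
from torchvision.ops import MultiScaleRoIAlign
from network_files.faster_rcnn_framework import FasterRCNN
from network_files.rpn_function import AnchorsGenerator

# assumption: MobileNetV2's feature extractor used as a single-feature-map backbone
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280  # FasterRCNN requires this attribute on the backbone

# a single tuple of sizes/ratios because there is only one feature map here
anchor_generator = AnchorsGenerator(sizes=((32, 64, 128, 256, 512),),
                                    aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = MultiScaleRoIAlign(featmap_names=['0'], output_size=[7, 7], sampling_ratio=2)

model = FasterRCNN(backbone=backbone, num_classes=21,  # 20 classes + background
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
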
  • 2. More to come after the New Year.


Reposted from blog.csdn.net/qq_42308217/article/details/113772717