13 Computer Vision - Detailed Code Explanation

13.2 Fine-tuning

There are two ways to prevent overfitting on the training set. The first is to collect more training data, but this can be very costly. The second is transfer learning: transfer the knowledge learned from a source dataset to a target dataset, i.e., copy the model and its trained parameters (excluding the output layer) from the source dataset and fine-tune them on the target dataset.
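In code, the idea takes only a few lines. The sketch below is a minimal preview (the concrete version used in this section appears in 13.2.2); the attribute name fc is specific to ResNet-style models:

# Minimal sketch of transfer learning: reuse all pretrained parameters except the output layer
import torchvision
from torch import nn

net = torchvision.models.resnet18(pretrained=True)  # parameters learned on the source dataset (ImageNet)
net.fc = nn.Linear(net.fc.in_features, 2)  # new output layer sized for the target dataset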

# IPython magic function; makes calling plt.show() unnecessary
%matplotlib inline
import os
import torch
import torchvision
from torch import nn
from d2l import torch as d2l

13.2.1 Obtaining the dataset

#@save
d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip',
                         'fba480ffa8aa7e0febbb511d181409f899b9baa5')

data_dir = d2l.download_extract('hotdog')
train_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'))
test_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'test'))
hotdogs = [train_imgs[i][0] for i in range(8)]
not_hotdogs = [train_imgs[-i-1][0] for i in range(8)]
# Display the images in a 2-row, 8-column grid, 16 in total
d2l.show_images(hotdogs+not_hotdogs,2,8,scale=1.5)
# Normalize each channel using the RGB channel means and standard deviations
normalize = torchvision.transforms.Normalize(
    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
# Image augmentation
train_augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(224),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    normalize])
test_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize([256, 256]),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    normalize])

13.2.2 Initializing the model

# Automatically download the pretrained model
finetune_net = torchvision.models.resnet18(pretrained=True)
# The input feature size of fc stays the same as in the source model; the number of outputs is changed to 2
finetune_net.fc = nn.Linear(finetune_net.fc.in_features, 2)
nn.init.xavier_uniform_(finetune_net.fc.weight);
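
A quick sanity check (not part of the original code) to confirm the shape of the replaced output layer; for resnet18, in_features is 512:

print(finetune_net.fc)
# Linear(in_features=512, out_features=2, bias=True)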

13.2.3 Fine-tuning the model

# If param_group=True, the parameters of the output layer use ten times the learning rate
# If param_group=False, all parameters use the same learning rate
# Train the model
def train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5,
                      param_group=True):
    train_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'train'), transform=train_augs),
        batch_size=batch_size, shuffle=True)
    test_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'test'), transform=test_augs),
        batch_size=batch_size)
    devices = d2l.try_all_gpus()
    loss = nn.CrossEntropyLoss(reduction="none")
    if param_group:
        params_1x = [param for name, param in net.named_parameters()
             if name not in ["fc.weight", "fc.bias"]]
        # params_1x is trained with learning_rate; the parameters of net.fc.parameters() use learning_rate * 10
        trainer = torch.optim.SGD([{'params': params_1x},
                                   {'params': net.fc.parameters(),
                                    'lr': learning_rate * 10}],
                                lr=learning_rate, weight_decay=0.001)
    else:
        trainer = torch.optim.SGD(net.parameters(), lr=learning_rate,
                                  weight_decay=0.001)
    d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
                   devices)
train_fine_tuning(finetune_net, 5e-5)
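
For comparison, the d2l book also trains the same architecture from scratch with all parameters randomly initialized; since nothing is pretrained, a ten-times-larger learning rate is used. A sketch of that comparison run:

# Train an identical model from scratch for comparison
scratch_net = torchvision.models.resnet18()
scratch_net.fc = nn.Linear(scratch_net.fc.in_features, 2)
train_fine_tuning(scratch_net, 5e-4, param_group=False)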

13.3 Object detection and bounding boxes

Sometimes we need not only to identify the categories of objects in an image but also their locations. In computer vision this task is called object detection (or object recognition). This section introduces bounding boxes, the starting point of deep learning methods for object detection.

%matplotlib inline
import torch
from d2l import torch as d2l
#@save
def box_corner_to_center(boxes):
    """从(左上,右下)转换到(中间,宽度,高度)"""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    # cx,xy,w,h的维度是n
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    # torch.stack()沿着新维度对张量进行链接。boxes最开始维度是(n,4),axis=-1表示倒数第一个维度
    # torch.stack()将(cx, cy, w, h)的维度n将其沿着倒数第一个维度拼接在一起,又是(n,4)
    boxes = torch.stack((cx, cy, w, h), axis=-1)
    return boxes

#@save
def box_center_to_corner(boxes):
    """从(中间,宽度,高度)转换到(左上,右下)"""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h
    boxes = torch.stack((x1, y1, x2, y2), axis=-1)
    return boxes
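
A quick round-trip check that the two conversions are inverses (the box coordinates below are illustrative):

boxes = torch.tensor([[60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]])
# Converting to center format and back should reproduce the input exactly
print(box_center_to_corner(box_corner_to_center(boxes)) == boxes)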

13.4 Anchor boxes

Object detection algorithms usually sample a large number of regions in the input image. This section introduces one such sampling method: centered on a given pixel, generate multiple bounding boxes with different scales and aspect ratios; these boxes are called anchor boxes.

13.4.1 Generating multiple anchor boxes

%matplotlib inline
import torch
from d2l import torch as d2l

torch.set_printoptions(2)  # Reduce printing precision: show 2 digits after the decimal point
"""
Generate multiple anchor boxes.
params:
    data: image tensor (batch size, channels, height, width)
    sizes: list of scales
    ratios: list of aspect ratios
"""
def multibox_prior(data, sizes, ratios):
    # Get the last two dimensions of data, i.e., the height and width of the image
    in_height, in_width = data.shape[-2:]
    """ 
    params:
        device:cpu或者gpu
        num_sizes:尺寸的个数n
        num_ratios:宽高比个数m
    """
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    # Number of anchor boxes centered on the same pixel: n + m - 1
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)

    # offset: to place anchor centers at pixel centers, an offset is needed;
    # since each pixel is 1 unit high and 1 unit wide, the centers are offset by 0.5
    # steps: normalized step sizes that scale heights and widths into the range 0-1
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # scaled step size along the y-axis
    steps_w = 1.0 / in_width  # scaled step size along the x-axis
    
    # Suppose the image is 512 pixels high; then torch.arange(in_height, device=device) = [0, 1, 2, ..., 511],
    # and moving to the pixel centers gives [0.5, 1.5, ..., 511.5]
    # Step 1: torch.arange(in_height, device=device) + offset_h moves to the center of each pixel (each pixel is 1x1)
    # Step 2: the heights and widths are normalized
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    """
    a = torch.tensor([1, 2, 3, 4])
    b = torch.tensor([4, 5, 6])

    x, y = torch.meshgrid(a, b,indexing='ij')
    print:tensor([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3],
        [4, 4, 4]])
    tensor([[4, 5, 6],
        [4, 5, 6],
        [4, 5, 6],
        [4, 5, 6]])
    x, y = torch.meshgrid(a, b,indexing='xy')
    print:tensor([[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]])
    tensor([[4, 4, 4, 4],
        [5, 5, 5, 5],
        [6, 6, 6, 6]])
    """
    # Following the example above, suppose center_h = tensor([0.5, 1.5, ..., 511.5]) (the actual values lie in 0-1; written this way for readability)
    # Then shift_y = tensor([[0.5, 0.5, ...], [1.5, 1.5, ...], ..., [511.5, 511.5, ...]])
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')   
    # Flatten the shifts into 1-D sequences; in the example above, shift_y becomes tensor([0.5, 0.5, ..., 511.5, 511.5])
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)
    # width = h * s * sqrt(r)
    # Only combinations containing s1 or r1 are kept: all sizes paired with ratio_tensor[0] give size_tensor * torch.sqrt(ratio_tensor[0]),
    # and sizes[0] paired with the remaining ratios gives sizes[0] * torch.sqrt(ratio_tensor[1:])
    # The factor in_height / in_width handles rectangular inputs: w and h here are in normalized coordinates, so without it
    # a ratio of 1 would give a box whose pixel width/height equals the image's width/height, e.g. 100 for a 1000x10 image;
    # scaling the normalized width by in_height / in_width makes the aspect ratio in pixels equal to r
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
                   sizes[0] * torch.sqrt(ratio_tensor[1:])))\
                   * in_height / in_width  # 处理矩形输入
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))
    # Divide by 2 to get the half-height and half-width
    # Each row (-w, -h, w, h) holds one anchor box's upper-left and lower-right corner offsets from its center
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                        in_height * in_width, 1) / 2
    
    # Each center point has boxes_per_pixel = (n+m-1) anchor boxes,
    # shape: (w*h*(n+m-1), 4)
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    output = out_grid + anchor_manipulations
    # Add a batch dimension
    return output.unsqueeze(0)
    
img = d2l.plt.imread('../data/img/catdog.jpg')
h, w = img.shape[:2] # (1080, 1920)
X = torch.rand(size=(1, 3, h, w))
Y = multibox_prior(X, sizes=[0.75, 0.5, 0.25], ratios=[1, 2, 0.5])
print(Y.shape)
# Reshape Y to (height, width, number of anchor boxes centered on the same pixel, 4)
# Each anchor box has four elements: the (x, y) coordinates of its upper-left and lower-right corners
# n+m-1 = 3+3-1 = 5
boxes = Y.reshape(h, w, 5, 4)
# Access the first anchor box centered on (250, 250)
boxes[250, 250, 0, :]
# Show all anchor boxes centered on one pixel
"""
params:
    axes: matplotlib axes of the image
    bboxes: the anchor boxes to draw
    labels: text labels to display, e.g. s=0.2, r=1
    colors: colors of the anchor boxes
"""
def show_bboxes(axes, bboxes, labels=None, colors=None):
    """显示所有边界框"""
    def _make_list(obj, default_values=None):
        if obj is None:
            obj = default_values
        elif not isinstance(obj, (list, tuple)):
            obj = [obj]
        return obj

    labels = _make_list(labels)
    colors = _make_list(colors, ['b', 'g', 'r', 'm', 'c'])
    for i, bbox in enumerate(bboxes):
        color = colors[i % len(colors)]
        # bbox_to_rect converts a bounding box from the (upper-left x, upper-left y, lower-right x, lower-right y)
        # format to the matplotlib format: ((upper-left x, upper-left y), width, height)
        rect = d2l.bbox_to_rect(bbox.detach().numpy(), color)
        axes.add_patch(rect)
        if labels and len(labels) > i:
            text_color = 'k' if color == 'w' else 'w'
            axes.text(rect.xy[0], rect.xy[1], labels[i],
                      va='center', ha='center', fontsize=9, color=text_color,
                      bbox=dict(facecolor=color, lw=0))
d2l.set_figsize()
bbox_scale = torch.tensor((w, h, w, h))
fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, boxes[750, 750, :, :] * bbox_scale,
            ['s=0.75, r=1', 's=0.5, r=1', 's=0.25, r=1', 's=0.75, r=2',
             's=0.75, r=0.5'])

13.4.2 Intersection over Union (IoU)

# Measure the similarity between an anchor box and a ground-truth box, or between two anchor boxes: |A∩B| / |A∪B|
def box_iou(boxes1, boxes2):
    """计算两个锚框或边界框列表中成对的交并比"""
    box_area = lambda boxes: ((boxes[:, 2] - boxes[:, 0]) *
                              (boxes[:, 3] - boxes[:, 1]))
    # Shapes of boxes1, boxes2, areas1, areas2:
    # boxes1: (number of boxes1, 4),
    # boxes2: (number of boxes2, 4),
    # areas1: (number of boxes1,),
    # areas2: (number of boxes2,)
    areas1 = box_area(boxes1)
    areas2 = box_area(boxes2)
    # Shapes of inter_upperlefts, inter_lowerrights, inters:
    # (number of boxes1, number of boxes2, 2)
    inter_upperlefts = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    inter_lowerrights = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
    inters = (inter_lowerrights - inter_upperlefts).clamp(min=0)
    # Shapes of inter_areas and union_areas: (number of boxes1, number of boxes2)
    inter_areas = inters[:, :, 0] * inters[:, :, 1]
    union_areas = areas1[:, None] + areas2 - inter_areas
    return inter_areas / union_areas
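
A tiny check with made-up boxes: two 2x2 squares overlapping in a 1x1 patch should give IoU = 1 / (4 + 4 - 1) = 1/7:

b1 = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
b2 = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(box_iou(b1, b2))  # tensor([[0.14]]), i.e. 1/7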

13.4.3 Annotating anchor boxes in training data

# Assign the closest ground-truth bounding box to each anchor box
# iou_threshold: IoU threshold for an assignment to count
def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
    # num_anchors = na, num_gt_boxes = nb
    num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]
    # The element x_ij in row i and column j is the IoU of anchor box i and ground-truth box j
    jaccard = box_iou(anchors, ground_truth)
    # For each anchor box, the index of its assigned ground-truth box, initialized to -1
    anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long,
                                  device=device)
    # For each row (anchor box), find the ground-truth box with the largest IoU
    max_ious, indices = torch.max(jaccard, dim=1)
    # Keep the anchors whose largest IoU reaches the threshold
    anc_i = torch.nonzero(max_ious >= iou_threshold).reshape(-1)
    box_j = indices[max_ious >= iou_threshold]

    anchors_bbox_map[anc_i] = box_j
    # Sentinel values used to discard a row and a column once a pair is assigned
    col_discard = torch.full((num_anchors,), -1)
    row_discard = torch.full((num_gt_boxes,), -1)
    for _ in range(num_gt_boxes):
        # argmax over the flattened IoU matrix gives the position of the globally largest remaining IoU
        max_idx = torch.argmax(jaccard)
        box_idx = (max_idx % num_gt_boxes).long()
        anc_idx = (max_idx / num_gt_boxes).long()
        anchors_bbox_map[anc_idx] = box_idx
        jaccard[:, box_idx] = col_discard
        jaccard[anc_idx, :] = row_discard
    return anchors_bbox_map
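
A toy run (boxes invented for illustration): with one ground-truth box and two anchors, no IoU reaches the 0.5 threshold, but the loop still force-assigns each ground-truth box to the anchor with the highest IoU, so anchor 0 gets box 0:

gt = torch.tensor([[0.10, 0.08, 0.52, 0.92]])
anc = torch.tensor([[0.00, 0.10, 0.20, 0.30],
                    [0.63, 0.05, 0.88, 0.98]])
print(assign_anchor_to_bbox(gt, anc, torch.device('cpu')))  # tensor([0, -1])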
#@save
def offset_boxes(anchors, assigned_bb, eps=1e-6):
    """对锚框偏移量的转换"""
    c_anc = d2l.box_corner_to_center(anchors)
    c_assigned_bb = d2l.box_corner_to_center(assigned_bb)
    offset_xy = 10 * (c_assigned_bb[:, :2] - c_anc[:, :2]) / c_anc[:, 2:]
    offset_wh = 5 * torch.log(eps + c_assigned_bb[:, 2:] / c_anc[:, 2:])
    offset = torch.cat([offset_xy, offset_wh], axis=1)
    return offset
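
In formula form: for an anchor box A with center (x_a, y_a), width w_a and height h_a, and its assigned ground-truth box B with center (x_b, y_b), width w_b and height h_b, the offsets computed above are

\left( \frac{10\,(x_b - x_a)}{w_a},\quad \frac{10\,(y_b - y_a)}{h_a},\quad 5\log\frac{w_b}{w_a},\quad 5\log\frac{h_b}{h_a} \right)

where the eps term in the code only keeps the logarithm well-defined.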
#@save
def multibox_target(anchors, labels):
    """使用真实边界框标记锚框"""
    batch_size, anchors = labels.shape[0], anchors.squeeze(0)
    batch_offset, batch_mask, batch_class_labels = [], [], []
    device, num_anchors = anchors.device, anchors.shape[0]
    for i in range(batch_size):
        label = labels[i, :, :]
        anchors_bbox_map = assign_anchor_to_bbox(
            label[:, 1:], anchors, device)
        bbox_mask = ((anchors_bbox_map >= 0).float().unsqueeze(-1)).repeat(
            1, 4)
        # Initialize the class labels and assigned bounding box coordinates to zero
        class_labels = torch.zeros(num_anchors, dtype=torch.long,
                                   device=device)
        assigned_bb = torch.zeros((num_anchors, 4), dtype=torch.float32,
                                  device=device)
        # Label the classes of the anchor boxes using their assigned ground-truth boxes.
        # If an anchor box is not assigned, it is labeled as background (class zero)
        indices_true = torch.nonzero(anchors_bbox_map >= 0)
        bb_idx = anchors_bbox_map[indices_true]
        class_labels[indices_true] = label[bb_idx, 0].long() + 1
        assigned_bb[indices_true] = label[bb_idx, 1:]
        # Offset transformation
        offset = offset_boxes(anchors, assigned_bb) * bbox_mask
        batch_offset.append(offset.reshape(-1))
        batch_mask.append(bbox_mask.reshape(-1))
        batch_class_labels.append(class_labels)
    bbox_offset = torch.stack(batch_offset)
    bbox_mask = torch.stack(batch_mask)
    class_labels = torch.stack(batch_class_labels)
    return (bbox_offset, bbox_mask, class_labels)

# Each ground-truth row is (class, upper-left x, upper-left y, lower-right x, lower-right y)
ground_truth = torch.tensor([[0, 0.1, 0.08, 0.52, 0.92],
                             [1, 0.55, 0.2, 0.9, 0.88]])
anchors = torch.tensor([[0, 0.1, 0.2, 0.3], [0.15, 0.2, 0.4, 0.4],
                        [0.63, 0.05, 0.88, 0.98], [0.66, 0.45, 0.8, 0.8],
                        [0.57, 0.3, 0.92, 0.9]])
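
These tensors are defined but not used above; following the d2l book, the next step would be to label the anchors with them (unsqueeze adds the batch dimension multibox_target expects):

labels = multibox_target(anchors.unsqueeze(dim=0),
                         ground_truth.unsqueeze(dim=0))
# labels[2] holds the class of each anchor: 0 is background, and an assigned
# anchor gets its ground-truth class index plus 1
print(labels[2])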

13.5 Multi-scale object detection

Section 13.4 generated n+m-1 anchor boxes centered on every pixel of the image. For a large image this number of anchor boxes explodes. This section instead samples a small, uniformly spaced subset of pixels in the input image and generates anchor boxes centered only on them.

In practice the sampling is driven by feature maps: the output of a convolutional layer is a feature map, and every spatial position of the feature map is used as an anchor center. Because positions on an (fmap_h, fmap_w) feature map map back to uniformly spaced pixels of the original image, the resulting anchor boxes are uniformly distributed over the image.

def display_anchors(fmap_w, fmap_h, s):
    d2l.set_figsize()
    # The values of the first two dimensions do not affect the output
    fmap = torch.zeros((10, 100, fmap_h, fmap_w))
    anchors = d2l.multibox_prior(fmap, sizes=s, ratios=[1, 2, 0.5])
    bbox_scale = torch.tensor((w, h, w, h))
    d2l.show_bboxes(d2l.plt.imshow(img).axes,
                    anchors[0] * bbox_scale)
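
The function is not called here; in the d2l book it is invoked at several scales, for example:

display_anchors(fmap_w=4, fmap_h=4, s=[0.15])  # many small anchors on a 4x4 feature map
display_anchors(fmap_w=2, fmap_h=2, s=[0.4])   # fewer, larger anchors on a 2x2 feature map
display_anchors(fmap_w=1, fmap_h=1, s=[0.8])   # one center with the largest scale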
