OpenPCDet Series | 5.2 PointPillars Algorithm: PointPillarScatter Pseudo-Image BEV Feature Building Block

(Figure: the overall structure diagram of OpenPCDet.)

PointPillarScatter module

After PillarVFE encoding, batch_dict is updated as shown below: a pillar_features field has been added to hold the feature vector of each voxel (pillar). The dict is then passed on to the Map_to_BEV module.

(Figure: batch_dict contents after PillarVFE encoding.)

In the map_to_bev package, the corresponding __init__ file lists all of the optional modules; the PointPillars algorithm selects the PointPillarScatter module.

# The module is selected according to MAP_TO_BEV in the MODEL config
__all__ = {
    'HeightCompression': HeightCompression,
    'PointPillarScatter': PointPillarScatter,
    'Conv2DCollapse': Conv2DCollapse
}

1. PointPillarScatter initialization

The initialization of the PointPillarScatter module is straightforward: it simply saves the relevant yaml configuration and the grid size of the point cloud scene. Since PointPillars divides the entire scene into a planar (BEV) grid, the z dimension here must be 1, i.e. the grid is not split along the z axis.

self.model_cfg = model_cfg
self.num_bev_features = self.model_cfg.NUM_BEV_FEATURES   # 64
self.nx, self.ny, self.nz = grid_size   # PointPillars splits the scene into a planar grid, so the z dimension must be 1
assert self.nz == 1
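For context, grid_size is derived upstream from the detection range and the voxel size. The sketch below uses the standard KITTI PointPillars settings (POINT_CLOUD_RANGE and VOXEL_SIZE from the yaml config); these concrete values are assumptions taken from the common config, not from this post.

```python
# Sketch of how grid_size is derived before it reaches PointPillarScatter.
# The concrete numbers are the usual KITTI PointPillars config values (assumed).
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]   # (x_min, y_min, z_min, x_max, y_max, z_max)
voxel_size = [0.16, 0.16, 4.0]                          # (dx, dy, dz); dz spans the full height

grid_size = [
    round((point_cloud_range[i + 3] - point_cloud_range[i]) / voxel_size[i])
    for i in range(3)
]
print(grid_size)   # -> [432, 496, 1], i.e. nx=432, ny=496, nz=1
```

Because the single voxel along z covers the entire height range, nz comes out as 1, which is exactly what the assert above checks.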

2. PointPillarScatter forward propagation

From batch_dict it can be seen that pillar_features and voxel_coords now stack all frames in the batch along their first dimension; the batch index stored in the first column of voxel_coords is what distinguishes the individual point cloud frame scenes.
(Figure: batch_dict contents at the input of PointPillarScatter.)

What the map2bev module needs to do is scatter the extracted voxel features back into the original space to form a pseudo-image feature map.

Concretely, suppose the extracted voxel features are Nx64. First, a zero matrix covering the whole pseudo-image space is constructed: the grid size here is nx=432, ny=496, nz=1, so the empty matrix has shape [64, 1x432x496]. Each point cloud frame in the batch is then processed in turn: a batch mask is built from the batch index, and with it the pillar features and coord positions belonging to that frame are selected. Since coords stores the specific grid cell of each voxel, a one-dimensional index into the flattened grid can be computed from it, and the voxel features are filled into the zero matrix at those indexed positions. Finally, reshaping back to the original grid yields the feature matrix of a pseudo image.
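The flattening scheme can be checked on toy numbers. With nz == 1 the z index is always 0, so a pillar at grid cell (y, x) lands at flat index y * nx + x. The grid sizes below are the KITTI PointPillars values; the coordinates are made-up examples.

```python
nx, ny = 432, 496

def flat_index(z, y, x):
    # z + y * nx + x; valid here because z is always 0 when nz == 1
    return z + y * nx + x

print(flat_index(0, 0, 0))       # first cell
print(flat_index(0, 0, 431))     # last cell of the first row
print(flat_index(0, 495, 431))   # last cell of the whole 496x432 grid (= nx*ny - 1)
```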

The complete operation process is as follows:

def forward(self, batch_dict, **kwargs):
    """
    Args:
        pillar_features: (102483, 64)
        voxels: (102483, 32, 4) --> (x, y, z, intensity)
        voxel_coords: (102483, 4) --> (batch_index, z, y, x); the batch index was added in dataset.collate_batch
        voxel_num_points: (102483,)
    Returns:
        batch_spatial_features: (16, 64, 496, 432)
    """
    pillar_features, coords = batch_dict['pillar_features'], batch_dict['voxel_coords']     # (102483, 64) / (102483, 4)
    batch_spatial_features = []
    batch_size = coords[:, 0].max().int().item() + 1    # 16

    # Process each point cloud frame in the batch in turn
    for batch_idx in range(batch_size):
        spatial_feature = torch.zeros(      # build a [64, 1x432x496] zero matrix
            self.num_bev_features,          # 64
            self.nz * self.nx * self.ny,    # 1x432x496
            dtype=pillar_features.dtype,
            device=pillar_features.device)

        batch_mask = coords[:, 0] == batch_idx  # batch mask selecting frame batch_idx out of the whole batch
        this_coords = coords[batch_mask, :]     # e.g. [7857, 4]; the 4 dims are (batch_index, z, y, x)
        # Since PointPillars has only one z layer (z is always 0), the flattened index is computed as:
        indices = this_coords[:, 1] + this_coords[:, 2] * self.nx + this_coords[:, 3]     # 1D index into the flattened grid
        indices = indices.type(torch.long)
        pillars = pillar_features[batch_mask, :]    # pick out this frame's pillar features, e.g. [7857, 64]
        pillars = pillars.t()   # transpose to [64, 7857]
        spatial_feature[:, indices] = pillars       # fill the pillars in at the indexed positions
        batch_spatial_features.append(spatial_feature)      # append to the list; each element is (64, 214272)

    batch_spatial_features = torch.stack(batch_spatial_features, 0)     # stack into (16, 64, 214272)
    # reshape back to the original (pseudo-image) space --> (16, 64, 496, 432), then store the result
    batch_spatial_features = batch_spatial_features.view(batch_size, self.num_bev_features * self.nz, self.ny, self.nx) # (16, 64, 496, 432)
    batch_dict['spatial_features'] = batch_spatial_features
    return batch_dict
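The scatter logic above can be reproduced on toy data without the rest of the framework. The following is a minimal NumPy sketch of the same steps on a tiny 4x3 grid with 2 feature channels; all sizes and values are made up for illustration.

```python
import numpy as np

nx, ny, num_bev_features = 4, 3, 2
# voxel_coords rows are (batch_index, z, y, x), as in dataset.collate_batch
coords = np.array([[0, 0, 0, 1],
                   [0, 0, 2, 3]])
pillar_features = np.array([[1.0, 2.0],
                            [3.0, 4.0]])            # (num_pillars, C)

spatial_feature = np.zeros((num_bev_features, nx * ny))     # zero matrix for the whole grid
indices = coords[:, 1] + coords[:, 2] * nx + coords[:, 3]   # z + y*nx + x
spatial_feature[:, indices] = pillar_features.T             # fill the occupied cells
bev = spatial_feature.reshape(num_bev_features, ny, nx)     # pseudo-image, (C, ny, nx)

print(bev[0])
# pillar 0 lands in cell (y=0, x=1), pillar 1 in cell (y=2, x=3); every other cell stays 0
```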

The final spatial_features tensor has shape (16, 64, 496, 432). Here 16 is the batch size, indicating 16 point cloud frame scenes; 496x432 is the grid size of the pseudo image; and 64 is the feature dimension at each grid cell. Cells containing a voxel are filled with its pillar feature, while all other positions remain 0, so the whole feature matrix is still fairly sparse.

Finally, after processing by the PointPillarScatter module, the voxel features stored in a flat one-dimensional layout have been converted into two-dimensional BEV-view features, albeit still relatively sparse ones. The update to batch_dict is as follows:

(Figure: batch_dict after PointPillarScatter, now containing the spatial_features field.)



Origin blog.csdn.net/weixin_44751294/article/details/130563175