Interpretation of CenterPoint source code process (1)

References :
1. Quick reading of the paper – CenterPoint
2. Introduction to 3D target detection of lidar point cloud (CenterPoint source code analysis)
3. Implementation of CenterPoint in mmdetection3d
4. Centerpoint complete translation of the original text

This article uses the following configuration file, found under configs/centerpoint in the mmdetection3d project:
centerpoint_02pillar_second_secfpn_4x8_cyclic_20e_nus.py

1. Data processing part (mainly for point cloud) – train_pipeline process

1. LoadPointsFromFile

1.1 Function: Load point cloud from file.

1.2 Initialization parameters

coord_type: Coordinate system type; one of 'LIDAR', 'DEPTH' or 'CAMERA'
load_dim: Number of dimensions per loaded point; the default is 6, and it is set to 5 for the nuScenes dataset
use_dim: Dimensions to use; the default is [0, 1, 2], i.e. only xyz
shift_height: Whether to use shifted height; the default is false
use_color: Whether to use color features; the default is false
file_client_args: (optional) File client configuration; the default is the disk backend, i.e. the file is read directly from the given path.

1.3 Functions within the class (__init__ is omitted for all of the following)

_load_points: load point cloud data
__call__: Loads the point cloud data from the file; the result is a dict containing the point cloud data
__repr__: Returns the module description string
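
To make load_dim and use_dim concrete, here is a minimal sketch of the loading step. It assumes the file is a flat array of float32 values with load_dim values per point (for nuScenes: x, y, z, intensity, ring index). The helper load_points is hypothetical and uses only the standard library; it is not the mmdetection3d implementation.

```python
import array
import os
import tempfile


def load_points(path, load_dim=5, use_dim=(0, 1, 2)):
    """Read a flat float32 file and return one list per point, keeping use_dim."""
    values = array.array("f")  # "f" = 4-byte float on common platforms
    with open(path, "rb") as f:
        values.frombytes(f.read())
    if len(values) % load_dim != 0:
        raise ValueError("file size is not a multiple of load_dim")
    points = []
    for start in range(0, len(values), load_dim):
        row = values[start:start + load_dim]
        points.append([row[d] for d in use_dim])
    return points


# Write two fake 5-dim points to a temporary file and read them back.
raw = array.array("f", [1.0, 2.0, 3.0, 0.5, 0.0,
                        4.0, 5.0, 6.0, 0.25, 1.0])
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(raw.tobytes())
demo_points = load_points(tmp.name, load_dim=5, use_dim=(0, 1, 2))
os.remove(tmp.name)
```

With use_dim=(0, 1, 2) only the xyz columns survive; passing all five indices would keep intensity and the ring index as well.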

2. LoadPointsFromMultiSweeps

2.1 Function: Load multi-frame point cloud data

2.2 Initialization parameters

sweeps_num: Number of sweeps (frames) to load, default 10
load_dim: Number of dimensions per loaded point, default 5
use_dim: Dimensions to use, default [0, 1, 2, 4]; dimension 4 holds the timestamp difference
time_dim: Dimension that holds the timestamp of each point, default 4
file_client_args: Same as above
pad_empty_sweeps: Whether to repeat the keyframe when there are no sweeps, default false
remove_close: Whether to remove points close to the sensor, default false
test_mode: If true, sweeps are not sampled randomly; the nearest N frames are selected instead. Default false

2.3 Functions in a class

_load_points: load point cloud data
_remove_close: Remove all points within a certain radius of the origin
__call__: Loads the multi-sweep point cloud data from files; the result is a dict containing the point cloud data
__repr__: Returns the module description string
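
The two decisions this class makes, which sweeps to load and which points to drop, can be sketched as below. Both helpers are hypothetical. The sketch assumes that "close" means both |x| and |y| are below the radius, and that the sweep list is ordered with the nearest frame first; neither detail is stated above.

```python
import random


def remove_close(points, radius=1.0):
    """Keep a point unless both |x| and |y| are smaller than radius."""
    return [p for p in points
            if not (abs(p[0]) < radius and abs(p[1]) < radius)]


def choose_sweeps(num_available, sweeps_num=10, test_mode=False, rng=random):
    """Return indices of the sweeps to load (index 0 assumed nearest in time)."""
    if num_available <= sweeps_num:
        return list(range(num_available))      # use everything there is
    if test_mode:
        return list(range(sweeps_num))         # deterministic: nearest N
    return rng.sample(range(num_available), sweeps_num)  # random, no repeats
```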

3. LoadAnnotations3D

3.1 Function: Load 3D annotations. Loads the 3D boxes, plus the instance mask and semantic mask of the points, and stores each item in its related field.

3.2 Initialization parameters

with_bbox_3d: whether to load 3D box, the default is true
with_label_3d: Whether to load the 3D box label, the default is true
with_attr_label: Whether to load the attribute label, the default is false
with_mask_3d: Whether to load the point cloud 3D instance mask, the default is false
with_seg_3d: Whether to load the point cloud 3D semantic mask, the default is false
with_bbox: whether to load 2D box, default false
with_label: whether to load 2D labels, default false
with_mask: Whether to load the 2D instance mask, the default is false
with_seg: Whether to load the 2D semantic mask, the default is false
with_bbox_depth: whether to load 2.5D box, default false
poly2mask: Whether to convert polygon annotations to binary masks, the default is true
seg_3d_dtype: Data type of the 3D semantic mask, default int64
file_client_args: Same as above

3.3 Functions within the class
The actual processing behind the following functions takes place in the mmdet3d.CustomDataset class

_load_bboxes_3d: Load 3D box annotations; returns two keys, gt_bboxes_3d and bbox3d_fields
_load_bboxes_depth: Load 2.5D box annotations; returns two keys, center2d and depths
_load_labels_3d: Load label annotations; returns the gt_labels_3d key
_load_attr_labels: Load attribute labels; returns the attr_labels key
_load_masks_3d: Load 3D mask annotations; returns two keys, pts_instance_mask and pts_mask_fields
_load_semantic_seg_3d: Load 3D semantic segmentation annotations; returns two keys, pts_semantic_mask and pts_seg_fields
__call__: Load whichever annotation types are enabled and return the results dict
__repr__: Returns the module description string
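
The pattern behind these loaders is that each with_* flag copies one annotation into the results dict and registers its key in a "fields" list, so later transforms know which entries to process. A rough sketch for the two flags that default to true follows. The "ann_info" input key and the function name are assumptions made for illustration.

```python
def load_annotations(results, with_bbox_3d=True, with_label_3d=True):
    """Copy the enabled annotations from results['ann_info'] into results."""
    ann = results["ann_info"]  # assumed location of the raw annotations
    if with_bbox_3d:
        results["gt_bboxes_3d"] = ann["gt_bboxes_3d"]
        # Register the key so later transforms can find every 3D box field.
        results.setdefault("bbox3d_fields", []).append("gt_bboxes_3d")
    if with_label_3d:
        results["gt_labels_3d"] = ann["gt_labels_3d"]
    return results


demo = load_annotations({
    "ann_info": {
        "gt_bboxes_3d": [[0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0]],
        "gt_labels_3d": [2],
    }
})
```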

4. ObjectSample

4.1 Function: Sample ground-truth (GT) objects into the data

4.2 Initialization parameters

db_sampler (dict): Configuration of the ground-truth database sampler
sample_2d (bool): Whether to also paste the 2D image patch into the image; this should be set to true for multi-modal cut-and-paste. The default is false
use_ground_plane (bool): Whether to use the ground plane to adjust the 3D labels, default false

4.3 Functions in a class

remove_points_in_boxes (static function): Remove the points that fall inside the sampled boxes
__call__: Sample GT objects into the data; the returned result contains three keys: gt_bboxes_3d, gt_labels_3d and points
__repr__: Returns the module description string
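
The cut-and-paste idea can be sketched as follows: points of the current scene that fall inside a sampled box are removed first, then the sampled object's own points, box and label are added. For brevity this sketch treats boxes as axis-aligned (cx, cy, cz, dx, dy, dz) with a centre origin, which is a simplifying assumption (real boxes also carry a yaw angle). paste_samples is a hypothetical helper.

```python
def in_box(point, box):
    """True if point lies inside an axis-aligned box (cx, cy, cz, dx, dy, dz)."""
    return all(abs(point[i] - box[i]) <= box[i + 3] / 2 for i in range(3))


def remove_points_in_boxes(points, boxes):
    """Drop every point that falls inside any of the given boxes."""
    return [p for p in points if not any(in_box(p, b) for b in boxes)]


def paste_samples(points, gt_boxes, gt_labels, sampled):
    """sampled: list of dicts with 'box', 'label' and 'points' entries."""
    sampled_boxes = [s["box"] for s in sampled]
    kept = remove_points_in_boxes(points, sampled_boxes)
    new_points = [p for s in sampled for p in s["points"]] + kept
    new_boxes = gt_boxes + sampled_boxes
    new_labels = gt_labels + [s["label"] for s in sampled]
    return new_points, new_boxes, new_labels
```

Removing the scene points first avoids a pasted object overlapping with background points that happen to occupy the same space.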

5. GlobalRotScaleTrans

5.1 Function: Apply a global rotation, scaling and translation to a 3D scene

5.2 Initialization parameters

rot_range (list[float]): Rotation angle range, default [-0.78539816, 0.78539816] (close to [-pi/4, pi/4])
scale_ratio_range (list[float]): scaling factor, default [0.95, 1.05]
translation_std (list[float]): Standard deviation of the translation noise. The scene is translated by adding noise sampled from a Gaussian distribution with this standard deviation; the default is [0, 0, 0]
shift_height (bool): Whether to shift the height value, the default is false

5.3 Functions in a class

_trans_bbox_points: Translate bbox and point cloud
_rot_bbox_points: Rotate bbox and point cloud
_scale_bbox_points: scale bbox and point cloud
_random_scale: Randomly set the scale factor
update_transform: update the transformation matrix
__call__: rotate, scale, translate bbox and point cloud
__repr__: Returns the module description string
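
As a rough illustration of how the three parameters act on the data, the sketch below samples one angle, one scale factor and one translation vector, then applies them to points and boxes in the order rotate, scale, translate. The helper names, the box layout [cx, cy, cz, dx, dy, dz, yaw] and the counter-clockwise rotation about the z axis are assumptions of this sketch; the sign convention in particular may differ from the library.

```python
import math
import random


def sample_params(rot_range=(-0.78539816, 0.78539816),
                  scale_ratio_range=(0.95, 1.05),
                  translation_std=(0.0, 0.0, 0.0), rng=random):
    """Draw one global angle, scale factor and translation vector."""
    angle = rng.uniform(*rot_range)
    scale = rng.uniform(*scale_ratio_range)
    trans = [rng.gauss(0.0, std) for std in translation_std]
    return angle, scale, trans


def global_rot_scale_trans(points, boxes, angle, scale, trans):
    """Apply the same rotation, scaling and translation to points and boxes."""
    c, s = math.cos(angle), math.sin(angle)

    def move(x, y, z):
        # Rotate about z, then scale, then translate.
        return [(c * x - s * y) * scale + trans[0],
                (s * x + c * y) * scale + trans[1],
                z * scale + trans[2]]

    new_points = [move(*p[:3]) + list(p[3:]) for p in points]
    new_boxes = [move(cx, cy, cz) + [dx * scale, dy * scale, dz * scale, yaw + angle]
                 for cx, cy, cz, dx, dy, dz, yaw in boxes]
    return new_points, new_boxes
```

Points and boxes must share the same sampled parameters; otherwise the labels would no longer line up with the objects in the point cloud.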

6. RandomFlip3D

6.1 Function: Randomly flip point cloud and bbox.
Note: If the input dict already contains the "flip" key, that flag is used. Otherwise the flip is decided randomly with the ratio specified at initialization.

6.2 Initialization parameters

sync_2d (bool, optional): Whether to flip in sync with the 2D image. If true, the 3D data gets the same flip as the 2D image; if false, the flip is decided randomly and independently of the 2D image. Defaults to true.
flip_ratio_bev_horizontal (float, optional): Probability of a horizontal flip in bird's-eye view, default 0.0
flip_ratio_bev_vertical (float, optional): Probability of a vertical flip in bird's-eye view, default 0.0
**kwargs: Additional keyword arguments

6.3 Functions in a class

random_flip_data_3d: random flip 3d data
update_transform: update the transformation matrix
__call__: Flip the points and the boxes listed in bbox3d_fields, and also flip the 2D image and its annotations
__repr__
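
A minimal sketch of the bird's-eye-view flip, for the case where the flip is decided randomly (the sync_2d=False case). It assumes that a "horizontal" flip negates y and a "vertical" flip negates x, with boxes laid out as [cx, cy, cz, dx, dy, dz, yaw]; the axis naming and the yaw handling are assumptions and may differ between library versions. Both functions are hypothetical helpers.

```python
import math
import random


def flip_bev(points, boxes, direction="horizontal"):
    """Mirror points [x, y, z, ...] and boxes in the bird's-eye-view plane."""
    if direction == "horizontal":    # mirror across the x axis: y -> -y
        points = [[p[0], -p[1]] + list(p[2:]) for p in points]
        boxes = [[b[0], -b[1]] + list(b[2:6]) + [-b[6]] for b in boxes]
    elif direction == "vertical":    # mirror across the y axis: x -> -x
        points = [[-p[0]] + list(p[1:]) for p in points]
        boxes = [[-b[0]] + list(b[1:6]) + [math.pi - b[6]] for b in boxes]
    else:
        raise ValueError(f"unknown direction: {direction}")
    return points, boxes


def random_flip(points, boxes, flip_ratio_bev_horizontal=0.0,
                flip_ratio_bev_vertical=0.0, rng=random):
    """Flip along each direction with its own probability; report what was done."""
    flags = {}
    for direction, ratio in (("horizontal", flip_ratio_bev_horizontal),
                             ("vertical", flip_ratio_bev_vertical)):
        flags[direction] = rng.random() < ratio
        if flags[direction]:
            points, boxes = flip_bev(points, boxes, direction)
    return points, boxes, flags
```

The returned flags play the role of the recorded flip state, which downstream code needs in order to undo the flip on predictions.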

7. PointsRangeFilter

7.1 Function: Filter point cloud by range

7.2 Initialization parameters

point_cloud_range (list[float]): point cloud range

7.3 Functions in a class

__call__: filter point cloud by range
__repr__
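
The filter itself is a simple box test. The sketch below assumes point_cloud_range is laid out as [x_min, y_min, z_min, x_max, y_max, z_max] and that the bounds are exclusive; the function name is hypothetical.

```python
def filter_points_by_range(points, point_cloud_range):
    """Keep only the points strictly inside the 3D range."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    return [p for p in points
            if x_min < p[0] < x_max
            and y_min < p[1] < y_max
            and z_min < p[2] < z_max]
```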

8. ObjectRangeFilter

8.1 Function: Filter objects (ground-truth boxes) by range

8.2 Initialization parameters

point_cloud_range (list[float]): point cloud range

8.3 Functions in a class

__call__: filter objects by range
__repr__
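
For objects, the boxes and their labels have to be filtered together so that they stay aligned. This sketch keeps a box when its centre lies inside the x/y extent of the range, ignoring z; testing only the bird's-eye-view centre is an assumption of the sketch, as is the [x_min, y_min, z_min, x_max, y_max, z_max] layout.

```python
def filter_objects_by_range(boxes, labels, point_cloud_range):
    """Keep boxes whose centre (first two values) lies inside the BEV range."""
    x_min, y_min, _, x_max, y_max, _ = point_cloud_range
    kept = [(box, label) for box, label in zip(boxes, labels)
            if x_min < box[0] < x_max and y_min < box[1] < y_max]
    return [box for box, _ in kept], [label for _, label in kept]
```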

9. ObjectNameFilter

9.1 Function: Filter ground-truth objects by class name

9.2 Initialization parameters

classes (list[str]): A list of class names that need to be retained for training

9.3 Functions in a class

__call__: filter obstacles by name
__repr__
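
One way to picture this filter: if labels are stored as integer indices into classes, with -1 for any object whose class is not in the list, then filtering by name reduces to keeping the labels that are valid indices. That label encoding is an assumption here, and the function is a hypothetical sketch.

```python
def filter_objects_by_name(boxes, labels, classes):
    """Keep objects whose label is a valid index into classes."""
    valid = set(range(len(classes)))
    keep = [i for i, label in enumerate(labels) if label in valid]
    return [boxes[i] for i in keep], [labels[i] for i in keep]
```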

10. PointShuffle

10.1 Function: Shuffle the order of the input points

10.2 Initialization parameters: none

10.3 Functions in a class

__call__: Shuffle the order of the points
__repr__
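
A small sketch of the shuffle. It draws one permutation and applies it to the points; returning the permutation as well is a choice made here so that any per-point array (a mask, for example) could be reordered identically. The function name is hypothetical.

```python
import random


def shuffle_points(points, rng=random):
    """Return the points in random order, plus the permutation that was used."""
    order = list(range(len(points)))
    rng.shuffle(order)
    return [points[i] for i in order], order
```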

11. DefaultFormatBundle3D

11.1 Function: Default formatting bundle for 3D data

Note: It simplifies the pipeline for formatting common fields for voxels, including "proposals", "gt_bboxes", "gt_labels", "gt_masks" and "gt_semantic_seg". The fields are converted as follows:

    - img: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
    - proposals: (1)to tensor, (2)to DataContainer
    - gt_bboxes: (1)to tensor, (2)to DataContainer
    - gt_bboxes_ignore: (1)to tensor, (2)to DataContainer
    - gt_labels: (1)to tensor, (2)to DataContainer

11.2 Initialization parameters

class_names: list of classes
with_gt(bool): Whether to include the ground truth, the default is true
with_label(bool): whether to use labels, default true

11.3 Functions in a class

__call__: Transform and format the common fields in results
__repr__: returns a string describing the module

12. Collect3D

12.1 Function: Collect the data from the loader that is relevant to the specific task

Note:
1) This class is usually used as the last stage of the data loader pipeline. keys is typically set to some subset of "img", "proposals", "gt_bboxes", "gt_bboxes_ignore", "gt_labels", "gt_masks"
2) The img_meta item is always populated; its contents depend on meta_keys, which by default contain:

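- 'img_shape': shape of the image fed to the network, as a tuple (h, w, c). Note that the image may be
          zero-padded on the bottom/right
- 'scale_factor': scale factor applied during preprocessing
- 'flip': whether the image was flipped
- 'filename': path to the image file
- 'ori_shape': original shape of the image, as a tuple (h, w, c)
- 'pad_shape': image shape after padding
- 'lidar2img': transformation matrix from lidar to image
- 'depth2img': transformation matrix from depth to image
- 'cam2img': transformation matrix from the camera coordinate system to image coordinates
- 'pcd_horizontal_flip': whether the point cloud was flipped horizontally
- 'pcd_vertical_flip': whether the point cloud was flipped vertically
- 'box_mode_3d': 3D box mode
- 'box_type_3d': 3D box type
- 'img_norm_cfg': dict of normalization information
    - mean: per-channel mean
    - std: per-channel standard deviation
    - to_rgb: whether to convert from BGR to RGB
- 'pcd_trans': transformation applied to the point cloud
- 'sample_idx': index of the sample keyframe
- 'pcd_scale_factor': point cloud scale factor
- 'pcd_rotation': rotation applied to the point cloud
- 'pts_filename': path to the point cloud file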

12.2 Initialization parameters

keys (Sequence[str]): collected keywords
meta_keys (Sequence[str], optional): The meta keys, which are converted to mmcv.DataContainer and stored in data[img_metas].

12.3 Inner-class functions

__call__: Collect the specified keys from results
__repr__: returns a string describing the module
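
The collection step can be sketched as a plain dictionary operation: the requested keys are copied over, and whichever meta keys are present are bundled under img_metas. This sketch uses an ordinary dict where the real class wraps the metadata in mmcv.DataContainer, and the function name is hypothetical.

```python
def collect(results, keys, meta_keys):
    """Build the final sample from results: requested keys plus bundled metadata."""
    data = {"img_metas": {k: results[k] for k in meta_keys if k in results}}
    for key in keys:
        data[key] = results[key]  # a missing key raises KeyError on purpose
    return data


sample = collect(
    {"points": [[0.0, 0.0, 0.0]], "gt_labels_3d": [1],
     "pcd_scale_factor": 1.02, "scratch": "not collected"},
    keys=["points", "gt_labels_3d"],
    meta_keys=["pcd_scale_factor", "pcd_rotation"],
)
```

Anything in results that is neither a requested key nor a meta key (the "scratch" entry here) is dropped, which is why this step comes last in the pipeline.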

To be continued in Interpretation of CenterPoint source code process (2)