Interpretation of CenterPoint source code process (1)
This article follows a configuration file from the configs/centerpoint directory of the mmdetection3d project:
centerpoint_02pillar_second_secfpn_4x8_cyclic_20e_nus.py
1. Data processing part (mainly for point cloud) – train_pipeline process
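In an mmdetection3d config, train_pipeline is a list of dicts, and each dict names a transform through its type key. The sketch below lays out the twelve transforms in the order this article covers them. The argument values (point_cloud_range, class_names, the flip ratios, the empty db_sampler) are illustrative placeholders and are not copied from the config file.

```python
# Illustrative placeholders, NOT the values from the actual config file.
point_cloud_range = [-50.0, -50.0, -5.0, 50.0, 50.0, 3.0]
class_names = ['car', 'pedestrian']
db_sampler = dict()  # stands in for the real database sampler settings

train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5,
         use_dim=[0, 1, 2, 3, 4]),
    dict(type='LoadPointsFromMultiSweeps', sweeps_num=10),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(type='GlobalRotScaleTrans', rot_range=[-0.78539816, 0.78539816],
         scale_ratio_range=[0.95, 1.05], translation_std=[0, 0, 0]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5,
         flip_ratio_bev_vertical=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d']),
]

# Print the transform order that the rest of this article walks through.
for step in train_pipeline:
    print(step['type'])
```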
1. LoadPointsFromFile
1.1 Function: Load point cloud from file.
1.2 Initialization parameters
- coord_type: Coordinate system type, one of 'LIDAR', 'DEPTH', 'CAMERA'
- load_dim: Number of dimensions of each loaded point, the default is 6; set to 5 for the nuScenes dataset
- use_dim: The dimensions to use, the default is [0, 1, 2], i.e. only xyz is used
- shift_height: Whether to use the shifted height, the default is false
- use_color: Whether to use color features, the default is false
- file_client_args: (optional) File client configuration, the default is the disk backend, i.e. the file is read directly from the given path
1.3 Functions within the class (__init__ is omitted here and for all of the following classes)
- _load_points: Load point cloud data
- __call__: Called by the pipeline to load the point cloud from file; the result is a dict containing the point cloud data
- __repr__: Returns the module description string
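How load_dim and use_dim interact can be sketched with NumPy alone. This is a minimal sketch, assuming the file stores a flat float32 array with load_dim values per point; load_points is a hypothetical helper, not the library function, and the bytes are generated in memory instead of being read from disk.

```python
import numpy as np


def load_points(raw_bytes, load_dim=5, use_dim=(0, 1, 2)):
    """Decode a flat float32 buffer into an (N, len(use_dim)) array."""
    points = np.frombuffer(raw_bytes, dtype=np.float32)
    points = points.reshape(-1, load_dim)  # one row per point
    return points[:, list(use_dim)]        # keep only the requested columns


# Two fake points with 5 values each, standing in for a nuScenes-style file.
raw = np.arange(10, dtype=np.float32).tobytes()
xyz = load_points(raw)
print(xyz.shape)  # (2, 3)
```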
2. LoadPointsFromMultiSweeps
2.1 Function: Load multi-frame point cloud data
2.2 Initialization parameters
- sweeps_num: Number of sweeps (frames), default 10
- load_dim: default 5
- use_dim: default [0, 1, 2, 4], where dimension 4 holds the timestamp difference
- time_dim: The dimension that holds the timestamp of each point, default 4
- file_client_args: Same as above
- pad_empty_sweeps: Whether to repeat the keyframe when there are no sweeps, the default is false
- remove_close: Whether to remove points that are too close to the sensor, default false
- test_mode: If true, sweeps are not randomly sampled and only the nearest N frames are selected, the default is false
2.3 Functions in a class
- _load_points: load point cloud data
- _remove_close: Remove all points within a certain radius of the origin
- __call__: Called by the pipeline to load the multi-sweep point cloud from file; the result is a dict containing the point cloud data
- __repr__: Returns the module description string
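The two ideas in this class, dropping points close to the origin and tagging each sweep with its time difference, can be sketched as follows. This is a simplified illustration under stated assumptions: remove_close and merge_sweeps are hypothetical helpers, "close" is taken to mean that both |x| and |y| are below the radius, and the time lags are passed in directly instead of being computed from timestamps.

```python
import numpy as np


def remove_close(points, radius=1.0):
    """Drop points whose |x| and |y| are both below radius."""
    near_x = np.abs(points[:, 0]) < radius
    near_y = np.abs(points[:, 1]) < radius
    return points[~(near_x & near_y)]


def merge_sweeps(keyframe, sweeps, time_lags, time_dim=4):
    """Stack the keyframe with past sweeps, writing each sweep's time lag."""
    merged = [keyframe.copy()]
    merged[0][:, time_dim] = 0.0  # the keyframe has zero lag
    for sweep, lag in zip(sweeps, time_lags):
        sweep = remove_close(sweep)  # boolean indexing returns a copy
        sweep[:, time_dim] = lag
        merged.append(sweep)
    return np.concatenate(merged, axis=0)


keyframe = np.array([[10.0, 0.0, 0.0, 0.5, 0.0]])
sweep = np.array([[0.2, 0.1, 0.0, 0.3, 0.0],   # too close, removed
                  [8.0, 3.0, 0.0, 0.4, 0.0]])  # kept
merged = merge_sweeps(keyframe, [sweep], [0.05])
print(merged.shape)  # (2, 5)
```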
3. LoadAnnotations3D
3.1 Function: Load the 3D annotations (boxes, labels, instance masks and semantic masks of the point cloud) and store them in the related fields.
3.2 Initialization parameters
- with_bbox_3d: whether to load 3D box, the default is true
- with_label_3d: Whether to load the 3D box label, the default is true
- with_attr_label: Whether to load the attribute label, the default is false
- with_mask_3d: Whether to load the point cloud 3D instance mask, the default is false
- with_seg_3d: Whether to load the point cloud 3D semantic mask, the default is false
- with_bbox: whether to load 2D box, default false
- with_label: whether to load 2D labels, default false
- with_mask: Whether to load the 2D instance mask, the default is false
- with_seg: Whether to load the 2D semantic mask, the default is false
- with_bbox_depth: whether to load 2.5D box, default false
- poly2mask: Whether to convert polygon annotations to binary masks, the default is true
- seg_3d_dtype: Data type of the 3D semantic mask, default int64
- file_client_args: Same as above
3.3 Functions within the class
The annotation data read by the following functions is prepared in the mmdet3d.CustomDataset class
- _load_bboxes_3d: Load 3D box annotations; adds the two keys gt_bboxes_3d and bbox3d_fields
- _load_bboxes_depth: Load 2.5D box annotations; adds the two keys centers2d and depths
- _load_labels_3d: Load label annotations; adds the key gt_labels_3d
- _load_attr_labels: Load attribute labels; adds the key attr_labels
- _load_masks_3d: Load 3D mask annotations; adds the two keys pts_instance_mask and pts_mask_fields
- _load_semantic_seg_3d: Load 3D semantic segmentation annotations; adds the two keys pts_semantic_mask and pts_seg_fields
- __call__: Load every annotation type that is enabled and return the result dict
- __repr__: Returns the module description string
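The flag-driven behaviour of __call__ can be sketched with plain dicts. This is a rough illustration only: load_annotations is a hypothetical function, it covers three of the flags, and it assumes the dataset has already placed the raw annotations under an ann_info key.

```python
def load_annotations(results, with_bbox_3d=True, with_label_3d=True,
                     with_attr_label=False):
    """Copy the enabled annotation types from ann_info into results."""
    ann = results['ann_info']  # assumed to be filled in by the dataset class
    if with_bbox_3d:
        results['gt_bboxes_3d'] = ann['gt_bboxes_3d']
        results.setdefault('bbox3d_fields', []).append('gt_bboxes_3d')
    if with_label_3d:
        results['gt_labels_3d'] = ann['gt_labels_3d']
    if with_attr_label:
        results['attr_labels'] = ann['attr_labels']
    return results


results = {'ann_info': {'gt_bboxes_3d': [[0, 0, 0, 1, 1, 1, 0]],
                        'gt_labels_3d': [0]}}
out = load_annotations(results)
print(sorted(out.keys()))
```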
4. ObjectSample
4.1 Function: Sample ground-truth (GT) objects and paste them into the data
4.2 Initialization parameters
- db_sampler (dict): Configuration of the ground-truth database sampler
- sample_2d (bool): Whether to also paste the 2D image patch into the image; it should be set to true for multi-modal cut-and-paste, the default is false
- use_ground_plane (bool): Whether to use the ground plane to adjust the 3D labels, default false
4.3 Functions in a class
- remove_points_in_boxes (static function): Remove the points that fall inside the sampled boxes
- __call__: Sample ground-truth objects into the data; the returned result contains the three keys gt_bboxes_3d, gt_labels_3d and points
- __repr__: Returns the module description string
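The cut-and-paste idea behind ObjectSample can be sketched as: clear the scene points that fall inside each sampled box, then append the sampled points, boxes and labels. The sketch below is a simplification under stated assumptions: boxes are axis-aligned (cx, cy, cz, dx, dy, dz) with no yaw, the centre is the geometric centre, and both function names and the sampled dict layout are hypothetical.

```python
import numpy as np


def points_in_boxes(points, boxes):
    """Return a mask of points inside any axis-aligned box."""
    inside = np.zeros(len(points), dtype=bool)
    for cx, cy, cz, dx, dy, dz in boxes:
        lo = np.array([cx - dx / 2, cy - dy / 2, cz - dz / 2])
        hi = np.array([cx + dx / 2, cy + dy / 2, cz + dz / 2])
        inside |= np.all((points[:, :3] >= lo) & (points[:, :3] <= hi), axis=1)
    return inside


def paste_objects(points, gt_boxes, gt_labels, sampled):
    """Clear the space of each sampled box, then add its points and label."""
    keep = ~points_in_boxes(points, sampled['boxes'])
    points = np.concatenate([sampled['points'], points[keep]], axis=0)
    gt_boxes = np.concatenate([gt_boxes, sampled['boxes']], axis=0)
    gt_labels = np.concatenate([gt_labels, sampled['labels']], axis=0)
    return points, gt_boxes, gt_labels


scene = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
sampled = {'boxes': np.array([[0.0, 0.0, 0.0, 2.0, 2.0, 2.0]]),
           'points': np.array([[0.5, 0.0, 0.0]]),
           'labels': np.array([1])}
new_points, new_boxes, new_labels = paste_objects(
    scene, np.zeros((0, 6)), np.zeros(0, dtype=int), sampled)
print(new_points)
```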
5. GlobalRotScaleTrans
5.1 Function: Apply global rotation, scaling and translation to a 3D scene
5.2 Initialization parameters
- rot_range (list[float]): Rotation angle range, default [-0.78539816, 0.78539816] (approximately [-pi/4, pi/4])
- scale_ratio_range (list[float]): Range of the scaling factor, default [0.95, 1.05]
- translation_std (list[float]): Standard deviation of the translation noise. The scene is randomly translated by adding noise sampled from a Gaussian distribution with this standard deviation, the default is [0, 0, 0]
- shift_height (bool): Whether to shift the height value, the default is false
5.3 Functions in a class
- _trans_bbox_points: Translate bbox and point cloud
- _rot_bbox_points: Rotate bbox and point cloud
- _scale_bbox_points: scale bbox and point cloud
- _random_scale: Randomly set the scale factor
- update_transform: update the transformation matrix
- __call__: Rotate, scale and translate the bbox and point cloud
- __repr__: Returns the module description string
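The effect of the three parameters on the point cloud can be sketched with NumPy. This is a minimal sketch under stated assumptions: global_rot_scale_trans is a hypothetical function, the rotation is about the z-axis and counter-clockwise when seen from above, and the operations are applied in the order rotate, scale, translate. The sign convention and the handling of boxes in the library may differ.

```python
import numpy as np


def global_rot_scale_trans(points, angle, scale, translation):
    """Rotate about the z-axis, then scale, then translate the xyz columns."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    out = points.copy()
    out[:, :3] = out[:, :3] @ rot.T  # counter-clockwise seen from above
    out[:, :3] *= scale
    out[:, :3] += translation
    return out


# Draw one random transform from the default parameter ranges.
rng = np.random.default_rng(0)
angle = rng.uniform(-0.78539816, 0.78539816)             # rot_range
scale = rng.uniform(0.95, 1.05)                          # scale_ratio_range
translation = rng.normal(scale=[0.0, 0.0, 0.0], size=3)  # translation_std

points = np.array([[1.0, 0.0, 0.0, 0.5]])  # x, y, z, intensity
augmented = global_rot_scale_trans(points, angle, scale, translation)
print(augmented)
```

The same angle, scale and translation would also have to be applied to the ground-truth boxes (centre, size and yaw) so that they stay aligned with the points.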
6. RandomFlip3D
6.1 Function: Randomly flip point cloud and bbox.
Note: If the input dict already contains the "flip" key, that flag is used. Otherwise, the flip is decided randomly according to the ratios specified in __init__.
6.2 Initialization parameters
- sync_2d (bool, optional): Whether to keep the flip in sync with the 2D image. If true, the same flip that is applied to the 2D image is applied in 3D; if false, the 3D flip is decided randomly and independently of the 2D image. Defaults to true.
- flip_ratio_bev_horizontal (float, optional): Horizontal flip ratio, default 0.0
- flip_ratio_bev_vertical (float, optional): Vertical flip ratio, default 0.0
- **kwargs: Additional keyword arguments
6.3 Functions in a class
- random_flip_data_3d: random flip 3d data
- update_transform: update the transformation matrix
- __call__: Flip the point cloud and the values in bbox3d_fields, and also flip the 2D image and its annotations
- __repr__: Returns the module description string
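The rule described in the note (use an existing flag if present, otherwise draw one from the ratio) can be sketched as below. This is an illustration under assumptions: random_flip_bev is a hypothetical function, the flags are stored under the pcd_horizontal_flip and pcd_vertical_flip keys, and in LiDAR coordinates a horizontal flip is taken to negate y while a vertical flip negates x.

```python
import numpy as np


def random_flip_bev(points, results, ratio_h=0.5, ratio_v=0.5, rng=None):
    """Flip points in bird's-eye view, honouring flags already in results."""
    rng = np.random.default_rng() if rng is None else rng
    if 'pcd_horizontal_flip' not in results:
        results['pcd_horizontal_flip'] = bool(rng.random() < ratio_h)
    if 'pcd_vertical_flip' not in results:
        results['pcd_vertical_flip'] = bool(rng.random() < ratio_v)

    out = points.copy()
    if results['pcd_horizontal_flip']:
        out[:, 1] = -out[:, 1]  # assumed convention: mirror y
    if results['pcd_vertical_flip']:
        out[:, 0] = -out[:, 0]  # assumed convention: mirror x
    return out


flags = {'pcd_horizontal_flip': True, 'pcd_vertical_flip': False}
flipped = random_flip_bev(np.array([[1.0, 2.0, 3.0]]), flags)
print(flipped)  # [[ 1. -2.  3.]]
```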
7. PointsRangeFilter
7.1 Function: Filter point cloud by range
7.2 Initialization parameters
- point_cloud_range (list[float]): point cloud range
7.3 Functions in a class
- __call__: Filter the point cloud by range
- __repr__: Returns the module description string
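Filtering by range amounts to a boolean mask over the xyz columns. The sketch below assumes point_cloud_range is ordered [x_min, y_min, z_min, x_max, y_max, z_max]; filter_points_by_range is a hypothetical helper.

```python
import numpy as np


def filter_points_by_range(points, point_cloud_range):
    """Keep only the points strictly inside the 3D range."""
    lo = np.array(point_cloud_range[:3])
    hi = np.array(point_cloud_range[3:])
    mask = np.all((points[:, :3] > lo) & (points[:, :3] < hi), axis=1)
    return points[mask]


points = np.array([[0.0, 0.0, 0.0, 0.1],     # inside
                   [100.0, 0.0, 0.0, 0.2],   # x out of range
                   [0.0, 0.0, 10.0, 0.3]])   # z out of range
kept_points = filter_points_by_range(
    points, [-50.0, -50.0, -5.0, 50.0, 50.0, 3.0])
print(kept_points)
```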
8. ObjectRangeFilter
8.1 Function: Filter ground-truth objects by range
8.2 Initialization parameters
- point_cloud_range (list[float]): point cloud range
8.3 Functions in a class
- __call__: Filter the ground-truth objects (boxes and labels) by range
- __repr__: Returns the module description string
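Filtering objects differs from filtering points in that boxes and labels must be masked together. A minimal sketch, assuming boxes are rows of (cx, cy, cz, dx, dy, dz, yaw) and that only the box centre in the x-y plane is tested against the range; filter_objects_by_range is a hypothetical helper.

```python
import numpy as np


def filter_objects_by_range(gt_boxes, gt_labels, point_cloud_range):
    """Keep boxes (and their labels) whose centre lies inside the BEV range."""
    x_min, y_min, _, x_max, y_max, _ = point_cloud_range
    cx, cy = gt_boxes[:, 0], gt_boxes[:, 1]
    mask = (cx > x_min) & (cx < x_max) & (cy > y_min) & (cy < y_max)
    return gt_boxes[mask], gt_labels[mask]


boxes = np.array([[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0],
                  [60.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0]])
labels = np.array([0, 1])
kept_boxes, kept_labels = filter_objects_by_range(
    boxes, labels, [-50.0, -50.0, -5.0, 50.0, 50.0, 3.0])
print(kept_labels)  # [0]
```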
9. ObjectNameFilter
9.1 Function: Filter ground-truth objects by class name
9.2 Initialization parameters
- classes (list[str]): A list of class names that need to be retained for training
9.3 Functions in a class
- __call__: Filter objects by name
- __repr__: Returns the module description string
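Filtering by name can be sketched as a mask over the label array. The sketch assumes that labels are integer indices into the classes list and that objects of any other class carry a label outside that range (for example -1); filter_objects_by_name is a hypothetical helper.

```python
import numpy as np


def filter_objects_by_name(gt_boxes, gt_labels, classes):
    """Keep objects whose label is a valid index into classes."""
    valid_labels = np.arange(len(classes))
    mask = np.isin(gt_labels, valid_labels)
    return gt_boxes[mask], gt_labels[mask]


boxes = np.zeros((3, 7))
labels = np.array([0, -1, 1])  # -1 marks a class that is not trained on
named_boxes, named_labels = filter_objects_by_name(
    boxes, labels, ['car', 'pedestrian'])
print(named_labels)  # [0 1]
```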
10. PointShuffle
10.1 Function: Shuffle the order of the input points
10.2 Initialization parameters: none
10.3 Functions in a class
- __call__: Shuffle the order of the points
- __repr__: Returns the module description string
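Shuffling only reorders the rows of the point array; no point is added, removed or changed. A minimal sketch with a hypothetical shuffle_points helper:

```python
import numpy as np


def shuffle_points(points, rng=None):
    """Return the same points in a random row order."""
    rng = np.random.default_rng() if rng is None else rng
    return points[rng.permutation(len(points))]


points = np.arange(12, dtype=np.float64).reshape(4, 3)
shuffled = shuffle_points(points, np.random.default_rng(0))
print(shuffled.shape)  # (4, 3)
```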
11. DefaultFormatBundle3D
11.1 Function: Default formatting bundle that formats and packs the 3D data
Note: It simplifies the pipeline of formatting common fields for voxels, mainly including "proposals", "gt_bboxes", "gt_labels", "gt_masks" and "gt_semantic_seg". The fields are converted as follows:
- img: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
- proposals: (1)to tensor, (2)to DataContainer
- gt_bboxes: (1)to tensor, (2)to DataContainer
- gt_bboxes_ignore: (1)to tensor, (2)to DataContainer
- gt_labels: (1)to tensor, (2)to DataContainer
11.2 Initialization parameters
- class_names: list of classes
- with_gt(bool): Whether to use the true value, the default is true
- with_label(bool): whether to use labels, default true
11.3 Functions in a class
- __call__: Transform and format the common fields in results
- __repr__: Returns a string describing the module
12. Collect3D
12.1 Function: Collect the data from the loader that is relevant to the specific task
Note:
1) This class is usually the last stage of the data loader pipeline. Typical keys are "img", "proposals", "gt_bboxes", "gt_bboxes_ignore", "gt_labels", "gt_masks"
2) The img_meta item is always inserted. Its content depends on meta_keys, and by default it contains:
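- 'img_shape': Shape of the image input to the network, a tuple (h, w, c). Note that the image may be zero-padded on the right/bottom
- 'scale_factor': Preprocessing scale factor
- 'flip': Whether the image is flipped
- 'filename': Path to the image file
- 'ori_shape': Original shape of the image, a tuple (h, w, c)
- 'pad_shape': Image shape after padding
- 'lidar2img': Transformation matrix from lidar to image
- 'depth2img': Transformation matrix from depth to image
- 'cam2img': Transformation matrix from the camera coordinate system to the image coordinate system
- 'pcd_horizontal_flip': Whether the point cloud is flipped horizontally
- 'pcd_vertical_flip': Whether the point cloud is flipped vertically
- 'box_mode_3d': 3D box mode
- 'box_type_3d': 3D box type
- 'img_norm_cfg': A dict of normalization information:
  - mean: Per-channel mean
  - std: Per-channel standard deviation
  - to_rgb: Whether to convert from BGR to RGB
- 'pcd_trans': Translation applied to the point cloud
- 'sample_idx': Index of the sample (keyframe)
- 'pcd_scale_factor': Point cloud scale factor
- 'pcd_rotation': Rotation applied to the point cloud
- 'pts_filename': Path to the point cloud file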
12.2 Initialization parameters
- keys (Sequence[str]): Keys of results to be collected
- meta_keys (Sequence[str], optional): Meta keys to be converted to mmcv.DataContainer and stored in data[img_metas]
12.3 Inner-class functions
- __call__: Collect the keys in results
- __repr__: Returns a string describing the module
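The split between keys and meta_keys can be sketched with plain dicts. This is a simplified illustration: collect is a hypothetical function, meta keys that are missing from results are assumed to be skipped, and the metadata is left as a plain dict instead of being wrapped in mmcv.DataContainer.

```python
def collect(results, keys, meta_keys):
    """Keep the requested keys and gather metadata under img_metas."""
    data = {key: results[key] for key in keys}
    # Assumption: meta keys that are absent from results are simply skipped.
    data['img_metas'] = {k: results[k] for k in meta_keys if k in results}
    return data


results = {
    'points': [[0.0, 0.0, 0.0]],
    'gt_labels_3d': [0],
    'pcd_scale_factor': 1.02,
    'pts_filename': 'sample.bin',   # hypothetical file name
    'scratch': 'dropped',           # not requested, so it is not collected
}
data = collect(results,
               keys=['points', 'gt_labels_3d'],
               meta_keys=['pcd_scale_factor', 'pts_filename', 'flip'])
print(sorted(data.keys()))  # ['gt_labels_3d', 'img_metas', 'points']
```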
To be continued in Interpretation of CenterPoint source code process (2).