Getting familiar with the mmdetection3d data loading and model building process from scratch

  • This illustrated article starts by introducing the configuration file, gradually builds a new configuration file and the corresponding model, and finally uses a single piece of point cloud data to briefly walk through the processing pipeline.
  • For installing mmdetection3d, please refer to the official installation guide - MMDetection3D 1.0.0rc4 documentation

1. Read the configuration file

1.1 Composition of mmdetection3d configuration file

Official documentation: Tutorial 1: Learning configuration files — MMDetection3D 1.0.0rc4 documentation

In mmdetection3d, the main idea is to implement a customized model by inheriting from the default configurations. Of course, you can also write the entire model configuration in a single file and use it as needed.

Configuration files are stored in the mmdetection3d/configs directory. The _base_ directory contains the basic configurations that ship with mmdetection3d, i.e., the original configurations. Judging from the composition of the _base_ directory, mmdetection3d divides configuration files into four types: dataset, model, training schedule (schedule), and default runtime settings (default_runtime).

The following takes part of a configuration file as an example and explains how to read it.

# configs/centerpoint/centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py
_base_ = [
    '../_base_/datasets/nus-3d.py',
    '../_base_/models/centerpoint_01voxel_second_secfpn_nus.py', # inherits this model's base file
    '../_base_/schedules/cyclic_20e.py', '../_base_/default_runtime.py'
]
# note: point_cloud_range is defined near the top of the full config file (omitted in this excerpt)
model = dict(
    pts_voxel_layer=dict(point_cloud_range=point_cloud_range),
    pts_bbox_head=dict(bbox_coder=dict(pc_range=point_cloud_range[:2])),
    # model training and testing settings
    train_cfg=dict(pts=dict(point_cloud_range=point_cloud_range)),
    test_cfg=dict(pts=dict(pc_range=point_cloud_range[:2])))

As you can see, in the file centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py the model part contains only a small section. This is because it inherits from centerpoint_01voxel_second_secfpn_nus.py, and only some specific fields are modified or added on top of the inherited file.
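If you want to see what the configuration looks like after all the _base_ files have been merged, mmcv's Config resolves the inheritance for you. A small sketch (run from the mmdetection3d root):

# print the fully merged configuration (sketch)
from mmcv import Config

cfg = Config.fromfile(
    'configs/centerpoint/centerpoint_01voxel_second_secfpn_4x8_cyclic_20e_nus.py')
print(cfg.pretty_text)  # every inherited field resolved into one flat config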

For ease of explanation, here is a simplified version of the inherited base configuration file:

# configs/_base_/models/centerpoint_01voxel_second_secfpn_nus.py
model = dict(
    type='CenterPoint',
    pts_voxel_layer=dict(
        max_num_points=10, voxel_size=voxel_size, max_voxels=(90000, 120000)),
    pts_voxel_encoder=dict(type='HardSimpleVFE', num_features=5),
    pts_middle_encoder=dict(),  # contents omitted here
    pts_backbone=dict(),        # contents omitted here
    pts_neck=dict(),            # contents omitted here
    pts_bbox_head=dict(),       # contents omitted here
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict())

To see how the specific network is implemented, we start from the top of the model. According to the configuration file, the first field is type, which in the example above is CenterPoint. We go to mmdetection3d/mmdet3d/models/detectors/__init__.py, find CenterPoint, and see where it is imported from, as shown in the figure below. This leads us to the file that implements the network: mmdetection3d/mmdet3d/models/detectors/centerpoint.py

[Screenshot: CenterPoint imported in mmdet3d/models/detectors/__init__.py, and the CenterPoint class defined in mmdet3d/models/detectors/centerpoint.py]
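For reference, the relevant import in mmdet3d/models/detectors/__init__.py looks roughly like this (abridged sketch; the real file lists many more detectors):

# mmdet3d/models/detectors/__init__.py (abridged sketch)
from .centerpoint import CenterPoint
from .mvx_two_stage import MVXTwoStageDetector

__all__ = [..., 'MVXTwoStageDetector', 'CenterPoint']  # '...' stands for the other detectors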

Further down comes the line pts_voxel_layer=dict(max_num_points=10, voxel_size=voxel_size, max_voxels=(90000, 120000)).

We find the corresponding initialization field pts_voxel_layer in the __init__ method of mmdetection3d/mmdet3d/models/detectors/centerpoint.py

[Screenshot: the __init__ method of the CenterPoint class; its keyword arguments match the fields in the configuration file]

However, the field is not actually used here. That is because the CenterPoint class inherits from the MVXTwoStageDetector class. Following that clue into MVXTwoStageDetector, we find that it is the class that consumes the pts_voxel_layer field and shows how it is used: Voxelization(**pts_voxel_layer) constructs an encapsulated voxelization layer that turns points into a voxel representation. We do not care about the specific implementation here. In addition, as the figure above shows, every argument of the __init__ method corresponds to a field in the configuration file. In other words, starting from the CenterPoint class we can trace the concrete implementation of every configuration entry.

[Screenshot: MVXTwoStageDetector constructing the voxelization layer with Voxelization(**pts_voxel_layer)]
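To make the chain concrete, here is a simplified paraphrase of how MVXTwoStageDetector consumes the field (a sketch, not the literal source; see mmdet3d/models/detectors/mvx_two_stage.py for the exact code):

# simplified paraphrase of mvx_two_stage.py — not the actual source
import torch
from mmcv.ops import Voxelization

from .base import Base3DDetector

class MVXTwoStageDetector(Base3DDetector):
    def __init__(self, pts_voxel_layer=None, **kwargs):
        super().__init__()
        if pts_voxel_layer:
            # the config dict is unpacked directly into the mmcv Voxelization op
            self.pts_voxel_layer = Voxelization(**pts_voxel_layer)

    @torch.no_grad()
    def voxelize(self, points):
        # turns a list of per-sample point clouds into voxel tensors
        voxels, coors, num_points = [], [], []
        for res in points:
            res_voxels, res_coors, res_num_points = self.pts_voxel_layer(res)
            voxels.append(res_voxels)
            coors.append(res_coors)
            num_points.append(res_num_points)
        # (batch-index padding of coors omitted here)
        return torch.cat(voxels), torch.cat(num_points), coors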

At this point the analysis of the first line is complete. The next line is pts_voxel_encoder=dict(type='HardSimpleVFE', num_features=5), and it can be analyzed in exactly the same way.

1.2 Use the base configuration file to build your own configuration file

In this part, we inherit the basic configuration file and build a simple configuration file for subsequent use.

  1. First, we create a folder in the configs directory to save our own configuration files, and create a new my_config.py file

  2. In the newly created configuration file, write the following content

    _base_ = [
        '../_base_/datasets/nus-3d-mini.py', # here I inherited nus-3d.py from _base_ and built a mini version; mainly the dataset paths were changed
        '../_base_/schedules/schedule_2x.py',
        '../_base_/default_runtime.py',
    ]
    voxel_size = [0.1, 0.1, 0.1]
    norm_cfg = None
    DOUBLE_FLIP = False
    # for a simple demonstration, only the voxel construction layer and the encoding layer are implemented here
    model = dict(
        type="MY_MODEL",
        voxel_layer=dict(
            max_num_points=32,
            point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
            voxel_size=voxel_size,
            max_voxels=(16000, 40000)),
        voxel_encoder=dict(
            type='VoxelFeatureExtractorV3',
            num_input_features=4
        ),
        train_cfg=dict(),
        test_cfg=dict())
    data = dict(
        samples_per_gpu=1,
        workers_per_gpu=4
    )
    

1.3 Build a network based on configuration files

The next thing to do is to start building the network based on the model part in our configuration file. The specific steps are as follows:

  • Create a py file in the mmdet3d/models/detectors directory, named here my_model.py

  • Construct a class whose name is consistent with the type field in the configuration file. Alternatively, you can rename it via import ... as ... when registering:

    from ..builder import DETECTORS  # import the detector registry
    
    @DETECTORS.register_module()  # register the class; this decorator is required
    class MY_MODEL():
        def __init__(self):
            pass
    
  • Register it in mmdet3d/models/detectors/__init__.py:

    [Screenshot: MY_MODEL registered in mmdet3d/models/detectors/__init__.py]
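    The registration itself is just an import plus an entry in __all__. A sketch of the lines to add (the real file already contains many other detectors):

    # mmdet3d/models/detectors/__init__.py — lines to add (sketch)
    from .my_model import MY_MODEL

    # and append 'MY_MODEL' to the existing __all__ list:
    __all__ = [..., 'MY_MODEL']  # '...' stands for the existing entries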

  • In this example, for ease of understanding, we will not inherit from any base class and will only write the initialization method.

  • The next thing to do is to define the relevant parameters in the __init__ method and give the corresponding implementation:

    from mmcv.ops import Voxelization  # the voxelization op from mmcv
    from .. import builder  # import the builder module
    from ..builder import DETECTORS
    
    @DETECTORS.register_module()
    class MY_MODEL():  # the class name must match type="MY_MODEL" in the config
        def __init__(self, voxel_layer, voxel_encoder, train_cfg, test_cfg):
            self.voxel_layer = Voxelization(**voxel_layer)  # this layer ships with mmcv; revisited in Section 3.1
            self.voxel_encoder = builder.build_voxel_encoder(voxel_encoder)  # this layer we construct ourselves
    
    
  • The next step is to implement our voxel_encoder layer. You can create a new file in the mmdet3d/models/voxel_encoders directory, or write it directly into an existing file; here I wrote it in voxel_encoder.py.

    from torch import nn
    
    from ..builder import VOXEL_ENCODERS  # the voxel-encoder registry
    
    @VOXEL_ENCODERS.register_module()  # register as a voxel encoder layer
    class VoxelFeatureExtractorV3(nn.Module):
        def __init__(
                self, num_input_features=4, norm_cfg=None, name="VoxelFeatureExtractorV3"
        ):
            super(VoxelFeatureExtractorV3, self).__init__()
            self.name = name
            self.num_input_features = num_input_features
    
        def forward(self, features, num_voxels, coors=None):
            """
            features: the input voxels, shape (M, max_points, C)
            num_voxels: the number of valid points in each voxel, shape (M,)
            """
            # mean of the valid points in each voxel
            points_mean = features[:, :, : self.num_input_features].sum(
                dim=1, keepdim=False
            ) / num_voxels.type_as(features).view(-1, 1)
    
            return points_mean.contiguous()
    
  • The next step is to import the VoxelFeatureExtractorV3 we just wrote in mmdet3d/models/voxel_encoders/__init__.py. After that, voxel_encoder=dict(type='VoxelFeatureExtractorV3', num_input_features=4) in the configuration file is enough to call our voxel encoding module.
    [Screenshot: VoxelFeatureExtractorV3 imported in mmdet3d/models/voxel_encoders/__init__.py]
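    For reference, a sketch of the corresponding lines in mmdet3d/models/voxel_encoders/__init__.py:

    # mmdet3d/models/voxel_encoders/__init__.py — lines to add (sketch)
    from .voxel_encoder import VoxelFeatureExtractorV3

    # and append 'VoxelFeatureExtractorV3' to the existing __all__ list:
    __all__ = [..., 'VoxelFeatureExtractorV3']  # '...' stands for the existing entries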

  • That’s all about building a network based on the model part of the configuration file. If I have time, I will add more details.

2. Build a model

In this part, we use a Jupyter notebook to fetch data step by step and demonstrate how the data flows through the network we built.

2.1 Read configuration file

In actual training, all parameters are loaded from a configuration file path passed in as a command-line argument; the relevant code is in tools/train.py. For simplicity, here we read the configuration file directly from its path.

# read the configuration file
from mmcv import Config

config_file = "/home/wistful/work/mmdetection3d/configs/my_config/my_config.py"
cfg = Config.fromfile(config_file)
print("cfg type:",type(cfg))
print("cfg.model type:",type(cfg.model))
cfg.model  # print the model section

[Output: the types of cfg and cfg.model, and the printed model dict matching our configuration file]

As you can see, the printed model structure is the same as in our configuration file. The exact data types of cfg, cfg.model, etc. are not covered here.

2.2 Reading data

# fetch the data
from mmdet3d.datasets import build_dataset

datasets = [build_dataset(cfg.data.train)]

print("datastes type:", type(datasets))
print("datastes[0] type", type(datasets[0]))
print("datastes[0][0] type", type(datasets[0][0]))

datasets[0][0].keys()

[Output: the types of datasets, datasets[0], datasets[0][0], and the keys of one sample]

I won’t go into more detail here. You only need to understand that datasets is a list of length 1, datasets[0] is a nuScenes dataset object, datasets[0][i] are the items of the nuScenes dataset, and each item contains four parts: 'img_metas', 'points', 'gt_bboxes_3d', 'gt_labels_3d'.

In actual training or testing, a data_loader iterator is also needed; it lets us read data with multiple workers and supports batched and shuffled reading, etc. mmdet has already implemented this for us. Since we only need a single sample to simulate the process, we will not construct a data_loader here; see the sketch below for what it would look like.
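A hedged sketch of constructing one (the helper comes from the mmdet side of the codebase; the exact import path and signature can vary between versions, so verify against the tools/train.py of your installation):

# sketch only — verify against your mmdetection3d version
from mmdet.datasets import build_dataloader

data_loader = build_dataloader(
    datasets[0],
    samples_per_gpu=cfg.data.samples_per_gpu,
    workers_per_gpu=cfg.data.workers_per_gpu,
    dist=False,    # single-process demo, no distributed sampler
    shuffle=True)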

2.3 Constructing the model

# build the model
from mmdet3d.models import build_model
model = build_model(
    cfg.model,
    train_cfg=cfg.get('train_cfg'),
    test_cfg=cfg.get('test_cfg')
)
model

[Output: the printed model structure]

3. Running process

3.1 voxel_layer: point cloud -> voxel

# in the example configuration, the first step is the voxel_layer, which encodes the point cloud into voxels

voxel_layer = model.voxel_layer
# fetch the point cloud data
points = datasets[0][0].get('points').data
# feed the point cloud into the voxel_layer
voxels_out, coors_out, num_points_per_voxel_out = voxel_layer(points)

In the code above, calling voxel_layer(points) runs the forward of the Voxelization layer built by self.voxel_layer = Voxelization(**voxel_layer). It is worth taking a closer look at Voxelization's inputs and outputs.
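Roughly, its interface is as follows (summarized from its behavior; consult the mmcv.ops source for the authoritative details):

# mmcv.ops.Voxelization, summarized — not the actual source
#   construction: Voxelization(voxel_size, point_cloud_range,
#                              max_num_points, max_voxels)
#   forward input : points — an (N, C) tensor, C >= 3 (x, y, z, extra features)
#   forward output: voxels      (M, max_num_points, C)  points grouped per voxel, zero-padded
#                   coors       (M, 3)                  integer voxel grid coordinates
#                   num_points  (M,)                    number of valid points in each voxel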

Next, printing voxel_layer.parameters gives the following output:

[Output: the Voxelization layer’s parameters: voxel_size, point_cloud_range, max_num_points, max_voxels]

Now recall our customized configuration file, repeated here:

voxel_size = [0.1, 0.1, 0.1]
model = dict(
    type="MY_MODEL",
    voxel_layer=dict(
        max_num_points=32,
        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
        voxel_size=voxel_size,
        max_voxels=(16000, 40000)),
    voxel_encoder=dict(
        type='VoxelFeatureExtractorV3',
        num_input_features=4
    ),
    train_cfg=dict(),
    test_cfg=dict())

As you can see, the parameters passed to the voxel_layer come straight from our configuration file. Given Voxelization's inputs and outputs described earlier, the forward pass needs only one additional input: points, i.e. the point cloud. In the code block at the beginning of this section, voxels_out, coors_out, num_points_per_voxel_out = voxel_layer(points) performs the point-cloud-to-voxel conversion. Let’s look at the shapes before and after:

[Output: input and output shapes — 32242 points in, 6051 voxels out]

That is to say, for the point cloud within the custom range, with the custom voxel size [0.1, 0.1, 0.1], each voxel keeps at most 32 points. In the end, 32242 points are converted into 6051 voxels. The number of points actually contained in each voxel differs, and is recorded in num_points_per_voxel_out.
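To see this concretely, you can print the shapes yourself (the values in the comments come from the run above and will vary from sample to sample; C is the number of point features):

# shapes around the voxelization step (values from the run above)
print(points.shape)                    # (32242, C)      raw point cloud
print(voxels_out.shape)                # (6051, 32, C)   up to 32 points per voxel, zero-padded
print(coors_out.shape)                 # (6051, 3)       voxel grid coordinates
print(num_points_per_voxel_out.shape)  # (6051,)         valid points in each voxel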

3.2 voxel_encoder: voxel encoding

This layer encodes the output of the previous layer (voxel_layer). Looking back at 1.3, we already gave the corresponding implementation. It is quite simple: it computes the mean of the points in each voxel. Keep in mind the forward signature of that implementation: def forward(self, features, num_voxels, coors=None)

Let’s first print the parameters of this layer


It turns out there is no output. This is because we have not defined the __repr__ method. Let's add one to the VoxelFeatureExtractorV3 class:

    def __repr__(self):
        s = self.__class__.__name__ + '('
        s += 'num_input_features=' + str(self.num_input_features)
        s += ')'
        return s

Running it again now produces output, and the parameters are the same as in the configuration file. The following code feeds the previous layer's output into this layer:

voxel_encoder = model.voxel_encoder
print(voxel_encoder.parameters)
voxel_encoder_inputs = voxels_out  # use the previous layer's output as input
# the forward's num_voxels argument is the per-voxel point count used for averaging,
# so we pass num_points_per_voxel_out from the voxelization step
num_voxels = num_points_per_voxel_out
voxel_encoder_result = voxel_encoder(voxel_encoder_inputs, num_voxels)
print("voxel_encoder output shape:", voxel_encoder_result.shape)

[Output: the voxel_encoder’s parameters and the encoder output shape]

Our configuration file and network only define and implement two basic layers. The remaining layers (backbone, neck, ...) are all similar and follow the same process. A complete pipeline would also include the loss computation, backpropagation and parameter updates, etc.; those steps live in the model's forward and are not covered in detail in this article.
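As an illustration only, a forward for MY_MODEL that chains the two layers built in this article might look like the following (a sketch, not the author's code; a real detector would also compute losses in train mode):

# hypothetical forward for MY_MODEL, chaining the two layers from this article
def forward(self, points):
    voxels, coors, num_points = self.voxel_layer(points)
    voxel_features = self.voxel_encoder(voxels, num_points, coors)
    # a complete model would continue along the same pattern:
    # x = self.middle_encoder(voxel_features, coors, batch_size)
    # x = self.backbone(x)
    # x = self.neck(x)
    # outs = self.bbox_head(x)
    return voxel_features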
