Environment Perception Algorithm - 3. PSMNet Training Based on the KITTI Dataset

1. Introduction

The core idea of PSMNet is to capture feature information at different scales through a pyramid structure, thereby improving the accuracy of disparity estimation. Its highlights are: (1) using a spatial pyramid pooling (SPP) module in the convolutional feature network to extract feature information at different scales; (2) using a cost volume to capture matching information between the left and right views; (3) introducing a multi-resolution (stacked hourglass) 3D CNN to process the cost volume for disparity estimation.
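To make point (2) concrete, here is a minimal sketch of how such a concatenation-based cost volume can be built in PyTorch. The helper build_cost_volume is illustrative, not part of the PSMNet code; in PSMNet itself this is done on 1/4-resolution feature maps with maxdisp//4 disparity levels.

import torch

def build_cost_volume(feat_left, feat_right, max_disp):
    # feat_left/feat_right: [B, C, H, W] features from a shared 2D extractor.
    # Output: [B, 2C, max_disp, H, W]; level d pairs each left-image feature
    # with the right-image feature shifted d pixels.
    B, C, H, W = feat_left.shape
    cost = feat_left.new_zeros(B, 2 * C, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            cost[:, :C, d] = feat_left
            cost[:, C:, d] = feat_right
        else:
            cost[:, :C, d, :, d:] = feat_left[:, :, :, d:]
            cost[:, C:, d, :, d:] = feat_right[:, :, :, :-d]
    return cost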

2. Environment configuration

The GitHub link of the PSMNet source project is as follows:
https://github.com/JiaRenChang/PSMNet

Download the above project and get the PSMNet-master project folder.
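Alternatively, the project can be cloned from the command line (cloning yields a folder named PSMNet rather than PSMNet-master, which comes from the ZIP download):

git clone https://github.com/JiaRenChang/PSMNet.git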

The author also provides a .tar model trained on the KITTI dataset:
https://drive.google.com/file/d/1pHWjmhKMG4ffCrpcsp_MTXMJXhgl3kF9/view

The name of the above pretrained model is pretrained_model_KITTI2015.tar, and the file size is 21MB.

The author's training environment is Ubuntu 20.04 + PyTorch 1.6, and the hardware is an RTX 2080 Ti (22 GB of video memory).

For GPU driver configuration, please refer to "Environment Perception Algorithm - 1. Introduction and GPU Driver, CUDA and cudnn Configuration": https://blog.csdn.net/wenquantongxin/article/details/130858818

1) Create a PSMNet virtual environment

conda create -n PSMNet python=3.7
conda activate PSMNet
conda install pytorch==1.6.0 torchvision==0.7.0 -c pytorch
conda install scikit-image
pip install opencv-python
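Note that depending on your driver setup, you may need to pin a CUDA toolkit version explicitly when installing PyTorch 1.6 (for example cudatoolkit=10.2, one of the builds published for that release); otherwise conda may resolve to a CPU-only build. A quick sanity check that the GPU build is active:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

This should print 1.6.0 and True.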

2) Prepare the dataset

The dataset needs to be organized into two folders, data_scene_flow and data_scene_flow_calib. Taking the KITTI stereo 2015 dataset as an example, download the "stereo 2015/flow 2015/scene flow 2015 data set (2 GB)" and the "calibration files (1 MB)" of the Stereo 2015 benchmark from the official website:

The KITTI Vision Benchmark Suite: https://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo

Alternatively, download through the following cloud disk link:

https://pan.baidu.com/s/1p3oyglhMvVv0rTE-x-C9ig?pwd=yao1
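After extracting both archives, the layout should look roughly like the following (only the subfolders used in this article are shown; exact contents depend on the downloaded archives):

data_scene_flow/
    training/
        image_2/      # left color images
        image_3/      # right color images
        disp_occ_0/   # ground-truth disparity maps
    testing/
        image_2/
        image_3/
data_scene_flow_calib/
    training/
        calib_cam_to_cam/
    testing/
        calib_cam_to_cam/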

3) Visualize the results of the pre-trained model on the KITTI test set

To avoid the following error:

TypeError: unsupported operand type(s) for //: 'str' and 'int'

you need to change line 55 of models/stackhourglass.py to

self.maxdisp = int(maxdisp)
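This happens because the --maxdisp value from the command line reaches the model as a string; casting it to int restores the integer division (maxdisp // 4) named in the error message. An equivalent fix, assuming submission.py uses a standard argparse setup, is to declare the argument as an integer so no cast is needed:

parser.add_argument('--maxdisp', type=int, default=192, help='maximum disparity')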

Use the following commands to test and evaluate the pre-trained model on the KITTI 2015 dataset.

cd .../PSMNet-master
python submission.py --maxdisp 192 \
    --model stackhourglass \
    --KITTI 2015 \
    --datapath .../data_scene_flow/testing/ \
    --loadmodel .../pretrained_model/pretrained_model_KITTI2015.tar

Please modify the path of the testing dataset and the path of the pretrained model according to your actual setup. Running the above command generates (a large number of) predicted disparity maps in the PSMNet-master folder; in the reference implementation these are saved as 16-bit PNGs with disparity scaled by 256, per the KITTI convention.

3. Training PSMNet

First, you need to make the following code modifications in turn:

a) To avoid the following TabError:

TabError: inconsistent use of tabs and spaces in indentation

the indentation of finetune.py from line 158 onward needs to be adjusted so that it uses spaces consistently:

def main():
    max_acc = 0
    max_epo = 0
    start_full_time = time.time()

    for epoch in range(1, args.epochs + 1):
        total_train_loss = 0
        total_test_loss = 0
        adjust_learning_rate(optimizer, epoch)

        ## training ##
        for batch_idx, (imgL_crop, imgR_crop, disp_crop_L) in enumerate(TrainImgLoader):
            start_time = time.time()

            loss = train(imgL_crop, imgR_crop, disp_crop_L)
            print('Iter %d training loss = %.3f , time = %.2f' % (batch_idx, loss, time.time() - start_time))
            total_train_loss += loss
        print('epoch %d total training loss = %.3f' % (epoch, total_train_loss / len(TrainImgLoader)))

        ## test ##
        for batch_idx, (imgL, imgR, disp_L) in enumerate(TestImgLoader):
            test_loss = test(imgL, imgR, disp_L)
            print('Iter %d 3-px error in val = %.3f' % (batch_idx, test_loss * 100))
            total_test_loss += test_loss

        print('epoch %d total 3-px error in val = %.3f' % (epoch, total_test_loss / len(TestImgLoader) * 100))

        # track the largest 3-px error and the epoch it occurred in;
        # max_epo must be assigned inside the if-branch
        if total_test_loss / len(TestImgLoader) * 100 > max_acc:
            max_acc = total_test_loss / len(TestImgLoader) * 100
            max_epo = epoch
        print('MAX epoch %d total test error = %.3f' % (max_epo, max_acc))

        # SAVE (args.savemodel should end with '/', see the training command below)
        savefilename = args.savemodel + 'finetune_' + str(epoch) + '.tar'
        torch.save({
            'epoch': epoch,
            'state_dict': model.state_dict(),
            'train_loss': total_train_loss / len(TrainImgLoader),
            'test_loss': total_test_loss / len(TestImgLoader) * 100,
        }, savefilename)

        print('full finetune time = %.2f HR' % ((time.time() - start_full_time) / 3600))
    print(max_epo)
    print(max_acc)

b) To avoid the following ModuleNotFoundError:

ModuleNotFoundError: No module named 'preprocess'

you need to change line 9 of dataloader/KITTILoader.py to

from . import preprocess

Since finetune.py imports this file as part of the dataloader package, the relative import resolves correctly.

c) To avoid the following RuntimeError:

RuntimeError: CUDA out of memory. 

you need to adjust the batch_size values in finetune.py according to the available GPU memory (tip: with a training batch_size of 2, about 7 GB of GPU memory is required).

# modification of line 59
batch_size= 2, shuffle= True, num_workers= 8, drop_last=False)
# modification of line 63
batch_size= 2, shuffle= False, num_workers= 4, drop_last=False)
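For context, these two fragments are the tails of the two torch.utils.data.DataLoader calls in finetune.py; assuming the loader variables from the original script, the full calls look roughly like this:

TrainImgLoader = torch.utils.data.DataLoader(
    DA.myImageFloder(all_left_img, all_right_img, all_left_disp, True),
    batch_size=2, shuffle=True, num_workers=8, drop_last=False)

TestImgLoader = torch.utils.data.DataLoader(
    DA.myImageFloder(test_left_img, test_right_img, test_left_disp, False),
    batch_size=2, shuffle=False, num_workers=4, drop_last=False)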

d) To avoid the following IndexError errors:

IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number

line 114 of finetune.py needs to be changed to

return loss.item()

The original loss.data[0] idiom predates PyTorch 0.4; since then, 0-dim tensors must be converted with .item(), as the error message suggests.

e) To avoid the following error during training (it usually appears after the end of epoch 1):

IndexError: index xxx is out of bounds for dimension 1 with size 1

modify line 131 of finetune.py to:

disp_true[index[0][:], index[1][:], index[2][:]] = np.abs(true_disp[index[0][:], index[1][:], index[2][:]]-pred_disp[index[0][:], 0, index[1][:], index[2][:]])

The added 0 selects the singleton channel dimension of pred_disp (the "dimension 1 with size 1" named in the error message).

f) Before training, create a new trained folder under the PSMNet-master folder; otherwise the following error will occur:

[Errno 2] No such file or directory: './trained/finetune_1.tar'
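For example, from the project root:

cd .../PSMNet-master
mkdir trained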

After completing the above code modifications, you can use the following commands to finetune the network:

cd .../PSMNet-master
python finetune.py --maxdisp 192 \
    --model stackhourglass \
    --datatype 2015 \
    --datapath .../data_scene_flow/training/ \
    --epochs 300 \
    --loadmodel .../pretrained_model/pretrained_model_KITTI2015.tar \
    --savemodel ./trained/

Note that the trailing slash matters here: the save path is assembled as args.savemodel + 'finetune_' + str(epoch) + '.tar' (see the torch.save block above), so without it the checkpoints would be written as ./trainedfinetune_1.tar instead of into the trained folder.
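Each finetune_*.tar file is an ordinary PyTorch checkpoint dictionary (see the torch.save call above), so training progress can be inspected offline. A minimal sketch, assuming the first checkpoint has already been written:

import torch

# load a checkpoint on the CPU and print the logged metrics
ckpt = torch.load('./trained/finetune_1.tar', map_location='cpu')
print(ckpt['epoch'], ckpt['train_loss'], ckpt['test_loss'])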

4. Evaluate the trained PSMNet neural network 

Use the following commands to evaluate:

cd .../PSMNet-master
python submission.py --maxdisp 192 \
    --model stackhourglass \
    --KITTI 2015 \
    --datapath .../data_scene_flow/testing/ \
    --loadmodel .../PSMNet-master/trained/finetune_300.tar

Please modify the datapath directory accordingly. The finetune_300.tar shown in the command is the network saved after training for 300 epochs. The evaluation again generates a large number of predicted disparity maps inside the PSMNet-master folder.

These maps can be colored (beautified) with the following code:

# coding=utf-8
import cv2
import numpy as np

# read the disparity map (16-bit PNG, read unchanged)
disparity_map = cv2.imread('000107_10.png', cv2.IMREAD_UNCHANGED)

# convert disparity to relative depth; depth is proportional to 1/disparity,
# so this is a visualization heuristic rather than metric depth
depth_map = np.zeros(disparity_map.shape, dtype=np.float32)
invalid_mask = disparity_map == 0
valid_mask = np.logical_not(invalid_mask)
depth_map[valid_mask] = 1.0 / (disparity_map[valid_mask].astype(np.float32) / 255.0)

# normalize the depth map and apply a JET colormap
depth_map_normalized = cv2.normalize(depth_map, None, 0, 1, cv2.NORM_MINMAX)
depth_map_colored = cv2.applyColorMap(np.uint8(depth_map_normalized * 255), cv2.COLORMAP_JET)

# display the colored depth map
cv2.imshow('Colored Depth Map', depth_map_colored)
cv2.waitKey(0)
cv2.destroyAllWindows()
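On a headless machine without a display, the imshow/waitKey calls will fail; writing the result to disk is a simple alternative (the output filename here is just an example):

cv2.imwrite('000107_10_colored.png', depth_map_colored)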

Either way, a more intuitive visualization is obtained.

Some recognition results are as follows:

a) In test image No. 156, multiple vehicles on both sides of the street scene are effectively recognized, with correct volume, position, and shape.

b) In test image No. 174, pedestrians are effectively recognized, but the depth of the building boundary on the left is not estimated reasonably because its texture features are inconspicuous.

c) In test image No. 132, lens distortion in the lower-right corner and the incomplete overlap between the left and right cameras in the near field cause small-scale distortion in the prediction, but the overall prediction accuracy is good.

d) Test image No. 186.

5. Summary

The training process of PSMNet requires substantial time and computing resources: the network structure is relatively complex and must process features at multiple scales. In addition, the model's depth estimates for weakly textured or low-contrast regions are not accurate enough, and mismatches occur easily.

Possible improvements include further optimizing the network structure to reduce the number of parameters and the amount of computation, thereby improving training and inference efficiency; and introducing more prior knowledge, for example by augmenting the training data with images taken under different lighting conditions, to improve the model's adaptability to illumination changes.
