Road scene semantic segmentation algorithm

Input and output interfaces

Input:

(1) Resolution of the two real-time images captured by the cameras (width, height; integer int)

(2) Left and right real-time images captured by the two cameras (video format: RGB, YUV, MP4, etc.)

(3) Camera calibration parameters: the optical center position (x, y) and five distortion coefficients (2 radial, 2 tangential, 1 prism; floating-point float)

(4) Camera initialization parameters: camera position, initial rotation angles about the three coordinate axes, vehicle speed, image width and height, etc. (floating-point float)
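The distortion coefficients in input (3) typically follow the standard Brown-Conrady lens model; as a minimal sketch (pure Python, with hypothetical coefficient values, not the project's actual calibration code), applying the two radial (k1, k2) and two tangential (p1, p2) terms to a normalized image point looks like this:

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion
    to a normalized camera coordinate (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# with all coefficients zero the point is unchanged
print(distort(0.1, 0.2, 0, 0, 0, 0))   # (0.1, 0.2)
```

The optical center (x, y) from the calibration parameters is used to convert pixel coordinates into the normalized coordinates this model expects.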

Output:

(1) Boundary of each segmented region (floating-point float)

(2) Class label of each region in the image (integer int)

(3) Number of segmented region instances (integer int)

(4) Fused image with three or more channels (RGB, YUV, MP4, etc.)

(5) Distance from the camera to each segmented region (floating-point float)
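Output (5), the distance from the camera to each region, can in principle be recovered from the left/right image pair by triangulation. A minimal sketch under the standard pinhole stereo model (focal length f in pixels, baseline B in metres, disparity d in pixels; the rig values below are hypothetical): depth Z = f * B / d.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Pinhole stereo triangulation: depth in metres from pixel disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# hypothetical rig: 700 px focal length, 0.12 m baseline
z = stereo_depth(700.0, 0.12, disparity_px=21.0)
print(f"{z:.1f} m")   # 4.0 m
```

In practice the disparity would come from matching pixels of the same segmented region across the rectified left and right images.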

1.  Function Definition

(1) Compute the boundary of each region

(2) Label the class of each region in the image

(3) Compute the number of segmented region instances

(4) Fuse images with three or more channels (RGB, YUV, MP4, etc.)

(5) Compute the distance from the camera to each segmented region

 

2.  Technology Roadmap

Semantic image segmentation is a core technology of autonomous driving systems. As an important part of image understanding (Image understanding) in computer vision (Computer vision), semantic segmentation not only addresses increasingly prominent industry needs but is also one of the hot research topics in contemporary academia.

Scene perception of the environment is a difficult and very important problem in the field of unmanned driving. As a core technology of autonomous vehicle driving, semantic segmentation takes the images captured by the vehicle-mounted camera or lidar as input to a neural network, so that the computer can automatically segment and classify the image and the vehicle can avoid obstacles such as other vehicles and pedestrians. Unlike classification, a semantic segmentation model must make dense pixel-level predictions.

Throughout its history, semantic segmentation has passed through two main eras: the era before deep learning and the deep learning era. Before deep learning monopolized the field, image segmentation flourished, from the simplest pixel-level thresholding methods, through segmentation based on pixel clustering, to graph-partitioning methods, of which "Normalized cut" and "Grab cut" are the two main representatives. With the rapid development of deep learning technology, the classic graph-partitioning segmentation methods in this area have gradually been drawn into the whirlpool of deep learning.

The desired target of the road scene semantic segmentation system is shown in Figure 1. For images of a road scene, the objects in the image can be segmented at the pixel level, providing the basic data to support unmanned driving or intelligent driver assistance.

 

                                                                                    

 

 

Figure 1. Example of semantic segmentation

Figure 2. Specific target regions for semantic segmentation of the travel lane

 

Based on the semantic image segmentation described above, the video is discretized into frames and each frame is processed, thereby achieving video processing. The model is assessed with the VOC evaluation criterion IoU = TP / (TP + FP + FN), where IoU can be computed per object instance or per class; this system is mainly assessed by class IoU, with the final class IoU targeted to approach 80% (on the CityScapes data set). Given the complexity of the company's data in night scenes, which are difficult to evaluate, the final IoU is expected to be lower than this target.
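The evaluation criterion above can be computed directly from per-class pixel counts. A minimal sketch (pure Python, treating the prediction and ground truth as flat binary masks for one class) of IoU = TP / (TP + FP + FN):

```python
def iou(pred, truth):
    """Per-class IoU from two flat binary masks (1 = pixel belongs to class)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp / (tp + fp + fn) if (tp + fp + fn) else 0.0

pred  = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
print(iou(pred, truth))   # 2 TP, 1 FP, 1 FN -> 0.5
```

Class IoU as used here averages this quantity over the data set per class; instance-level IoU would instead be computed per object.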

On the basis of reaching 80% class IoU on the standard data set, transfer learning with some minor modifications will be applied to night scenes, and the model will eventually be used to process video frames. The frame rate is preliminarily planned at 25 FPS, although different deployment environments may affect the final frame rate.

Drawing on traditional semantic segmentation algorithms and on the current application results of various deep learning networks in semantic segmentation tasks, this project intends to follow an overall route of first reproducing and then steadily improving, from the classic FCN and SegNet up to the latest DeepLab V3+, verifying and improving each model in turn.

In most papers, the two parts of a segmentation network are referred to as the encoder and the decoder. In short, the first part "encodes" the input information into a compressed vector representation, and the second part (the decoder) reconstructs the desired output signal from it. There are many neural network implementations of the encoder-decoder structure; FCN, SegNet, and U-Net are among the most popular.
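As a toy illustration of the encoder-decoder idea (pure Python on a 1-D signal, not any of the actual networks named above): the encoder repeatedly halves the resolution into a compressed representation, and the decoder upsamples back to the original length.

```python
def encode(signal, levels=2):
    """Toy encoder: halve resolution by max-pooling with stride 2."""
    for _ in range(levels):
        signal = [max(signal[i], signal[i + 1])
                  for i in range(0, len(signal) - 1, 2)]
    return signal

def decode(code, levels=2):
    """Toy decoder: nearest-neighbour upsampling back to full resolution."""
    for _ in range(levels):
        code = [v for v in code for _ in (0, 1)]
    return code

x = [1, 3, 2, 8, 5, 4, 7, 6]
z = encode(x)    # compressed representation, length 2
y = decode(z)    # reconstructed to the original length 8
```

Real segmentation decoders of course learn the upsampling (and output one class score per pixel), but the shape contract is the same: encode down, decode back to input resolution.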

Unlike most encoder-decoder architectures, DeepLab offers a different approach to semantic segmentation: it proposes an architecture for controlling the resolution of extracted signals and for learning multi-scale context features.

DeepLab uses a ResNet pre-trained on ImageNet as its main feature-extraction network, but adds a new residual block for multi-scale feature learning. The last block uses atrous (dilated) convolution rather than the conventional convolution of ResNet, and each convolution in this residual block uses a different dilation rate to capture multi-scale context information.
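As a rough illustration of dilated convolution (pure Python, 1-D, no framework): the kernel samples the input with gaps, so a kernel of size k with dilation d covers an effective receptive field of k + (k-1)(d-1) samples without adding any parameters.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Valid 1-D convolution with a dilation (hole/atrous) factor."""
    k = len(kernel)
    span = (k - 1) * dilation + 1      # effective receptive field
    out = []
    for i in range(len(x) - span + 1):
        out.append(sum(kernel[j] * x[i + j * dilation] for j in range(k)))
    return out

x = list(range(10))      # [0, 1, ..., 9]
k = [1, 1, 1]            # simple summing kernel
print(dilated_conv1d(x, k, dilation=1))  # the 3 weights span 3 samples
print(dilated_conv1d(x, k, dilation=2))  # the same 3 weights span 5 samples
```

Stacking such layers with increasing dilation rates is what lets DeepLab enlarge the receptive field while keeping feature-map resolution high.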

Furthermore, atrous spatial pyramid pooling (ASPP) is applied on top of the residual block. ASPP uses convolutions with different dilation rates to classify regions of arbitrary scale.

DeepLabv1 and v2 both use atrous convolution to extract dense semantic features. To address the problem of segmenting objects at multiple scales, DeepLabv3 employs cascaded or parallel multi-scale atrous convolutions to capture multi-scale context. DeepLabv3 also improves the previously proposed atrous spatial pyramid pooling module, which probes convolutional features at multiple scales; by additionally encoding image-level global context features, it achieved state-of-the-art performance of 86.9 mIOU on PASCAL VOC 2012.

DeepLabv3+ continues to update the model architecture. To fuse multi-scale information, it introduces the encoder-decoder structure commonly used in semantic segmentation; in this architecture, atrous convolution lets the encoder control the resolution of the extracted features, balancing accuracy against run time.

The semantic segmentation task adopts the Xception model and applies depthwise separable convolution to both the ASPP and decoder blocks, improving the running speed and robustness of the encoder-decoder network and achieving a new state-of-the-art performance of 89.0 mIOU on the PASCAL VOC 2012 dataset.
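A quick back-of-the-envelope comparison (pure Python, weight counts only, biases ignored) shows why depthwise separable convolution speeds things up: a k x k standard convolution with C_in input and C_out output channels costs k*k*C_in*C_out weights, while the depthwise + pointwise factorization costs only k*k*C_in + C_in*C_out.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then 1 x 1 pointwise mixing."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 256, 256)    # 589_824 weights
sep = separable_conv_params(3, 256, 256)   # 67_840 weights
print(f"reduction: {std / sep:.1f}x")      # roughly 8.7x fewer weights
```

The channel counts here are illustrative; the ratio approaches k*k (here 9x) as the number of channels grows, which is the source of the speed gain quoted for DeepLabv3+.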

The DeepLabv3+ model framework is shown below:

 

 

 

Figure 3. DeepLab V3+ model architecture

Considering that the semantic segmentation architectures build upon one another with continual enhancement, the overall development process of the project is shown in Figure 4:

 

 

 

Figure 4. Overall system flowchart

 

Test

1) Test on public data sets: obtain mAP, Loss, and other metrics on test data sets such as VOC and COCO.

2) Test on independently collected data sets and analyze the test results.

Integration

Write wrapper functions so that the module can be integrated and run correctly on the FPGA board.

 

Development Environment Description

 

 

 

Table 1. Development environment for semantic segmentation

 

3.  Key technical parameters and performance indicators

1) Real-time performance: up to 30 FPS on GPU and FPGA

2) Environmental adaptability: good semantic segmentation on both public data sets and independently collected data sets

Source: www.cnblogs.com/wujianming-110117/p/12481972.html