Accuracy and speed, perfectly balanced: the latest SOTA image segmentation model is released!

What core technology underpins the trillion-dollar market spanning film and television portrait matting, medical image analysis, and autonomous driving perception? The answer is image segmentation. Unlike object detection or image classification, image segmentation must classify every pixel, which makes it irreplaceable in fine-grained image recognition tasks and a core competency for computer vision algorithm engineers!

Figure 1 Image segmentation application
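
To make "classify every pixel" concrete, here is a minimal sketch in plain Python. It assumes a model has already produced one score map per class; the function name `segment` and the tiny 2x2 score maps are made up for illustration and are not part of PaddleSeg.

```python
from typing import List

def segment(scores: List[List[List[float]]]) -> List[List[int]]:
    """Turn per-class score maps into a label mask.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Each pixel receives the index of its highest-scoring class.
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        mask.append(row)
    return mask

# Hypothetical scores for a 2x2 image and two classes.
scores = [
    [[0.9, 0.2], [0.1, 0.3]],  # class 0
    [[0.1, 0.8], [0.9, 0.7]],  # class 1
]
mask = segment(scores)
print(mask)  # [[0, 1], [1, 1]]
```

An image classifier would output a single label for the whole image; here the output has the same height and width as the input, with one label per pixel.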

Because of this, excellent algorithms such as DeepLabv3, OCRNet, BiSeNetV2, and Fast-SCNN keep emerging. In real industrial deployment, however, factors such as hardware performance and accuracy must be weighed together, and the demands placed on algorithms are strict. Industrial algorithms often sacrifice running speed to secure high recognition accuracy; conversely, pursuing speed tends to cost a large amount of accuracy.

Figure 2 Schematic diagram of the balance between speed and accuracy of each algorithm

How to balance speed and accuracy while meeting demanding industry requirements, under the current trend of cloud, edge, and device collaboration across many scenarios, is a direction that researchers across the field are committed to.

PP-LiteSeg is exactly such a SOTA (state-of-the-art) semantic segmentation model, one that balances accuracy and speed. On the Cityscapes dataset, running on a 1080 Ti GPU, it reaches 273.6 FPS at 72.0 mIoU (and 102.6 FPS at 77.5 mIoU), surpassing STDC, the existing SOTA model from CVPR, and achieving a SOTA balance of accuracy and speed.

Figure 3 PP-LiteSeg accuracy/speed description
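
For readers unfamiliar with the accuracy metric, mIoU (mean intersection over union) averages, over all classes, the overlap between predicted and ground-truth pixels divided by their union. The following is a simplified single-image sketch in plain Python; the function name `mean_iou` and the toy masks are illustrative, and benchmark evaluations typically accumulate the counts over the whole dataset and report the result as a percentage.

```python
from typing import List

def mean_iou(pred: List[List[int]], target: List[List[int]],
             num_classes: int) -> float:
    """Mean intersection-over-union for one image.

    pred and target are label masks of the same shape.
    Classes absent from both masks are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred, in_target = (p == c), (t == c)
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 2x2 masks (hypothetical data).
pred = [[0, 1], [1, 1]]
target = [[0, 0], [1, 1]]
miou = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {miou:.4f}")  # class 0: 1/2, class 1: 2/3, mean = 0.5833
```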

Seeing is believing, so you are welcome to try it yourself! (Remember to Star the repository to follow the latest updates.)

Link:

https://github.com/PaddlePaddle/PaddleSeg

Even more surprisingly, PP-LiteSeg not only scores well on open-source datasets but also performs strongly on industrial datasets. For example, in quality inspection and remote sensing scenarios, PP-LiteSeg matches the accuracy of the high-precision, large OCRNet model while running nearly 7 times faster!

Figure 4 Comparison of PP-LiteSeg and OCRNet recognition in an industrial quality inspection dataset

Figure 4 Comparison of PP-LiteSeg and OCRNet recognition on the DeepGlobe dataset

So why does PP-LiteSeg perform so well?

PP-LiteSeg proposes three innovative modules: the Flexible and Lightweight Decoder (FLD), the Unified Attention Fusion Module (UAFM), and the Simple Pyramid Pooling Module (SPPM). FLD flexibly adjusts the number of channels in the decoder and balances the computational cost of the encoder and decoder, making the whole model more efficient. UAFM strengthens the feature representation and improves the accuracy of the model. SPPM reduces the number of channels in the intermediate feature maps and removes the skip connection, which further improves model performance.

Figure 5 PP-LiteSeg model structure and optimization points
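
The article does not spell out how the fusion module works internally, so the following is only a sketch of the general idea of attention-weighted feature fusion, not PaddleSeg's implementation. It assumes the high-level feature map has already been upsampled to the size of the low-level one, and it replaces the learned attention layers with a toy weighted sum; `attention_fuse`, `w_high`, and `w_low` are hypothetical names.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # indexed as [channel][row][col]

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def attention_fuse(high: Feature, low: Feature,
                   w_high: float = 1.0, w_low: float = -1.0) -> Feature:
    """Blend two same-shaped feature maps with a per-pixel weight alpha.

    alpha is a toy spatial attention: a sigmoid over a weighted sum of the
    channel-wise means. w_high and w_low are hypothetical stand-ins for
    parameters that a real network would learn.
    """
    channels, rows, cols = len(high), len(high[0]), len(high[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in range(channels)]
    for y in range(rows):
        for x in range(cols):
            mean_high = sum(high[c][y][x] for c in range(channels)) / channels
            mean_low = sum(low[c][y][x] for c in range(channels)) / channels
            alpha = sigmoid(w_high * mean_high + w_low * mean_low)
            for c in range(channels):
                # Convex combination: alpha weights the high-level feature.
                out[c][y][x] = (alpha * high[c][y][x]
                                + (1.0 - alpha) * low[c][y][x])
    return out

# Two channels, one row, two columns (hypothetical values).
high = [[[1.0, 0.0]], [[3.0, 0.0]]]
low = [[[1.0, 2.0]], [[3.0, 4.0]]]
fused = attention_fuse(high, low)
print(fused)
```

Because each output value is a convex combination of the two inputs, it always lies between them; where the two inputs agree, the output equals both regardless of the weight.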

Thanks to the design and improvement of these modules, PP-LiteSeg surpasses other methods: on a 1080 Ti GPU it reaches 273.6 FPS at 72.0 mIoU (and 102.6 FPS at 77.5 mIoU), achieving a SOTA balance of accuracy and speed. For more information on PP-LiteSeg, please refer to:

https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.5/configs/pp_liteseg
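
As a quick worked example, the two operating points quoted above can be converted from throughput to per-frame latency. This sketch assumes frames are processed one at a time, so latency is simply the inverse of FPS; `latency_ms` is an illustrative helper, not a PaddleSeg function.

```python
def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds, assuming one frame at a time."""
    return 1000.0 / fps

# Operating points reported in the article: accuracy label -> FPS.
points = {"mIoU 72.0": 273.6, "mIoU 77.5": 102.6}

for name, fps in points.items():
    print(f"{name}: {fps} FPS = {latency_ms(fps):.2f} ms per frame")
# mIoU 72.0: 273.6 FPS = 3.65 ms per frame
# mIoU 77.5: 102.6 FPS = 9.75 ms per frame

slowdown = points["mIoU 72.0"] / points["mIoU 77.5"]
print(f"Higher-accuracy variant is {slowdown:.2f}x slower")  # 2.67x
```

In other words, gaining 5.5 mIoU points costs roughly 2.7 times the per-frame time, which is the kind of trade-off to weigh against a deployment's latency budget.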

To help developers understand the PP-LiteSeg SOTA model in depth, overcome the difficulties of deploying it in real applications, and master the core skills of industrial practice, the PaddlePaddle team has carefully prepared a live course!

Scan the code to register for the live class

Enter the technical exchange group

At 20:30 on April 26, a senior engineer from Baidu will give a detailed introduction to PP-LiteSeg and its balance of accuracy and speed, break down its principles and usage, and walk through a hands-on case of defect segmentation on automotive metal gaskets, followed by live interactive Q&A. What are you waiting for? Scan the code and get on board!

[Image sources]

Figure 1:

1. The assisted driving image is a screenshot of AR navigation in the Baidu Maps app.

2. The 3D segmentation data comes from the MRISpineSeg spine dataset.

3. The portrait matting image comes from internal staff of Baidu PaddlePaddle.

4. The remote sensing images come from GEOVIS iBrain, an intelligent interpretation product for aerospace big data.

Figure 4 (quality inspection): sample data provided by partners

Figure 4 (DeepGlobe): from the DeepGlobe dataset

END

Origin blog.csdn.net/qq_29462849/article/details/124358189