Classic literature reading -- R-PCC (a range image-based point cloud compression method)

0. Introduction

Compared with 2D images, LiDAR data provides accurate object depth information, but it also comes with a large data volume, which makes storage and transmission inconvenient. When analyzing data offline, it is often hard to retain LiDAR data over long periods, which makes reproducing problems even harder. The paper "R-PCC: A Baseline for Range Image-based Point Cloud Compression" proposes a range image-based point cloud compression method, R-PCC, which can reconstruct a point cloud with uniform or non-uniform precision loss. The method segments the original large-scale point cloud into small, compact regions to exploit spatial redundancy and to classify salient regions. Compared with other voxel-based or image-based compression methods, it preserves and aligns all points of the original point cloud in the reconstruction, and it can bound the maximum reconstruction error of every point through the quantization module. The experiments show that the simpler FPS-based segmentation achieves better performance than instance-based segmentation methods such as DBSCAN. The corresponding code is open source on GitHub.

1. Contributions of the paper

The paper proposes a region-based approach using farthest point sampling (FPS). Section IV-C compares the compression ratio and reconstruction quality of instance-based and region-based segmentation methods, and shows that semantically accurate segmentation does not improve overall compression performance, while the unified compression framework achieves a 30× compression ratio at a chamfer distance error of 2 cm. Another reason for segmenting a large-scale point cloud into small regions is that the compressed bitstream can be shrunk by keeping high compression precision in important regions and lowering it in unimportant regions, without affecting downstream tasks. The main contributions are:

  • We evaluate the relationship between different ranges and distributions and the compression ratio; the results show that our farthest point sampling segmentation and point-plane hybrid modeling are more efficient and effective than cluster-based compression methods.
  • We propose uniform and non-uniform compression frameworks for different needs. Clusters containing more keypoints are treated as salient regions so that subsequent tasks retain high reconstruction quality.
  • We compare our compression framework with other state-of-the-art algorithms and achieve superior performance in both reconstruction quality and downstream task performance. Our real-time framework, R-PCC, is open source, easily extensible to multiple downstream tasks, and can serve as a new baseline for range image-based point cloud compression.

2. Basics of Point Cloud Compression

Point cloud data is very large and needs compression. Internationally, there are two main standardization platforms: MPEG's Geometry-based Point Cloud Compression (G-PCC) and Video-based Point Cloud Compression (V-PCC); in China, there is the AVS working group's AVS-PCC platform. V-PCC aims to provide low-complexity decoding for applications that require real-time decoding, such as virtual/augmented reality and immersive communication. G-PCC provides efficient lossless and lossy compression for autonomous driving, 3D maps, and other applications that use LiDAR-generated point clouds (or similar content).

The overall framework of V-PCC is shown in the figure below. As in traditional video coding, the encoding process can be divided into four steps: patch generation, geometry/texture image generation, auxiliary data compression, and video compression. The video compression step can use published video coding standards such as H.265/HEVC and H.266/VVC.

[Figure: overall framework of V-PCC]

2.1 Classification of compression methods

Traditional methods: first remove part of the redundancy, then use a transform and quantization to map the point cloud from the spatial domain to the frequency domain and compress the transform coefficients, and finally compress the bitstream further with entropy coding. Advantages: simple, intuitive, easy to understand, controllable, and easy to debug. Disadvantages: difficult to model semantics; less friendly to users.
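To make this pipeline concrete, here is a toy sketch (my own illustration, not any standardized codec): coordinates are quantized to a fixed precision and the resulting integer symbols are entropy-coded, with zlib standing in for the arithmetic/range coders real codecs use.

```python
import zlib
import numpy as np

def toy_compress(points: np.ndarray, precision: float = 0.02) -> bytes:
    """Quantize xyz coordinates to `precision` meters and entropy-code them."""
    q = np.round(points / precision).astype(np.int32)   # quantization
    return zlib.compress(q.tobytes(), level=9)          # entropy coding

def toy_decompress(stream: bytes, precision: float = 0.02) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(stream), dtype=np.int32).reshape(-1, 3)
    return q.astype(np.float32) * precision             # dequantization

points = np.random.rand(1000, 3).astype(np.float32) * 50.0
rec = toy_decompress(toy_compress(points))
assert np.abs(rec - points).max() <= 0.01 + 1e-6        # max error = precision / 2
```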

Learning-based (deep) methods: encode the point cloud into a latent representation with a convolutional neural network, quantize the latent features, then use a learned entropy model with entropy coding to compress the occurrence probability of each symbol given its context into a bitstream. Advantages: simple and effective, data-driven. Disadvantages: an unexplainable, hard-to-control black box; requires hardware support (GPU/FPGA, etc.); and, as the original author quips, the low entry barrier makes jobs hard to find.

3. System overview

The proposed uniform/non-uniform point cloud compression framework R-PCC, built on LiDAR range images, is shown in Fig. 1. The decompression side uses the same basic compressor as the compression side to decompress the segmentation and modeling information (info. data) and the quantized residual data. The information data predicts a coarse point cloud just as on the compression side, while the residuals are recovered by the dequantization module. In the non-uniform framework, the precision of each cluster corresponds to its own quantization module during compression.
[Figure 1: overview of the R-PCC compression and decompression framework]
The reconstruction error consists of two parts:

  1. Projection from point cloud to range image;
  2. Uniform or non-uniform quantization precision.
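To make the second error source concrete, here is a minimal sketch of non-uniform quantization (my own illustration, not the R-PCC source; the per-cluster precisions are made up): each cluster gets its own quantization step, so the reconstruction error of every point is bounded by half of its cluster's step.

```python
import numpy as np

def quantize_clusters(residuals, labels, precisions):
    """residuals: (N,) depth residuals; labels: (N,) cluster id per point;
    precisions: dict mapping cluster id -> quantization step in meters."""
    steps = np.array([precisions[l] for l in labels], dtype=np.float32)
    symbols = np.round(residuals / steps).astype(np.int32)
    return symbols, steps

residuals = np.random.randn(8).astype(np.float32)
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])
precisions = {0: 0.02, 1: 0.10}   # hypothetical: cluster 0 is salient, kept at 2 cm
symbols, steps = quantize_clusters(residuals, labels, precisions)
recon = symbols.astype(np.float32) * steps
# every point is reconstructed to within half of its cluster's step
assert np.all(np.abs(recon - residuals) <= steps / 2 + 1e-6)
```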

4. Range image

Today, the single-frame point clouds of most LiDARs can be projected from 3D to 2D. Different LiDARs have different numbers of laser beams (e.g., the Velodyne HDL-64E has 64 lasers and the HDL-32E has 32), and all beams sweep a full $360°$ rotation in the azimuth direction (horizontal field of view). Taking the Velodyne HDL-64E as an example, in the height direction (vertical field of view) the range image consists of 64 rows whose angles are distributed between the lowest angle $\phi_{min}$ and the highest angle $\phi_{max}$. Each scan represents a fixed angle in the range image. If the LiDAR has $H$ laser beams and a horizontal angular resolution $\rho$, the shape of the range image it collects is $[H, W] = [H, \lfloor 360/\rho \rceil]$, where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer.

We can project a 3D point $P = (x, y, z)$ to a corresponding 2D pixel $I = (w, h, r)$, where $w$ and $h$ are the vertical and horizontal indices and $r$ is the Euclidean distance from the point to the LiDAR origin. The values of $(w, h, r)$ are computed as $r = \sqrt{x^2 + y^2 + z^2}$, $h = \lfloor \theta/\rho \rceil$, and $w = \lfloor (\phi - \phi_{min})/\sigma \rceil$, where $\theta = \arctan(y/x)$ and $\phi = \arctan(z/r)$ are the horizontal and vertical angles respectively, $\phi_{min}$ is the smallest vertical angle, and $\sigma = (\phi_{max} - \phi_{min})/H$ is the vertical angular resolution.
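A minimal NumPy sketch of this projection is given below (my own illustration; the HDL-64E parameters and function name are assumptions, not taken from the R-PCC code):

```python
import numpy as np

def project_to_range_image(points, H=64, rho_deg=0.18,
                           phi_min_deg=-24.8, phi_max_deg=2.0):
    """Project (N, 3) LiDAR points to an [H, W] range image.
    Default parameters roughly match a Velodyne HDL-64E."""
    W = int(round(360.0 / rho_deg))
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)            # range of each point
    theta = np.degrees(np.arctan2(y, x)) % 360.0     # horizontal angle in [0, 360)
    phi = np.degrees(np.arctan2(z, r))               # vertical angle, as in the text
    sigma = (phi_max_deg - phi_min_deg) / H          # vertical angular resolution
    h = np.clip(np.round(theta / rho_deg).astype(int), 0, W - 1)
    w = np.clip(np.round((phi - phi_min_deg) / sigma).astype(int), 0, H - 1)
    image = np.zeros((H, W), dtype=np.float32)
    image[w, h] = r                                  # later points overwrite earlier ones
    return image

image = project_to_range_image(np.random.randn(1000, 3) * 10)
print(image.shape)                                   # (64, 2000)
```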

5. Compression framework

Ground extraction module: ground points are highly regular, because the ground can be fitted with a single large plane. The ground model is estimated with a RANSAC plane-fitting method as in [8].
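One way to reproduce this step, assuming Open3D is installed, is its built-in RANSAC plane segmentation (a sketch in the spirit of [8], not the authors' implementation):

```python
import numpy as np
import open3d as o3d

def extract_ground(points: np.ndarray, threshold: float = 0.2):
    """Split (N, 3) points into ground inliers and the remaining points."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    # plane_model = [a, b, c, d] for the fitted plane ax + by + cz + d = 0
    plane_model, inliers = pcd.segment_plane(distance_threshold=threshold,
                                             ransac_n=3,
                                             num_iterations=100)
    mask = np.zeros(len(points), dtype=bool)
    mask[inliers] = True
    return plane_model, points[mask], points[~mask]
```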

Segmentation module: this module splits the point cloud into several denser point cloud subsets. Compared with the instance segmentation methods in [7] and [8], the FPS method is chosen to find the center of each cluster as a region-based segmentation; the number of clusters equals the number of sampling points in the FPS setting. Sec. IV-C compares DBSCAN as a baseline with this segmentation method and shows that it performs better in both compression and efficiency.
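A straightforward NumPy implementation of FPS-based region segmentation might look as follows (an illustrative sketch, not the R-PCC source; points are simply assigned to their nearest sampled center):

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_clusters: int) -> np.ndarray:
    """Return indices of `n_clusters` points that are mutually far apart."""
    n = len(points)
    centers = np.zeros(n_clusters, dtype=int)
    dist = np.full(n, np.inf)
    centers[0] = 0                                   # start from an arbitrary point
    for i in range(1, n_clusters):
        # update each point's squared distance to its nearest chosen center
        diff = points - points[centers[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        centers[i] = int(np.argmax(dist))            # pick the farthest point
    return centers

points = np.random.rand(5000, 3) * 100
center_idx = farthest_point_sampling(points, n_clusters=16)
labels = np.argmin(np.linalg.norm(points[:, None] - points[center_idx][None],
                                  axis=-1), axis=1)  # nearest-center assignment
```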

Modeling module: after obtaining the small point cloud clusters, the points in each cluster are modeled in one of two ways, as a point or as a plane. The point modeling method uses the average of the point depths, and the plane modeling method uses a plane estimated by RANSAC to represent the points in a cluster. When the number of points in a cluster is less than 30, or the maximum angle between the plane's normal vector and the LiDAR rays through the cluster is greater than $75°$, the point modeling method is chosen. For the point set $\{P_i = (x_i, y_i, z_i)\}_{i=1}^{k}$ in a cluster, the point model is $r = \frac{1}{k}\sum_{i=1}^{k}\|P_i\|_2$ and the plane model is $ax + by + cz + d = 0$, where $\{r\}$ and $\{a, b, c, d\}$ are the model parameters.
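The per-cluster decision could be sketched like this (my own reconstruction of the rule described above; the least-squares plane fit stands in for the paper's RANSAC fit, and the helper names are assumed):

```python
import numpy as np

def fit_plane_lstsq(pts: np.ndarray):
    """Least-squares plane fit (a stand-in for the paper's RANSAC fit)."""
    centroid = pts.mean(axis=0)
    _, _, vh = np.linalg.svd(pts - centroid)   # smallest singular vector = normal
    a, b, c = vh[-1]
    d = -float(np.dot(vh[-1], centroid))
    return a, b, c, d

def model_cluster(cluster: np.ndarray):
    """Return ('point', r) or ('plane', (a, b, c, d)) for a (k, 3) cluster."""
    depths = np.linalg.norm(cluster, axis=1)
    if len(cluster) < 30:
        return 'point', float(depths.mean())
    a, b, c, d = fit_plane_lstsq(cluster)
    normal = np.array([a, b, c])
    rays = cluster / depths[:, None]           # unit LiDAR ray through each point
    # angle between the plane normal and each ray, folded into [0, 90] degrees
    angles = np.degrees(np.arccos(np.clip(np.abs(rays @ normal), 0.0, 1.0)))
    if angles.max() > 75.0:
        return 'point', float(depths.mean())
    return 'plane', (a, b, c, d)
```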

…For details, please refer to Gu Yueju


Origin: blog.csdn.net/lovely_yoshino/article/details/128904658