Deep Learning Image Compression Technology

In recent years, deep learning has come to dominate the field of computer vision. From image recognition to super-resolution reconstruction, it has become an essential technology in image research, and it is now entering the field of image compression.

This article describes how to design image compression algorithms using deep convolutional neural networks.

 

Current main image compression algorithms

 

When it comes to image compression algorithms, the most influential image compression technologies currently on the market are WebP and BPG.

WebP: An image file format released by Google in 2010 that supports both lossy and lossless compression. It uses VP8 as its encoding core and has supported lossless compression and transparency (alpha channel) since November 2011. Websites such as Facebook and eBay have adopted this format.

BPG: An image format created by Fabrice Bellard, the well-known programmer behind projects such as ffmpeg and QEMU. It uses HEVC as its encoding core. At the same image quality, a BPG file is roughly half the size of the equivalent JPEG, and BPG also supports 8-bit and 16-bit channels. Although BPG compresses well, the patent licensing fees around HEVC are high, so its market adoption remains limited.

In terms of compression performance, BPG is better than WebP, but the patent fees attached to its HEVC core prevent it from being widely used. This is the gap that deep-learning-based image compression aims to fill.

 

How to Design Image Compression Algorithms Using Deep Learning Technology

 

One of the goals of designing a compression algorithm with deep learning is to do better than today's commercial image codecs, and at the same time to use deep learning to build a simpler, end-to-end algorithm. In image and video compression, the main deep learning technique used is the convolutional neural network (CNN). As shown in Figure 1, a CNN is assembled, like building blocks, from modules such as convolutions, pooling, nonlinear functions, and normalization layers. The final output depends on the application: in face recognition, for example, the network extracts a set of features representing a face image, and recognition is performed by comparing how similar those features are.

Figure 1 Schematic diagram of convolutional neural network

(Source http://blog.csdn.net/hjimce/article/details/47323463)
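As a concrete illustration of these building blocks, here is a minimal sketch of a CNN that stacks convolution, normalization, a nonlinearity, and pooling to turn an image into a feature vector. It is written in PyTorch with made-up layer sizes (the article does not specify a framework or architecture), so treat it as an illustration rather than the actual network discussed here.

```python
import torch
import torch.nn as nn

# Minimal CNN: convolution -> normalization -> nonlinearity -> pooling,
# then a linear layer produces a feature vector (e.g. for face recognition).
class TinyFeatureExtractor(nn.Module):
    def __init__(self, feature_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),              # halve the spatial resolution
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),      # global pooling -> 64 values per image
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, x):
        h = self.backbone(x).flatten(1)
        return self.fc(h)

features = TinyFeatureExtractor()(torch.randn(1, 3, 112, 112))
print(features.shape)  # torch.Size([1, 128])
```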

 

Image compression designed with deep learning

A complete deep learning image compression framework includes a CNN encoder, quantization, inverse quantization, a CNN decoder, entropy coding, codeword (bit-rate) estimation, rate-distortion optimization, and other modules. The encoder converts the image into compressed features, and the decoder reconstructs the original image from those features. Both can be built from modules such as convolutions, pooling, and nonlinearities.

(Figure 2 Schematic diagram of image compression using deep learning)
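To make the encoder/decoder structure concrete, here is a minimal sketch, again assuming PyTorch. The layer sizes are chosen only so that a 768*512*3 image maps to a 96*64*192 feature map, matching the example used later in this article; this is not the actual TNG architecture.

```python
import torch
import torch.nn as nn

# Encoder: three stride-2 convolutions downsample 768x512x3 to 96x64x192 features.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
    nn.Conv2d(128, 192, kernel_size=5, stride=2, padding=2),
)

# Decoder: a mirror of the encoder, upsampling the features back to an image.
decoder = nn.Sequential(
    nn.ConvTranspose2d(192, 128, kernel_size=5, stride=2, padding=2, output_padding=1),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=5, stride=2, padding=2, output_padding=1),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 3, kernel_size=5, stride=2, padding=2, output_padding=1),
)

x = torch.randn(1, 3, 512, 768)        # height 512, width 768
features = encoder(x)                  # -> (1, 192, 64, 96)
reconstruction = decoder(features)     # -> (1, 3, 512, 768)
print(features.shape, reconstruction.shape)
```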

 

How to Judge Image Compression Algorithms

 

Before diving into the technical details, let's take a look at how compression algorithms are judged. Three important metrics are used to judge the quality of a compression algorithm: PSNR (peak signal-to-noise ratio), BPP (bits per pixel), and MS-SSIM (multi-scale structural similarity).

We know that any data is stored in the computer as bits, and the more bits required, the more storage space is occupied. PSNR evaluates the objective quality of the image after decoding, BPP represents the average number of bits occupied by each pixel of the image, and MS-SSIM measures the subjective quality of the image. Simply put, at the same rate (BPP), a higher PSNR means better reconstruction quality, and a higher MS-SSIM means a better subjective experience.

For example, suppose a 768*512 image occupies about 1 MB uncompressed. After it passes through the encoding network, the compressed features contain 96*64*192 data units. If each data unit consumes 1 bit on average, encoding the whole image requires 96*64*192 bits. The number of bits per pixel after compression is then (96*64*192)/(768*512) = 3, so the BPP is 3 bits/pixel and the compression ratio is 24:3 = 8:1. This means the 1 MB image needs only 0.125 MB after compression; in other words, the space that previously held one photo can now hold eight.
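To make this arithmetic easy to reproduce, here is a small sketch in plain Python. The helper name bpp_and_ratio is my own, and the numbers are simply those from the example above.

```python
def bpp_and_ratio(feat_w, feat_h, channels, bits_per_unit, img_w, img_h, source_bpp=24):
    """Bits per pixel and compression ratio for a compressed feature map."""
    total_bits = feat_w * feat_h * channels * bits_per_unit
    bpp = total_bits / (img_w * img_h)
    return bpp, source_bpp / bpp

# The example from the text: 96*64*192 units at 1 bit each, for a 768*512 image.
bpp, ratio = bpp_and_ratio(96, 64, 192, bits_per_unit=1, img_w=768, img_h=512)
print(bpp, ratio)  # 3.0 bits/pixel, 8.0 : 1
```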

 

How to do compression with deep learning

 

To see how deep learning performs compression, let's continue with the same example. A three-channel 768*512 image is fed into the encoding network, and the forward pass produces compressed features occupying 96*64*192 data units. Readers with a computing background may wonder what type of data goes into each unit: a floating-point number, an integer, or a binary value? From the perspective of image reconstruction and of how neural networks work, keeping the compressed features as floating-point numbers gives the highest reconstruction quality. However, a floating-point number occupies 32 bits, and by the bit-count formula above this gives (96*64*192*32)/(768*512) = 96 bits per pixel. Each pixel goes from 24 bits to 96 bits: instead of compressing the image, we have inflated it, which is a bad result.

Clearly, floating-point numbers are not a good choice.
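Reusing the hypothetical bpp_and_ratio helper from the sketch above, the floating-point case works out as follows:

```python
# Same feature map, but 32 bits per unit (float32): the "compressed" image
# now costs 96 bits per pixel instead of the original 24.
bpp, ratio = bpp_and_ratio(96, 64, 192, bits_per_unit=32, img_w=768, img_h=512)
print(bpp, ratio)  # 96.0 bits/pixel, 0.25 : 1 -- the file grows four times larger
```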

To design a practical algorithm, we use a technique called quantization, whose purpose is to convert floating-point numbers into integers or binary values. The simplest operation is to drop the fractional part of each floating-point value. Once the features become 8-bit integers, each data unit occupies only 8 bits, which corresponds to 24 bits per pixel. Correspondingly, at the decoding end, inverse quantization restores the quantized features to floating-point values, for example by adding a random fractional offset to each integer. This reduces the impact of quantization on the accuracy of the neural network to some extent, and thereby improves the quality of the reconstructed image.
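As a minimal sketch of this idea (my own illustration in NumPy, not the article's actual quantizer), quantization can simply truncate the features to 8-bit integers, and inverse quantization can add a small random offset when mapping them back to floats:

```python
import numpy as np

def quantize(features, levels=256):
    """Clip features to [0, 1] and truncate to 8-bit integers (0..255)."""
    clipped = np.clip(features, 0.0, 1.0)
    return np.floor(clipped * (levels - 1)).astype(np.uint8)

def dequantize(q, levels=256, rng=None):
    """Map integers back to floats, adding a random fraction of one step
    to soften the effect of truncation on the decoder network."""
    rng = rng or np.random.default_rng()
    noise = rng.uniform(0.0, 1.0, size=q.shape)
    return (q.astype(np.float32) + noise) / (levels - 1)

features = np.random.rand(192, 64, 96).astype(np.float32)  # the example feature map
restored = dequantize(quantize(features))
print(np.abs(features - restored).max())  # error stays within one quantization step
```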

Even when each unit in the compressed features occupies only 1 bit, an 8:1 compression ratio is still not an ideal result in our view. How can the algorithm be optimized further? Look again at the BPP formula: assuming each compressed feature unit occupies 1 bit, it reads (96*64*192*1)/(768*512) = 3 bits/pixel. For compression, the smaller the BPP the better. The denominator is fixed by the image; the adjustable part is the numerator, and the three numbers 96, 64 and 192 are determined by the network structure. With a better network design, those three numbers can be made smaller, as the sketch below shows.
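To see how the network structure drives those numbers, here is a small back-of-the-envelope sketch (my own simplification): with a spatial downsampling factor s and C feature channels at b bits per unit, the BPP is roughly C*b/s^2, independent of the image size.

```python
def feature_bpp(img_w, img_h, downsample, channels, bits_per_unit):
    """BPP when the encoder downsamples by `downsample` in each dimension
    and outputs `channels` feature maps at `bits_per_unit` bits per unit."""
    feat_w, feat_h = img_w // downsample, img_h // downsample
    return (feat_w * feat_h * channels * bits_per_unit) / (img_w * img_h)

# The running example: downsample by 8 (768->96, 512->64), 192 channels, 1 bit.
print(feature_bpp(768, 512, 8, 192, 1))   # 3.0  == 192 * 1 / 8**2
# Stronger downsampling or fewer channels lowers the BPP directly.
print(feature_bpp(768, 512, 16, 192, 1))  # 0.75
```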

Which modules does that 1 depend on? The 1 means that each compressed feature unit occupies one bit on average. Quantization affects this number, but it is not the only factor; rate control and entropy coding matter as well. The purpose of rate control is to make the distribution of values in the compressed features as concentrated as possible, with as small a range as possible, while still preserving reconstruction quality. Entropy coding can then push the average number of bits per unit below 1, further improving the compression ratio.
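As a rough sketch of why a concentrated distribution helps (a standard Shannon-entropy estimate, not the article's actual entropy coder), the achievable average bits per symbol can be estimated from the histogram of the quantized features:

```python
import numpy as np

def estimated_bits_per_symbol(symbols):
    """Shannon entropy of the symbol histogram: a lower bound on the average
    bits per symbol that an ideal entropy coder could achieve."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
spread = rng.integers(0, 256, size=100_000)                              # nearly uniform values
peaked = np.clip(rng.normal(128, 3, size=100_000), 0, 255).astype(int)   # concentrated values

print(estimated_bits_per_symbol(spread))  # close to 8 bits per symbol
print(estimated_bits_per_symbol(peaked))  # only a few bits per symbol
```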

Video compression with deep learning can be regarded as an extension of deep learning image compression: by exploiting spatiotemporal information such as the optical flow between frames of a video sequence, it can further reduce the bit rate beyond what single-frame compression achieves.

 

Advantages of Deep Learning Image Compression

 

TNG, the image compression format developed by Tuya Technology using deep learning, has surpassed WebP and BPG in internal tests. The figures below show the evaluation results on the Kodak24 standard dataset, in terms of PSNR and MS-SSIM respectively.

Figures 3 and 4 show the evaluation results on the Kodak24 standard dataset; the upper figure shows the PSNR results and the lower figure shows the MS-SSIM results.

 

Readers familiar with image compression can see directly from the PSNR and MS-SSIM curves that TNG scores significantly higher than WebP, JPEG2000 and JPEG on both metrics. At high bit rates, TNG's PSNR is also higher than BPG's, and its MS-SSIM is almost always higher than BPG's.

 

Comparison of TNG and WebP compression quality at low bit rates

Figure 5 and Figure 6: comparison of compression quality between TNG and WebP at low bit rates (Figure 5: TNG, Figure 6: WebP)

 

Compared with TNG, WebP retains more detail but introduces more distortion, which is unfavorable for later restoration. TNG uses an edge-preserving filtering approach, which produces less distortion and an overall better-looking image than WebP.

 

Comparison of TNG and BPG at high bit rates

 

Figure 7 and Figure 8: comparison of compression quality between TNG and BPG at high bit rates (Figure 7: TNG, Figure 8: BPG)

 

The two images above were compressed at a high bit rate. In our tests BPG exhibits the color distortion visible above, whereas TNG essentially does not. The reason is that BPG encodes and decodes its YUV channels separately, which can introduce color shifts, while TNG takes the whole picture into account and encodes it jointly, avoiding this problem.

 

Comparison of TNG and BPG at low bit rates

 

Figure 9 and Figure 10: comparison of compression quality between TNG and BPG at low bit rates (Figure 9: TNG, Figure 10: BPG)

 

At low bit rates, BPG-compressed images show problems such as false contours and blocking artifacts, and the continuity of the whole picture is relatively poor, while TNG preserves image continuity and object contours much better.

 

Image compression is used almost everywhere, from social apps and news clients to games: wherever there are images, image compression is needed. More advanced compression technology can save companies that handle large numbers of images substantial bandwidth costs, and it can save users traffic and shorten image loading times.

 

Summary

 

Overall, designing image compression algorithms with deep learning is a very promising but also very challenging direction. Deep learning image compression can give everyone a better visual experience in the era of high-definition screens, and in fields such as gaming and spatial imaging it allows pictures to reach higher resolution while occupying less storage, again improving the user's visual experience.

 

Here is the TNG test link, which you can try yourself (testing on a PC is recommended): http://www.tucodec.com/picture/index. Interested readers can also download the compressed images and binary files after testing, then download and install the decoder to restore the compressed images.
