Python implementation of JPEG image compression algorithm

Summary

On the basis of studying the basic principle of JPEG compression coding for image data compression, the article designs the implementation process of the JPEG image compression algorithm program, uses Python language to write the program, and realizes the control of compression quality, and verifies the JPEG compression coding Feasibility of image data compression.
Use JPEG compression software to output the image from the original image and reconstruct it. Through intuitive comparison, it is found that the display and sense of the original image are still very good when using JPEG compression software to compress the image. Through the research on parameters such as the output compression ratio, it is scientifically demonstrated that JPEG compression coding has a huge compression effect on image data and good compression quality.

introduction

The recording and storage of media information such as images is developing towards digitization. And the amount of data of these digitized images is astonishing. These large-capacity data undoubtedly cause great pressure on the memory capacity and the speed of the computer. To solve this problem, if we simply expand the memory capacity, not only will a large amount of image data cause a lot of problems when storing and processing, but also greatly restrict the network due to the limitation of network bandwidth in the process of image data transmission. The development of multimedia technology. With the rapid development of communication network, the data traffic becomes larger and more complex, and it is unrealistic to rely solely on increasing the traffic of communication trunk lines. However, if the amount of information data can be compressed by means of data compression, and stored and transmitted in a compressed form, it will not only save storage space, but also improve the transmission efficiency of the communication trunk line, and at the same time enable the computer to process high-quality audio and video in real time. information. By compressing image data, its most direct effect is that it can reduce the bandwidth required for image transmission, and at the same time, no additional physical equipment or storage capacity is needed, and higher transmission accuracy can be achieved. It can be seen from this that it is absolutely necessary to compress still images.
Image compression coding is to carry out a large number of statistical analyzes on digital images, on the basis of mastering and understanding the statistical characteristics of image information, make full use of the strong correlation characteristics of the image itself, seek to eliminate or reduce correlation or change the probability distribution of image sources Inhomogeneity method to achieve image data compression.

1 Basic introduction of JPEG compression algorithm

1.1 Introduction

JPEG (Joint Photographic Experts Group), the Joint Photographic Experts Group, is a standard for continuous-tone still image compression. The file suffix is ​​.jpg or .jpeg, which is the most commonly used image file format. It mainly uses the joint coding method of predictive coding (DPCM), discrete cosine transform (DCT) and entropy coding to remove redundant images and color data. It is a lossy compression format, which can compress images in a small storage space. Space, to a certain extent, will cause damage to the image data. In particular, using too high a compression ratio will reduce the quality of the restored image after final decompression. If you pursue high-quality images, you should not use an excessively high compression ratio.
JPEG can use lossy compression to remove redundant image data, and use less disk space to get better image quality. Moreover, JPEG is a very flexible format with the function of adjusting image quality. It allows files to be compressed with different compression ratios, supports multiple compression levels, and the compression ratio is usually 10;1 to 40;1. The larger the value, the lower the image quality; conversely, the smaller the compression ratio, the higher the image quality. For the same image, the file stored in JPEG format is 1/10~1/20 of other types of files, usually only tens of KB, and the quality loss is small, which is basically invisible. The JPEG format mainly compresses high-frequency information and retains color information well, making it suitable for use on the Internet; it can reduce image transmission time and supports 24-bit true color; it is also widely used in images that require continuous tone. [1]

1.2 Advantages and disadvantages analysis

1.2.1 Advantages

Supports extremely high compression rate, so the download speed of JPEG images is greatly accelerated;
it can easily process 16.8M colors, and can reproduce full-color images well;
in the process of image compression, this image format can allow free Choose between minimum file size and maximum file size;
relatively small file sizes and fast download speeds are good for transfers where bandwidth is not "rich".

1.2.2 Disadvantages

Not all browsers support inserting various JPEG images into web pages;
when compressed, the quality of the image may be lost, so it is not suitable to use this format to display high-definition images. [2]

2 JPEG compression algorithm steps and analysis

Complete flowchart of JPEG compression algorithm

2.1 Image Segmentation

The first step of the JPEG algorithm is to divide an image into 8X8 segments, so that these segments can be processed later, and these segments can be processed separately later to make the whole compression process more convenient.
Divide the image into small blocks of 8*8 size

2.2 Color space conversion

The so-called "color space" refers to the mathematical model for expressing color. For example, our common "RGB" model is to decompose the color into three components of red, green and blue, so that a picture can be decomposed into three grayscale images. Mathematics In terms of expression, each 8X8 pattern can be expressed as three 8X8 matrices, where the value range is generally between [0,255], as shown in the figure.
8*8 matrix representation of image

Different color models have different application scenarios. For example, the RGB model is suitable for self-illuminating patterns like monitors. In the printing industry, ink is used to print. The color of the pattern is produced by reflecting light, and the CMYK model is usually used. In the JPEG compression algorithm, it is necessary to convert the pattern into a YCbCr model, where Y represents Luminance, and Cb and Cr represent the "color difference values" of green and red, respectively.
The concept of "chromatic aberration" originated from the TV industry. Early TVs were black and white. At that time, when transmitting TV signals, only the luminance signal, that is, the Y signal, was needed. After the appearance of color televisions, people added two color-difference signals to transmit color information in addition to the y signal.
According to the principle of three primary colors, it is found that the brightness contributed by the three colors of red, green and blue is different. The "brightness" of green is the largest, and the blue is the darkest. Let the share of the brightness contributed by red be KR, and the share contributed by blue be KB, then the brightness is:
insert image description here

According to experience, KR=0.299, KB=0.114, then:
insert image description here

The color difference between blue and red is defined as follows:
insert image description here

Finally, the mathematical formula for converting RGB to YCbCr can be obtained as:
insert image description here

When the subsequent python program implements the JPEG compression algorithm, considering that the compression efficiency of calling the numpy library will far exceed that of using the for loop, the matrix expression of the above formula is used.
Why do we add a seemingly irrelevant color space conversion algorithm to the compression algorithm? This is not superfluous. The basic principle of lossy compression is mentioned at the beginning of the article. The first thing to do in lossy compression is to "put the important information and unimportant information”, YCbCr can do just that. Due to the structure of the human eye, the light and dark changes in the image are more easily perceived by the human eye. According to the knowledge of another professional elective course in this semester, there are two kinds of photoreceptor cells in the retina, which are rod cells that can observe light changes, and cone cells that can observe colors. Because the number of rod cells is higher than that of cone cells If there are more, we are more able to observe the details of light and dark changes. For example, the following picture:
Convert JPEG image to YCbCr image

It can be seen that the details of the brightness map are richer. After JPEG converts the image into a YCbCr image, different processing can be done according to the importance of the data. That's why JPEG uses this color space.

2.3 Discrete Cosine Transform

Discrete cosine transform (Discrete cosine transform), referred to as DCT, is the core content of the JPEG algorithm. Discrete cosine transform is another form of Fourier transform. When what is to be processed is no longer a function, but a bunch of discrete data, and if these data are symmetrical, then the function transformed by Fourier only contains cosine term, this transformation is called discrete cosine transformation. For example, there is a set of one-dimensional data [x0,x1,x2,...,xn-1], then n transformation series Fi can be obtained through DCT transformation:
insert image description here

At this time, the original data Xi can be expressed by the inverse transform (IDCT) of the discrete cosine transform:
insert image description here

That is to say, after DCT transformation, an array can be decomposed into the sum of several arrays. If we regard the array as a one-dimensional matrix, then the result can be regarded as the sum of a series of matrices:
insert image description here

In the JPEG compression process, after the conversion of the color space, each 8X8 image block is represented as three 8X8 matrices on the data, and then we perform a two-dimensional DCT transformation on these three matrices, the two-dimensional DCT The conversion formula is:
insert image description here

For example, such as a matrix with all the same values, after DCT transformation, all the series are combined into a new matrix:
insert image description here

It can be seen that after DCT conversion, the matrix is ​​all concentrated on the DC component F(0,0) in the upper left corner, and all other positions become 0. In the actual JPEG compression process, due to the coherence of the image itself, the value in an 8X8 image generally does not have a large jump. After DCT conversion, there will be a similar effect. The DC component in the upper left corner saves a large value. , and the other components are close to 0. Let's take the Y component of the first image in the upper left corner of Lenna as an example. The transformed matrix is:
insert image description here

After the data is changed by DCT, it is obviously divided into two parts: DC component and AC component, which fully paves the way for further compression later, which can be said to be the most important step in the entire JPEG.

2.4 Quantization

After the image has undergone discrete cosine transform, although the image data has been changed beyond recognition, it is still in a reversible state, and has not entered the lossy step of lossy compression. Next, let's take a look at how the details in the data are filtered out. After color space conversion and discrete cosine transform, each 8X8 image block becomes three 8X8 floating-point matrixes, representing Y, Cr, and Cb respectively. data. For example, take one of the brightness data matrix as an example, its data is as follows:
insert image description here

How can these floating point numbers be stored with less space when some precision can be lost? The answer is to use quantization (Quantization), referred to as quantization. The concept of "quantum" comes from physics, which means that continuous energy can be regarded as a combination of individual units. For example, when dealing with the direction of a character's face in a game, floating point numbers such as 0 to 2π are generally not used. Instead, the direction is divided into 16 intervals, represented by integers such as 0 to 16, so that only 4 bits are enough. The quantization algorithm provided by JPEG is as follows:
insert image description here

Among them, the round function is a rounding function, but considering rounding, that is to say:
insert image description here

G is the image matrix that needs to be processed, Q is called the quantization coefficient matrix (Quantization matrices), and the JPEG algorithm provides two standard quantization coefficient matrices, which are used to process the brightness data Y and the color difference data Cr and Cb respectively:
insert image description here

insert image description here

Taking -415.38 in the upper left corner as an example, the corresponding quantization coefficient is 16, then round(-415.38/16)=round(-25.96125)=-26. The final quantized result is:
insert image description here

It can be seen that most of the data has become 0, which is very beneficial for subsequent compressed storage. Lossy compression is to separate the important data from the unimportant data in the data, and then process them separately. The values ​​at different positions in the DCT coefficient matrix represent the components of different frequencies in the image data. The data in these two tables are based on the experience accumulated by people based on the sensitivity of the human eye to different frequencies. Generally speaking, people The eye is more sensitive to low-frequency components than high-frequency components, so the value in the upper left corner of the two quantization coefficient matrices is obviously smaller than that in the lower right corner. In the actual compression process, you can also multiply a coefficient on the basis of these coefficients as needed, so that more or less data becomes 0. When the image processing software we usually use generates a jpg file, the When controlling the compression quality, it is this coefficient that is controlled, and it will also be reflected in the program implementation later.
2.5 ZigZag arrangement
After quantization, the 8*8 matrix is ​​still a two-dimensional matrix. How to adjust the result of our DCT to improve the compression rate? Observing the quantized matrix, we found that a lot of information is concentrated in the upper left corner, so we use ZigZag arrangement:
Schematic diagram of ZigZag arrangement sequence

There is only one purpose for doing this, which is to put 0 together as much as possible. Since most of the 0 are concentrated in the lower right corner, the order from the upper left corner to the lower right corner is changed. After this order transformation, the final matrix becomes a array of integers.
2.6 RLE encoding
Run Length Encoding (Run Length Encoding), also known as "Run Length Encoding" or "Run Length Encoding", is a lossless compression encoding. For example: 5555557777733322221111111, a characteristic of this data is that the same content will appear many times, then a simplified method can be used to record this string of numbers, such as (5, 6) (7, 5) (3, 3 )(2,4)(1,7) is its run-length encoding. The number of bits of the run-length encoding will be much less than the number of bits of the original string. Take an example to see: 57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ... , 0, can be expressed as: (0, 57); (0, 45); (4, 23); (1, -30); (0, -16); (2, 1); EOB. That is, the first one of each group of numbers represents the number of 0s, and in order to be more conducive to subsequent processing, it must be 4 bits, that is to say, it can only be 0~15, which is a feature of this run-length encoding. JPEG uses the upper 4 bits of a byte to represent the number of consecutive "0", and uses its lower 4 bits to represent the number of bits required to encode the next non-"0" coefficient, followed by The numeric value of the quantized AC coefficient. Among them (0, 0) and (15, 0) are special, (0, 0) represents the end of the block, and (15, 0) represents that there are already 16 consecutive 0s.
Since the homework requirements mentioned that RLE encoding or Huffman encoding can be selected for compression, I only chose RLE encoding in the subsequent program implementation.

3 Demonstration and analysis of results

The quality factor can be input at runtime to control the compression ratio (the higher the compression ratio, the worse the image quality).

The main implementation method is to multiply the JPEG brightness quantization table and chrominance quantization table by a quality factor Q during the quantization step to change the data in the table. When the quality factor Q is 1, it is equivalent to the JPEG standard quantization table. The larger the quality factor, the larger the compression ratio, and the worse the quality; the smaller the quality factor, the smaller the compression ratio, and the better the quality. Take Lenna as an example:
The left is the original image, and the right is the compressed result when the quality factor is 1

The quality factor is 1, which is equivalent to the compression result of the JPEG standard quantization table. The program also calculates a compression ratio of about 8.023013598016913, and a PSNR (peak signal-to-noise ratio) value of about 29.634010562060844 (the higher the PSNR value, the more similar it is to the original image. The higher the PSNR 40dB indicates excellent image quality, 30-40dB usually indicates good image quality, 20-30dB indicates poor image quality, PSNR lower than 20dB image is unacceptable).

As the quality factor increases, the image quality decreases gradually, but the compression ratio also increases gradually.

5 Conclusion

This article briefly analyzes the basic principle and implementation process of the jpeg image compression algorithm, and implements the algorithm through python. When the PSNR value of the image is guaranteed to be above 20, it can achieve a compression rate close to 90% (if Huffman coding is implemented, the compression rate should be have been further improved).
However, compared with the compression of the format factory, there are still many problems in the gap between my algorithm and the original image: the compression rate is much lower than that of the format factory (it takes about 5 seconds for an image of about 500KB, which is already all I can use in the optimization algorithm numpy library instead of the result of the for loop operation), some details are not obvious, and the edge of the image is distorted (especially for the part of the image where there is a sudden change in color, and some low-resolution images, this problem is more serious) and other problems.
Although I encountered many solved and unresolved difficulties in the process, through this big assignment, I have a more in-depth and comprehensive understanding of the principle of the JPEG image compression algorithm, and at the same time greatly improved my hands-on ability. It is believed that with the advancement of technology, there will be more and more various compression methods, the quality will be better and better, and they will be further developed and applied.

references:

[1] Wu Yu. Digital Image Processing: Beijing University of Posts and Telecommunications Press, 2017: 47-48.
[2] Li Xiaoping. Multimedia Technology: Beijing Institute of Technology Press, 2015: 185.
[3] Detailed explanation of JPEG compression principle: https://www.cnblogs.com/Arvin-JIN/p/9133745.html.
[4] Two-dimensional discrete cosine transform 2D -DCT actual combat: https://blog.csdn.net/ahafg/article/details/48808443.
[5] ZigZag transformation acceleration, space for time practice: https://my.oschina.net/tigerBin/blog/1083549.

Guess you like

Origin blog.csdn.net/qq_57435798/article/details/128455628