Introduction
The paper introduces a general framework for end-to-end optimization of the rate–distortion performance of nonlinear transform codes, assuming scalar quantization.
Traditional image compression obtains an approximate reconstruction through a transform, quantization, entropy coding, and an inverse transform. Entropy coding is lossless (e.g., Huffman or arithmetic coding), and the purpose of quantization is to produce the discrete symbols that entropy coding requires.
Traditional methods typically optimize each component separately. This paper proposes an end-to-end optimization framework for image compression.
Traditional transform codes are linear, and the design process is cumbersome: each operation is typically studied and optimized separately for a different goal, and any proposed combination of coding tools must then be verified empirically in terms of average bit rate and distortion. In recent years the advantages of end-to-end architectures have become apparent, since the entire system can be optimized jointly.
Building on end-to-end optimization, the authors develop a nonlinear transform coding framework that generalizes the traditional transform coding paradigm.
The analysis transform y = g_a(x; φ) maps the image vector x into the code domain (latent variables), where y is quantized to q. The reconstructed image x̂ = g_s(q; θ) is then obtained by mapping back to the signal domain through the synthesis transform; both the analysis and synthesis transforms are differentiable.
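The pipeline above can be sketched in a few lines of NumPy. This is a toy stand-in, not the paper's model: the transforms here are plain linear maps (the paper uses GDN-based nonlinear transforms), and all names are illustrative.

```python
import numpy as np

def g_a(x, H):
    """Analysis transform: signal domain -> code domain.
    Toy version: a plain linear map (the paper uses a GDN nonlinearity)."""
    return H @ x

def g_s(q, H_inv):
    """Synthesis transform: code domain -> signal domain."""
    return H_inv @ q

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

y = g_a(x, H)                      # code-domain coefficients
q = np.round(y)                    # scalar quantization
x_hat = g_s(q, np.linalg.inv(H))   # reconstruction from quantized indices
```

Without quantization (q = y) the round trip is exact; rounding introduces the distortion that the rate–distortion objective trades off against entropy.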
The coding rate is evaluated by the entropy of the discrete probability distribution of the quantized coefficients. (Entropy measures information content: the quantized, discrete symbols are entropy-coded, and the higher a symbol's probability, the shorter its code word; entropy therefore gives the average code length.)
Distortion is measured by the MSE or PSNR between x and x̂. This paper proposes a perceptual transform z = g_p(x) that maps the signal domain into a perceptual domain, providing a better approximation of subjective visual distortion than PSNR.
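For reference, PSNR is a direct function of the MSE; a minimal sketch, assuming 8-bit images (peak value 255):

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak**2 / MSE)."""
    mse = np.mean((np.asarray(x, float) - np.asarray(x_hat, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 1 gray level on 8-bit images gives about 48.13 dB.
val = psnr(np.zeros((4, 4)), np.ones((4, 4)))
```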
Optimization framework
In the transform coding framework given above, one seeks to adapt the analysis and synthesis transforms g_a and g_s so as to minimize the rate–distortion objective

L(θ, φ) = H[P_q] + λ · E[ d(z, ẑ) ],

where the first term is the discrete entropy of the quantized index vector q, and the second term measures the distortion between the reference image z = g_p(x) and its reconstruction ẑ = g_p(x̂) in the perceptual domain.
Optimizing this objective by backpropagation relies on the differentiability of the operations in the framework, but quantization is not differentiable: its derivative is zero almost everywhere and infinite at the bin boundaries. The problem is solved by replacing quantization with additive uniform noise during training, and using rounding at inference.
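This train/inference switch can be sketched as follows (a minimal NumPy stand-in for what the paper implements inside a differentiable training graph):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(y, training):
    """Quantization surrogate: additive uniform noise on [-1/2, 1/2) during
    training (a differentiable relaxation), hard rounding at inference."""
    if training:
        return y + rng.uniform(-0.5, 0.5, size=np.shape(y))
    return np.round(y)

y = np.array([0.2, 1.7, -0.4])
y_train = quantize(y, training=True)    # noisy, stays within 1/2 of y
y_infer = quantize(y, training=False)   # hard rounding to integers
```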
The scalar quantizer rounds each coefficient of y to the nearest integer (quantization step sizes can be absorbed into the transforms):

q_i = round(y_i),  ŷ_i = q_i.
The marginal distribution of the quantized coefficient is

p_ŷ_i(t) = Σ_n P_q_i(n) · δ(t − n),  with  P_q_i(n) = (p_y_i ∗ rect)(n) = ∫ from n−1/2 to n+1/2 of p_y_i(t) dt,

where δ is the Dirac delta function (unit integral, infinitely high at 0, zero everywhere else), P_q_i(n) is the probability of the n-th quantization bin (the area under p_y_i over that bin), ∗ denotes convolution, and rect is the uniform density on [−1/2, 1/2].
Adding uniform noise, ỹ_i = y_i + Δy_i with Δy_i ~ U(−1/2, 1/2), yields a variable whose density is p_ỹ_i = p_y_i ∗ rect: it agrees with P_q_i at integer positions, p_ỹ_i(n) = P_q_i(n), and provides continuous intermediate values in between.
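This identity can be checked numerically. The sketch below assumes a zero-mean Gaussian p_y (an assumption for illustration, not from the paper) and compares the empirical bin masses of q = round(y) against the noisy density p_ỹ evaluated at the integers:

```python
import math
import numpy as np

def p_tilde(t, sigma=1.0):
    """Density of y~ = y + u, u ~ U(-1/2, 1/2), when p_y is a zero-mean
    Gaussian: (p_y * rect)(t) = Phi(t + 1/2) - Phi(t - 1/2)."""
    Phi = lambda s: 0.5 * (1.0 + math.erf(s / (sigma * math.sqrt(2.0))))
    return Phi(t + 0.5) - Phi(t - 0.5)

rng = np.random.default_rng(0)
y = rng.standard_normal(200_000)
q = np.round(y)

# Empirical bin masses P(q = n) match the noisy density evaluated at n.
errs = [abs(np.mean(q == n) - p_tilde(n)) for n in (-1, 0, 1)]
```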
The differentiable differential entropy of ỹ is used in place of the discrete entropy of q. Optimizing this term also requires an estimate of the density p_ỹ_i. The estimate need not be arbitrarily accurate, because p_ỹ_i is band-limited by the convolution with rect; a parameter-free, piecewise-linear function (first-order spline interpolation) is used as the entropy model. The overall optimization objective is then

L(θ, φ) = E[ −Σ_i log₂ p_ỹ_i(ỹ_i) ] + λ · E[ d(z, z̃) ].
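The rate term can be sketched with such a piecewise-linear model: density values at integer knots, linearly interpolated in between (a first-order spline), with the rate taken as the sample mean of −log₂ p(ỹ). The knot values below are hypothetical and not exactly normalized; this is for illustration only.

```python
import numpy as np

def rate_bits(y_tilde, knots, p_knots):
    """Average code-length estimate: mean of -log2 p(y~), where p is a
    piecewise-linear density given by its values p_knots at integer knots."""
    p = np.interp(y_tilde, knots, p_knots)   # first-order spline interpolation
    return float(np.mean(-np.log2(np.maximum(p, 1e-12))))

knots = np.arange(-3.0, 4.0)                                  # -3 .. 3
p_knots = np.array([0.02, 0.1, 0.25, 0.4, 0.25, 0.1, 0.02])   # hypothetical

bits = rate_bits(np.array([0.0]), knots, p_knots)   # = -log2(0.4) at the knot
```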
Choice of parametric transforms
In traditional compression, the analysis and synthesis transforms are linear and are inverses of each other. In general this need not hold strictly, as long as the rate–distortion objective is minimized.
The analysis and synthesis transforms use generalized divisive normalization (GDN) and its approximate inverse (IGDN), respectively; for the perceptual transform, the normalized Laplacian pyramid (NLP) is used.
A. GDN is defined as follows:

v = Hx,  y_i = v_i / (β_i + Σ_j γ_ij |v_j|^α_ij)^ε_i,

where H is a linear transform (this work appears to use a fully connected layer; subsequent work uses convolutional layers as the linear stage). The analysis transform parameters are φ = {H, α, β, γ, ε}.
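A minimal NumPy sketch of this transform, with α and ε taken as scalars for brevity (the paper learns them per coefficient):

```python
import numpy as np

def gdn(x, H, beta, gamma, alpha=2.0, eps=0.5):
    """GDN analysis transform sketch: v = H x, then divisive normalization
    y_i = v_i / (beta_i + sum_j gamma_ij |v_j|**alpha) ** eps."""
    v = H @ x
    return v / (beta + gamma @ np.abs(v) ** alpha) ** eps

# With gamma = 0 and beta = 1 the normalization is the identity: y = H x.
H = np.array([[1.0, 0.5], [0.0, 1.0]])
x = np.array([2.0, -1.0])
y = gdn(x, H, beta=np.ones(2), gamma=np.zeros((2, 2)))
```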
B. IGDN is obtained from one round of fixed-point iteration of the inverse, and is defined as follows:

ŵ_i = ŷ_i · (β̂_i + Σ_j γ̂_ij |ŷ_j|^α̂_ij)^ε̂_i,  x̂ = Ĥŵ.

The synthesis transform parameters are θ = {Ĥ, α̂, β̂, γ̂, ε̂}.
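The single fixed-point iteration can be sketched and sanity-checked: for weak normalization (small γ), applying it to the normalized coefficients nearly recovers the pre-normalization values. Parameter values below are illustrative.

```python
import numpy as np

def igdn(y_hat, beta, gamma, alpha=2.0, eps=0.5):
    """One fixed-point iteration approximating the inverse normalization:
    w_i = y_i * (beta_i + sum_j gamma_ij |y_j|**alpha) ** eps."""
    return y_hat * (beta + gamma @ np.abs(y_hat) ** alpha) ** eps

beta, gamma = np.ones(2), np.full((2, 2), 0.01)
v = np.array([1.0, -0.5])
y = v / (beta + gamma @ np.abs(v) ** 2.0) ** 0.5   # GDN normalization core
w = igdn(y, beta, gamma)                            # approximate inverse
```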
(In later work, GDN is widely used as the nonlinear component in end-to-end image compression, with the linear transform H replaced by a convolutional layer.)
C. NLP is described as follows (little used in subsequent work):
The image is decomposed using a Laplacian pyramid, which subtracts local estimates of mean luminance at multiple scales. Each pyramid coefficient is then divided by a local amplitude estimate (a constant plus a weighted sum of the absolute values of its neighbors). Perceptual quality is assessed by the norm of the difference between the reference and the reconstruction in this perceptual domain. The parameters (the constant and the amplitude weights) were optimized to best fit perceptual data in the TID2008 database, which includes images corrupted by artifacts arising from block-transform compression. This simple distortion measure provides a near-linear fit to the human perceptual judgments in the database, outperforming the widely used SSIM and MS-SSIM quality metrics.
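The normalize-then-compare idea can be illustrated with a toy single-scale, 1-D version. This is only a sketch of the structure: the constant c and the 3-tap kernel k are hypothetical, not the values fitted to TID2008, and the real NLP operates on images at multiple scales.

```python
import numpy as np

def nlp_1d(x, c=0.2, k=np.array([0.25, 0.5, 0.25])):
    """Toy single-scale, 1-D sketch of the normalized Laplacian pyramid:
    subtract a local luminance estimate, then divide each coefficient by a
    local amplitude estimate (a constant plus a weighted sum of the
    absolute values of its neighbors)."""
    band = x - np.convolve(x, k, mode="same")            # Laplacian-like band
    amp = c + np.convolve(np.abs(band), k, mode="same")  # local amplitude
    return band / amp

def nlp_distance(x, x_hat):
    """Perceptual distortion: norm of the difference in the NLP domain."""
    return float(np.linalg.norm(nlp_1d(x) - nlp_1d(x_hat)))
```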
Experimental results
Three transforms are compared: a 16×16 DCT, a learned linear transform (θ and φ each containing 256×256 filter coefficients), and a 16×16 GDN transform, each optimized for MSE and for the NLP-domain norm, and evaluated with both PSNR and the NLP metric. For the end-to-end models, each point on a rate–distortion curve corresponds to a separate model, and different points are obtained by varying λ. (The results figure is not reproduced here.)
Source: hahalidaxin