End-to-end optimization of nonlinear transform codes for perceptual quality

Overview

The paper introduces a general framework for end-to-end optimization of the rate-distortion performance of nonlinear transform codes under scalar quantization.
   Traditional image compression produces an approximate reconstruction through an invertible transform, quantization, entropy coding, and an inverse transform. Entropy coding is lossless (e.g., Huffman or arithmetic coding), and the purpose of quantization is to produce a discrete signal that entropy coding can process.
   Traditional methods typically optimize each stage separately. This article presents an end-to-end optimization framework for image compression.

Introduction

   Traditional transform codes are linear, and the overall design process is cumbersome: the individual operations are typically studied and optimized separately for different goals, and any proposed combination of coding tools must then be verified empirically in terms of average bitrate and distortion. In recent years the advantages of end-to-end architectures have become apparent, since the entire system can be optimized jointly.
   Building on end-to-end optimization, the authors develop a nonlinear transform coding framework that generalizes the traditional transform coding paradigm.

[Figure: block diagram of the nonlinear transform coding framework]
   The analysis transform y = g_a(x; \phi) maps the image vector x into the code domain (latent variables), and y is quantized to q. The synthesis transform \hat{x} = g_s(\hat{y}; \theta) then maps back to the signal domain to produce the reconstructed image; both the analysis and synthesis transforms are differentiable.
   The rate R is evaluated by the entropy H[P_q] of the discrete distribution of the quantized vector. (Entropy measures information content: the quantized discrete distribution must be entropy coded, and the higher the probability of a symbol, the shorter its code, and vice versa, so the entropy H[P_q] serves as a measure of the average code length.)
   The distortion between x and \hat{x} is conventionally measured by MSE or PSNR. This paper instead proposes a perceptual transform z = g_p(x) that maps the signal domain into a perceptual domain, providing a better approximation of subjective visual distortion than PSNR.
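As a concrete illustration, the pipeline above can be sketched in a few lines of NumPy. The learned transforms g_a and g_s are stood in for by a random orthogonal matrix and its transpose (purely illustrative choices, not the paper's parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the learned transforms: a random orthogonal
# matrix A plays g_a(.; phi), and its transpose plays g_s(.; theta).
A, _ = np.linalg.qr(rng.normal(size=(16, 16)))

def analysis(x):          # y = g_a(x; phi): signal domain -> code domain
    return A @ x

def synthesis(y_hat):     # x_hat = g_s(y_hat; theta): code domain -> signal domain
    return A.T @ y_hat

x = rng.normal(size=16)           # image vector (signal domain)
y = analysis(x)                   # code-domain representation
q = np.round(y)                   # scalar quantization -> integer indices
x_hat = synthesis(q)              # reconstructed image

mse = np.mean((x - x_hat) ** 2)   # distortion in the signal domain
print(mse)
```

Because the stand-in transform is orthogonal, the reconstruction error equals the quantization error in the code domain; real learned transforms trade this distortion against the entropy of q.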

Optimization framework

   In the transform coding framework given above, one seeks to adapt the analysis and synthesis transforms g_a and g_s so as to minimize the rate-distortion objective

L = H[P_q] + \lambda \, d(z, \hat{z}),

where the first term is the discrete entropy of the vector of quantization indices q, and the second term measures the distortion between the reference image and its reconstruction in the perceptual representation, z and \hat{z}.
   Optimizing this objective by backpropagation requires every operation in the framework to be differentiable, but quantization is not: its derivative is zero almost everywhere and undefined at the bin boundaries. The paper resolves this by replacing quantization with additive uniform noise during training, while using rounding quantization at inference time.
   The scalar quantizer rounds to the nearest integer:

\hat{y}_i = q_i = \mathrm{round}(y_i).

   The marginal distribution of each quantized coefficient is a train of Dirac impulses:

p_{\hat{y}_i}(t) = \sum_n P_{q_i}(n) \, \delta(t - n), \qquad
P_{q_i}(n) = \int_{n - \frac{1}{2}}^{n + \frac{1}{2}} p_{y_i}(t) \, dt = (p_{y_i} * \mathrm{rect})(n),

where \delta is the Dirac delta (unit integral, infinitely high at 0, zero elsewhere), P_{q_i}(n) is the probability of the n-th quantization bin (the area under p_{y_i} over that bin), * denotes convolution, and rect is the uniform density on (-\frac{1}{2}, \frac{1}{2}).
   Adding uniform noise to y_i, i.e. \tilde{y}_i = y_i + \Delta y_i with \Delta y_i \sim \mathrm{rect}, yields the density p_{\tilde{y}_i} = p_{y_i} * \mathrm{rect}, which agrees with P_{q_i} at integer positions and provides continuous intermediate values between them.
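This agreement between the noise surrogate and hard quantization can be checked numerically: for a Gaussian code coefficient, the density of y plus uniform noise, evaluated at an integer n, matches the probability of quantization bin n (a Monte Carlo sketch; the sample size and density window are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=500_000)              # samples of one code coefficient y_i

# Hard quantization: probability mass of bin n = 1.
q = np.round(y)
P_q1 = np.mean(q == 1.0)                  # P_{q_i}(1), the bin probability

# Training surrogate: additive uniform noise on (-1/2, 1/2).
y_tilde = y + rng.uniform(-0.5, 0.5, size=y.size)

# Density of y_tilde near t = 1, estimated with a narrow window.
h = 0.01
p_tilde_at_1 = np.mean(np.abs(y_tilde - 1.0) < h) / (2 * h)

# p_{y_tilde}(n) should match P_{q_i}(n) at integer positions n.
print(P_q1, p_tilde_at_1)
```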

   The differential entropy h[p_{\tilde{y}}] of the noisy latent is therefore used as a differentiable proxy for the discrete entropy H[P_q]. Optimizing this term also requires an estimate of the density p_{\tilde{y}_i}. The estimate need not be arbitrarily accurate, because p_{\tilde{y}_i} is band-limited by the convolution with rect; here a nonparametric, piecewise-linear function (first-order spline interpolation) is used as the entropy model. The overall optimization objective is

L(\theta, \phi) = \mathbb{E}\Big[ -\sum_i \log_2 p_{\tilde{y}_i}(\tilde{y}_i) \Big] + \lambda \, \mathbb{E}\big[ d(z, \tilde{z}) \big].
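A minimal sketch of such a piecewise-linear entropy model, with knot values fitted here by a plain histogram (the paper fits them jointly during training; the knot placement and range are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
y_tilde = rng.normal(size=n) + rng.uniform(-0.5, 0.5, size=n)

# Nonparametric entropy model: density values at unit-spaced knots,
# with first-order (linear) spline interpolation in between.
knots = np.arange(-6.0, 6.5, 1.0)
hist, _ = np.histogram(y_tilde, bins=np.arange(-6.5, 7.0, 1.0), density=True)
p_knots = hist                          # density estimate at each knot

def p_model(t):
    """Piecewise-linear density model for p_{y_tilde}(t)."""
    return np.interp(t, knots, p_knots)

# Differentiable rate term: differential entropy estimate, bits per sample.
rate = -np.mean(np.log2(np.maximum(p_model(y_tilde), 1e-12)))
print(rate)
```

For this source the true differential entropy is about 2.1 bits, so the piecewise-linear model lands close despite its coarse, unit-spaced knots, consistent with the band-limiting argument above.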

Choice of parametric transforms

   In traditional compression, the analysis and synthesis transforms are linear and exact inverses of each other. In general this need not hold strictly, as long as the rate-distortion objective can be minimized.
   The analysis and synthesis transforms use generalized divisive normalization (GDN) and its approximate inverse (IGDN), respectively; for the perceptual transform, the normalized Laplacian pyramid (NLP) is used.
   A. GDN is described as follows:

v = Hx, \qquad y_i = \frac{v_i}{\big( \beta_i + \sum_j \gamma_{ij} |v_j|^{\alpha_{ij}} \big)^{\varepsilon_i}},

   where H is a linear transform (this work appears to use a fully connected layer; later work uses convolutional layers as the linear stage). The analysis transform parameters are \phi = \{ H, \alpha, \beta, \gamma, \varepsilon \}.
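A sketch of the GDN analysis transform in NumPy, specialized to the commonly used case \alpha_{ij} = 2, \varepsilon_i = 1/2 (an assumption here; the paper trains the general exponents), with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

# Analysis parameters phi = {H, beta, gamma}; alpha_ij = 2 and
# epsilon_i = 1/2 are assumed, as in most later work.
H = rng.normal(size=(N, N)) / np.sqrt(N)   # linear transform
beta = np.ones(N)                          # bias, keeps the denominator positive
gamma = np.full((N, N), 0.1)               # divisive normalization weights

def gdn(x):
    v = H @ x                              # linear stage
    denom = np.sqrt(beta + gamma @ v**2)   # divisive normalization pool
    return v / denom

y = gdn(rng.normal(size=N))
print(y)
```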
   B. IGDN is obtained from one round of fixed-point iteration of the inverse, and is described as follows:

w_i = \hat{y}_i \big( \hat{\beta}_i + \sum_j \hat{\gamma}_{ij} |\hat{y}_j|^{\hat{\alpha}_{ij}} \big)^{\hat{\varepsilon}_i}, \qquad \hat{x} = H' w,

   with synthesis transform parameters \theta = \{ H', \hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\varepsilon} \}.
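The fixed-point construction can be sanity-checked numerically: with shared (illustrative) parameters and the same \alpha = 2, \varepsilon = 1/2 assumption as above, one iteration of the inverse recovers the input of the normalization stage up to a small residual:

```python
import numpy as np

N = 8
beta = np.ones(N)
gamma = np.full((N, N), 0.05)       # illustrative shared parameters

def gdn_norm(v):                    # normalization stage of GDN
    return v / np.sqrt(beta + gamma @ v**2)

def igdn_norm(y):                   # one fixed-point iteration of the inverse
    return y * np.sqrt(beta + gamma @ y**2)

rng = np.random.default_rng(0)
v = rng.normal(size=N)
v_rec = igdn_norm(gdn_norm(v))      # approximate reconstruction of v

err = np.max(np.abs(v_rec - v))
print(err)
```

In the trained codec the IGDN parameters are learned independently rather than copied, so the synthesis transform need not be an exact inverse, as noted above.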

(Later work generally uses GDN as the nonlinear component in end-to-end image compression, with the linear transform H replaced by a convolutional layer.)

   C. NLP is described as follows (little used in subsequent work):
   The image is decomposed with a Laplacian pyramid, which subtracts local estimates of mean luminance at multiple scales. Each pyramid coefficient is then divided by a local amplitude estimate (a constant plus a weighted sum of the absolute values of its neighbors). Perceptual quality is assessed by the norm of the difference between the reference and the reconstruction in this perceptual domain. The parameters (the constants and the amplitude weights) were optimized to best fit the perceptual data in the TID2008 database, which includes images corrupted by artifacts of block-transform compression. This simple distortion measure provides a near-linear fit to the human perceptual judgments in the database, outperforming the widely used SSIM and MS-SSIM quality metrics.
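A single-scale sketch of an NLP-style distortion measure, using a box filter for the local averages and an arbitrary constant c (the paper uses a multi-scale pyramid with filters and constants fitted to perceptual data, so everything here is illustrative):

```python
import numpy as np

def box_blur(img, k=3):
    """Local average via a k x k box filter with reflect padding."""
    pad = k // 2
    p = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def nlp_like(img, c=0.17):
    """One scale of an NLP-style representation: subtract the local mean
    (a Laplacian band), then divide by a local amplitude estimate."""
    band = img - box_blur(img)              # local-luminance-subtracted band
    amp = c + box_blur(np.abs(band))        # constant + local amplitude
    return band / amp

def nlp_distance(a, b):
    z_a, z_b = nlp_like(a), nlp_like(b)
    return np.sqrt(np.mean((z_a - z_b) ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(size=(32, 32))
x_hat = x + 0.05 * rng.normal(size=(32, 32))
print(nlp_distance(x, x_hat), nlp_distance(x, x))
```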

Experimental results

    The comparison considers a 16×16 DCT, a learned linear transform (θ and φ each containing 256×256 filter coefficients), and a 16×16 GDN transform, each optimized for MSE and for the NLP-domain norm, and evaluated with both PSNR and the NLP metric. In the figure below, each point on a curve for an end-to-end model corresponds to a separate model; the different models are obtained by varying λ. The experimental results are as follows:
[Figure: rate-distortion curves comparing the DCT, linear, and GDN transforms under the PSNR and NLP metrics]
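The way such curves are traced can be mimicked with a toy scalar quantizer: each quantization step size plays the role of one trained model (one λ), yielding one (rate, distortion) point (the step sizes and Gaussian source are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100_000)                # toy code coefficients

points = []
for step in [0.25, 0.5, 1.0, 2.0]:          # stand-ins for models trained at different lambdas
    q = np.round(y / step) * step           # scalar quantization with this step size
    vals, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    rate = -np.sum(p * np.log2(p))          # discrete entropy, bits per sample
    dist = np.mean((y - q) ** 2)            # MSE distortion
    points.append((rate, dist))

for r, d in points:
    print(f"rate={r:.3f} bits  mse={d:.5f}")
```

Finer quantization yields higher rate and lower distortion, tracing out the rate-distortion trade-off that λ controls in the learned models.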
Reference: hahalidaxin


Origin blog.csdn.net/officewords/article/details/130304370