Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data (Paper reading)

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

Xintao Wang, Applied Research Center (ARC), Tencent PCG, ICCV 2021, Cited: 269, Code, Paper

1. Introduction

Although many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images. In this work, we extend the powerful ESRGAN to a practical restoration application (namely, Real-ESRGAN), which is trained with pure synthetic data. Specifically, a high-order degradation modeling process is introduced to better simulate complex real-world degradations. We also account for the common ringing and overshoot artifacts in the synthesis process. In addition, we employ a U-Net discriminator with spectral normalization to increase discriminator capability and stabilize training dynamics. Extensive comparisons show that Real-ESRGAN achieves better visual performance than prior works on various real datasets.

2. Overall idea

In short: train with a more complex synthetic dataset. Modeling higher-order degradations should give better results than training with only the classical first-order degradation model.

3. Method

The goal is to restore general real-world LR images by synthesizing training pairs with a more realistic degradation process. Real complex degradations usually come from complex combinations of different degradation processes, such as the imaging system of cameras, image editing, and Internet transmission. For example, when we take a photo with a mobile phone, the photo may suffer from several degradations, such as camera blur, sensor noise, sharpening artifacts, and JPEG compression. We then do some editing and upload it to a social media app, which introduces further compression and unpredictable noise. The above process becomes even more complicated when the image is shared several times on the Internet. This motivates extending the classical "first-order" degradation model to "higher-order" degradation modeling of real-world degradations, that is, the degradation is modeled with several repeated degradation processes, each of which is a classical degradation model. Empirically, a second-order degradation process is adopted to strike a good balance between simplicity and effectiveness. Higher-order degradation modeling is more flexible and tries to mimic the real degradation generation process. Sinc filters are further included in the synthesis process to simulate common ringing and overshoot artifacts.
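To make the higher-order idea concrete, here is a minimal sketch (my own illustration, not the paper's code; `high_order_degrade` and `degrade_fn` are hypothetical names): one round of the classical degradation pipeline is applied `order` times with independently re-sampled parameters. A sketch of one such round is given after the degradation list below.

```python
def high_order_degrade(img, degrade_fn, order=2):
    """Apply one classical degradation round `degrade_fn` repeatedly.

    `degrade_fn(img)` stands for one blur -> resize -> noise -> JPEG round;
    order=2 corresponds to the second-order setting the paper settles on.
    """
    for _ in range(order):
        img = degrade_fn(img)  # degradation parameters are re-sampled inside each round
    return img
```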

Blind SR aims to restore high-resolution images from low-resolution inputs with unknown and complex degradations. When synthesizing low-resolution inputs, the classical degradation model is often employed: the ground-truth image $y$ is first convolved with a blur kernel $k$, then downsampled with a scale factor $r$, and noise $n$ is added to obtain the low-resolution $x$. Finally, JPEG compression is also adopted, since JPEG is widely used in real-world images:

$$x = \mathcal{D}(y) = \big[(y \circledast k)\downarrow_r + n\big]_{\mathrm{JPEG}}$$

Blur: We typically model blur degradation as a convolution with a linear blur filter (kernel). Isotropic and anisotropic Gaussian filters are common choices.
Noise: We consider two commonly used noise types: 1) additive Gaussian noise and 2) Poisson noise.
Resize (Downsampling): In SR, downsampling is the basic operation for synthesizing low-resolution images. In general, we consider both downsampling and upsampling, i.e., resizing operations. There are several resizing algorithms: nearest-neighbor interpolation, area resizing, bilinear interpolation, and bicubic interpolation.
JPEG compression: JPEG compression is a commonly used lossy compression technique for digital images. It first converts the image to the YCbCr color space and downsamples the chroma channels. The image is then divided into 8 × 8 blocks, each block is subjected to a 2D discrete cosine transform (DCT), and the DCT coefficients are quantized. (A minimal sketch of one first-order round combining these operations is given below.)
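Below is a minimal sketch of one first-order round with these four operations, using OpenCV and NumPy. This is an illustration under assumed parameter ranges, not the paper's exact settings or sampling strategy.

```python
import cv2
import numpy as np


def classical_degradation(img, scale=4):
    """One round of the classical degradation model:
    blur -> resize -> noise -> JPEG compression.

    img: float32 BGR image in [0, 1], shape HxWx3.
    The parameter ranges below are illustrative assumptions.
    """
    # 1) Blur: isotropic Gaussian kernel with a random sigma.
    sigma = np.random.uniform(0.2, 3.0)
    img = cv2.GaussianBlur(img, (21, 21), sigma)

    # 2) Resize: downsample by the scale factor with a random interpolation mode.
    h, w = img.shape[:2]
    interp = int(np.random.choice([cv2.INTER_NEAREST, cv2.INTER_AREA,
                                   cv2.INTER_LINEAR, cv2.INTER_CUBIC]))
    img = cv2.resize(img, (w // scale, h // scale), interpolation=interp)

    # 3) Noise: additive Gaussian noise (the paper also uses Poisson noise).
    noise_std = np.random.uniform(1, 25) / 255.0
    img = img + np.random.normal(0, noise_std, img.shape).astype(np.float32)
    img = np.clip(img, 0, 1).astype(np.float32)

    # 4) JPEG compression with a random quality factor.
    quality = int(np.random.randint(30, 95))
    _, buf = cv2.imencode('.jpg', (img * 255).round().astype(np.uint8),
                          [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR).astype(np.float32) / 255.0
```

Plugging this into the earlier sketch, `high_order_degrade(hr, classical_degradation, order=2)` gives a toy version of the second-order degradation pipeline.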

3.1 Higher-order degradation model


Ringing artifacts often appear as spurious edges near sharp transitions in an image; they visually look like bands or "ghosts" near edges. Overshoot artifacts are usually combined with ringing artifacts and manifest as increased jumps at edge transitions. The main cause of these artifacts is that the signal is band-limited and lacks high frequencies; sharpening and compression usually introduce them. The sinc filter, which cuts off high frequencies, is adopted to synthesize ringing and overshoot artifacts for training. Sinc filters are applied in two places: during the blurring step and as the last step of the synthesis pipeline. The order of the last sinc filter and JPEG compression is randomly exchanged to cover a larger degradation space, because some images may be over-sharpened first (with overshoot artifacts) and then JPEG compressed.
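For reference, a 2D sinc (circular low-pass) kernel can be built from the first-order Bessel function J1. The sketch below follows the standard formula and is my own illustration, not necessarily the exact implementation used in the Real-ESRGAN codebase.

```python
import numpy as np
from scipy import special


def sinc_kernel(cutoff, kernel_size=21):
    """2D sinc (circular low-pass) filter with cutoff frequency `cutoff`.

    Assumes an odd kernel_size and cutoff in (0, pi]. Smaller cutoffs produce
    stronger ringing/overshoot when the kernel is convolved with an image.
    """
    assert kernel_size % 2 == 1
    c = (kernel_size - 1) / 2
    xx, yy = np.meshgrid(np.arange(kernel_size) - c, np.arange(kernel_size) - c)
    r = np.sqrt(xx ** 2 + yy ** 2)
    with np.errstate(divide='ignore', invalid='ignore'):
        kernel = cutoff * special.j1(cutoff * r) / (2 * np.pi * r)
    # Analytic limit at the center (r = 0).
    kernel[int(c), int(c)] = cutoff ** 2 / (4 * np.pi)
    return kernel / kernel.sum()
```

Convolving an image with such a kernel, e.g. `cv2.filter2D(img, -1, sinc_kernel(np.pi / 3))`, introduces the ringing and overshoot patterns described above.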


Source: blog.csdn.net/qq_43800752/article/details/130127671