AI digital human: Real-ESRGAN, an image super-resolution model that makes pictures clear

1 Introduction to Real-ESRGAN

1.1 What is Real-ESRGAN?

The full name of Real-ESRGAN is Real Enhanced Super-Resolution Generative Adversarial Network. It is a blind image super-resolution model released by Tencent ARC Lab, whose goal is to develop a practical image/video restoration algorithm. Real-ESRGAN is based on ESRGAN and is trained purely on synthetic data. Think of it as a tool for repairing and enlarging images and videos.

GitHub address: Real-ESRGAN
Paper address: Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

Real-ESRGAN currently provides five models: realesrgan-x4plus (the default), realesrnet-x4plus, realesrgan-x4plus-anime (optimized for anime/illustration images, with a smaller model size), realesr-animevideov3 (for anime videos), and realesrgan-x4plus-anime-6B. You can choose the model that suits the image or video you want to process.

1.2 Principle of Real-ESRGAN

(1) Generator:

The same generator (SR network) as ESRGAN [50] is adopted, namely a deep network with multiple residual-in-residual dense blocks (RRDB), as shown in Fig. 4. We also extend the original ×4 ESRGAN architecture to perform super-resolution with ×2 and ×1 scale factors. Since ESRGAN is a heavy network, we first use pixel unshuffle (the inverse operation of pixel shuffle [42]) to reduce the spatial size and enlarge the channel dimension before feeding the input into the main ESRGAN architecture. Most calculations are therefore performed in a smaller-resolution space, which reduces the consumption of GPU memory and computing resources.

In short: Real-ESRGAN adopts the same generator network as ESRGAN; for scale factors of ×2 and ×1, it first applies a pixel-unshuffle operation to reduce the spatial size and rearrange the spatial information into the channel dimension.
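As a concrete illustration, here is a minimal PyTorch sketch of what pixel unshuffle does to tensor shapes; the factors 2 (for ×2) and 4 (for ×1) follow, to the best of our knowledge, how the official RRDBNet implementation handles these scales:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)          # an RGB input patch

# For a x2 model, pixel unshuffle with factor 2 moves spatial information
# into channels: (1, 3, 64, 64) -> (1, 12, 32, 32).
x2 = F.pixel_unshuffle(x, downscale_factor=2)
print(x2.shape)  # torch.Size([1, 12, 32, 32])

# For a x1 model, factor 4 is used: (1, 3, 64, 64) -> (1, 48, 16, 16).
x1 = F.pixel_unshuffle(x, downscale_factor=4)
print(x1.shape)  # torch.Size([1, 48, 16, 16])

The main body of the network then runs at the reduced resolution, which is why memory and compute drop even though no information is discarded.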

(2) Discriminator: U-Net discriminator with spectral normalization (SN)

Since Real-ESRGAN aims to address a much larger degradation space than ESRGAN, the original discriminator design in ESRGAN is no longer suitable. Specifically, the discriminator in Real-ESRGAN requires greater discriminative power for complex training outputs: in addition to distinguishing global styles, it also needs to generate accurate gradient feedback for local textures. Inspired by [41, 45], we improve the VGG-style discriminator in ESRGAN to a U-Net design with skip connections (Fig. 6). The U-Net outputs a realness value for each pixel and can provide detailed per-pixel feedback to the generator.
At the same time, the U-Net structure and the complex degradations increase the instability of training. We employ spectral normalization regularization [37] to stabilize the training dynamics. Furthermore, we observe that spectral normalization also helps alleviate the over-sharp and annoying artifacts introduced by GAN training. With these adjustments, we are able to train Real-ESRGAN easily and achieve a good balance between local detail enhancement and artifact suppression.
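To make the design concrete, below is a heavily simplified sketch of an SN-regularized U-Net discriminator. The depth and channel widths here are illustrative assumptions, not the actual architecture shipped in basicsr; the point is the skip connections, the per-pixel output map, and spectral normalization wrapped around the convolutions:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

class TinyUNetDiscriminatorSN(nn.Module):
    def __init__(self, num_in_ch=3, num_feat=32):
        super().__init__()
        sn = spectral_norm  # wrap convolutions with spectral normalization
        self.conv0 = nn.Conv2d(num_in_ch, num_feat, 3, 1, 1)
        self.down1 = sn(nn.Conv2d(num_feat, num_feat * 2, 4, 2, 1))
        self.down2 = sn(nn.Conv2d(num_feat * 2, num_feat * 4, 4, 2, 1))
        self.up1 = sn(nn.Conv2d(num_feat * 4, num_feat * 2, 3, 1, 1))
        self.up2 = sn(nn.Conv2d(num_feat * 2, num_feat, 3, 1, 1))
        self.out = nn.Conv2d(num_feat, 1, 3, 1, 1)  # per-pixel realness map

    def forward(self, x):
        x0 = F.leaky_relu(self.conv0(x), 0.2)
        x1 = F.leaky_relu(self.down1(x0), 0.2)
        x2 = F.leaky_relu(self.down2(x1), 0.2)
        u1 = F.interpolate(x2, scale_factor=2, mode='bilinear', align_corners=False)
        u1 = F.leaky_relu(self.up1(u1), 0.2) + x1   # skip connection
        u2 = F.interpolate(u1, scale_factor=2, mode='bilinear', align_corners=False)
        u2 = F.leaky_relu(self.up2(u2), 0.2) + x0   # skip connection
        return self.out(u2)                          # (B, 1, H, W)

d = TinyUNetDiscriminatorSN()
print(d(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])

The output is a full-resolution map rather than a single scalar, which is what gives the generator per-pixel feedback on local textures.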
 

(3) The training process

Training is divided into two stages. First, we train a PSNR-oriented model with the L1 loss; the resulting model is named Real-ESRNet. Then, we use the trained PSNR-oriented model to initialize the generator and train it with a combination of L1 loss, perceptual loss [20], and GAN loss [14, 26, 4] to obtain Real-ESRGAN.
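A hedged sketch of the two objectives is given below. The loss types match what the released configs appear to use (L1, VGG-feature perceptual loss, vanilla GAN loss), but the weights w_pix/w_percep/w_gan are illustrative assumptions and should be checked against the official training configs:

import torch
import torch.nn as nn

l1_loss = nn.L1Loss()
gan_loss = nn.BCEWithLogitsLoss()  # vanilla GAN loss on per-pixel logits

def stage1_loss(sr, hr):
    # Stage 1 (Real-ESRNet): PSNR-oriented training with L1 loss only.
    return l1_loss(sr, hr)

def stage2_generator_loss(sr, hr, feat_sr, feat_hr, disc_logits_sr,
                          w_pix=1.0, w_percep=1.0, w_gan=0.1):
    # Stage 2 (Real-ESRGAN): L1 + perceptual + GAN loss.
    loss_pix = l1_loss(sr, hr)
    # Perceptual loss: distance between VGG features of the SR and HR
    # images (the feature extraction itself is omitted in this sketch).
    loss_percep = l1_loss(feat_sr, feat_hr)
    # The U-Net discriminator emits a per-pixel realness map; the
    # generator is rewarded when that map looks "real" (all ones).
    loss_gan = gan_loss(disc_logits_sr, torch.ones_like(disc_logits_sr))
    return w_pix * loss_pix + w_percep * loss_percep + w_gan * loss_gan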

Ablation experiments
Second-order degradation model. We conduct the degradation ablation on Real-ESRNet, because Real-ESRNet is more controllable and better reflects the impact of degradation. We replace the second-order process in Real-ESRNet with a classical first-order degradation model to generate training pairs. As shown in Fig. 8 (top), the model trained with the classical first-order degradation model cannot effectively remove the noise on the wall or the blur in the wheat field, while Real-ESRNet can handle both cases.
 

  • Top: Real-ESRNet results with and without the second-order degradation process.
  • Bottom: Real-ESRNet results with and without sinc filters. Zoom in for the best view.

Sinc filters. If sinc filters are not used during training, the restored results amplify the ringing and overshoot artifacts present in the input image, as shown in Fig. 8 (bottom), especially around text and lines. In contrast, the model trained with sinc filters can remove these artifacts.
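The following simplified sketch illustrates the idea of a second-order degradation pipeline (blur, resize, noise, JPEG compression, applied twice). It is a fixed, minimal stand-in: the real pipeline randomizes kernel types and sizes, noise levels, resize scales, and JPEG quality, and also applies the sinc filters discussed above, all of which are omitted here:

import cv2
import numpy as np

def degrade_once(img, blur_sigma=1.2, scale=0.5, noise_std=5.0, jpeg_q=50):
    """One degradation block: blur -> resize -> Gaussian noise -> JPEG."""
    out = cv2.GaussianBlur(img, (7, 7), blur_sigma)
    h, w = out.shape[:2]
    out = cv2.resize(out, (int(w * scale), int(h * scale)),
                     interpolation=cv2.INTER_LINEAR)
    out = np.clip(out + np.random.normal(0, noise_std, out.shape),
                  0, 255).astype(np.uint8)
    ok, buf = cv2.imencode('.jpg', out, [cv2.IMWRITE_JPEG_QUALITY, jpeg_q])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

hr = cv2.imread('inputs/0014.jpg')       # high-resolution source image
lr = degrade_once(degrade_once(hr))      # second-order: apply the block twice

Applying the block twice is what the paper calls the high-order process: real-world images have typically been degraded multiple times (capture, editing, transmission), and a single pass cannot model that.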

(4) SN-regularized U-Net discriminator

We first use the VGG-style discriminator and its loss weight, following the ESRGAN setting. As can be seen from Fig. 9, this model cannot restore detailed textures (bricks and bushes) and even brings unpleasant artifacts to the branches of the bushes. Local details can be improved with the U-Net design; however, it introduces unnatural textures and also increases training instability. SN regularization improves the restored textures while stabilizing the training dynamics.

(5) More complex blur kernels

In this ablation, the generalized Gaussian kernels and plateau-shaped kernels are removed from the blur synthesis. As shown in Fig. 10, on some real samples the model cannot deblur and restore sharp edges as well as Real-ESRGAN. However, the differences are limited on most samples, indicating that the widely used Gaussian kernels with a high-order degradation process can already cover a large space of real-world blur. Since we can still observe slightly better performance, the more complex blur kernels are adopted in Real-ESRGAN.
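For reference, the isotropic forms of these two kernel families can be sketched as follows. The formulas follow the paper's definitions (exp(-(r²/2σ²)^β) for the generalized Gaussian, 1/(1+(r²/2σ²)^β) for the plateau-shaped kernel; β=1 recovers a plain Gaussian), while the sizes and parameters below are arbitrary examples:

import numpy as np

def blur_kernel(size=21, sigma=3.0, beta=1.0, plateau=False):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    t = (xx**2 + yy**2) / (2 * sigma**2)
    k = 1.0 / (1.0 + t**beta) if plateau else np.exp(-t**beta)
    return k / k.sum()  # normalize so the kernel sums to 1

gauss = blur_kernel(beta=1.0)                    # plain Gaussian
gen_gauss = blur_kernel(beta=0.5)                # heavier-tailed generalized Gaussian
plateau_k = blur_kernel(beta=2.0, plateau=True)  # flat-topped plateau kernel

Note that the paper also uses anisotropic versions with random rotation; only the isotropic case is shown here.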

1.3 Innovation points

  • A new way of constructing training datasets is proposed: a high-order degradation process is used to increase the complexity of the synthesized degraded images.
  • A sinc filter is introduced when constructing the dataset, which addresses the ringing and overshoot artifacts in images.
  • The VGG-style discriminator in the original ESRGAN is replaced with a U-Net discriminator, strengthening adversarial learning on local image details.
  • Spectral normalization is introduced to stabilize the training instability caused by the complex dataset and the U-Net discriminator.

2 Real-ESRGAN deployment and operation

2.1 Conda installation

For details on installing and using Anaconda, see: Anaconda environment construction

2.2 Construction of the operating environment

# Get the source code
git clone https://github.com/xinntao/Real-ESRGAN.git
cd Real-ESRGAN

# Create and activate a dedicated conda environment
conda create -n realesrgan python=3.9
conda activate realesrgan

# Install the pinned core dependencies
pip install basicsr==1.4.2
pip install facexlib==0.3.0
pip install gfpgan==1.3.8

pip install -r requirements.txt

# Install Real-ESRGAN itself in development mode
python setup.py develop
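As an optional sanity check (a minimal sketch, assuming the environment above was activated), the key imports can be verified from Python:

import torch
from basicsr.archs.rrdbnet_arch import RRDBNet   # provided by basicsr==1.4.2
from realesrgan import RealESRGANer              # installed by setup.py develop

print('torch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())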

2.3 Model download

General model address: RealESRGAN_x4plus.pth

Anime model address: RealESRGAN_x4plus_anime_6B.pth

After the download is complete, move the files into the weights directory; the result can be checked with the following command:

[root@localhost Real-ESRGAN]# ll weights/
total 82996
-rw-r--r-- 1 root root       54 Jul  6 19:46 README.md
-rw-r--r-- 1 root root 17938799 Jul 19 11:15 RealESRGAN_x4plus_anime_6B.pth
-rw-r--r-- 1 root root 67040989 Jul  7 15:26 RealESRGAN_x4plus.pth

Create the gfpgan model storage directory:

mkdir -p gfpgan/weights

Download the gfpgan model files and store them in the gfpgan/weights directory created above:

    detection_Resnet50_Final.pth

    parsing_parsenet.pth

    GFPGANv1.3.pth

After completion, the directory contents are as follows:

[root@localhost Real-ESRGAN]# ll gfpgan/weights/
total 530728
-rw-r--r-- 1 root root 109497761 Jul  7 15:33 detection_Resnet50_Final.pth
-rw-r--r-- 1 root root 348632874 Jul  7 15:33 GFPGANv1.3.pth
-rw-r--r-- 1 root root  85331193 Jul  7 15:33 parsing_parsenet.pth

2.4 Modify the code (to avoid downloading the model over the network, which is very slow and often fails)

vi Real-ESRGAN/inference_realesrgan.py 
if args.face_enhance:  # Use GFPGAN for face enhancement
    from gfpgan import GFPGANer
    face_enhancer = GFPGANer(
        model_path='https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth',
        upscale=args.outscale,
        arch='clean',
        channel_multiplier=2,
        bg_upsampler=upsampler)
 
Change it to:
 
if args.face_enhance:  # Use GFPGAN for face enhancement
    from gfpgan import GFPGANer
    face_enhancer = GFPGANer(
        model_path='./gfpgan/weights/GFPGANv1.3.pth',  # local weights file
        upscale=args.outscale,
        arch='clean',
        channel_multiplier=2,
        bg_upsampler=upsampler)

3 Real-ESRGAN effect display

The processed images are stored in the results directory:

[root@localhost Real-ESRGAN]# ll results/
total 7908
-rw-r--r-- 1 root root 3029489 Jul 19 11:20 00003_out.png
-rw-r--r-- 1 root root  133649 Jul 19 11:17 0014_out.jpg
-rw-r--r-- 1 root root 4928934 Jul 19 11:22 children-alpha_out.png

3.1 General Image Enhancement

python inference_realesrgan.py -n RealESRGAN_x4plus -i inputs/0014.jpg --face_enhance
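Equivalently, the model can be called from Python. The sketch below mirrors the usage example in the Real-ESRGAN README (face enhancement via GFPGAN is omitted for brevity, and the output filename is illustrative):

import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# The x4plus generator: RRDB network with 23 blocks
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(
    scale=4,
    model_path='weights/RealESRGAN_x4plus.pth',
    model=model,
    tile=0,       # set >0 to process large images tile by tile
    half=False)   # half=True uses fp16 on GPU for speed

img = cv2.imread('inputs/0014.jpg', cv2.IMREAD_COLOR)
output, _ = upsampler.enhance(img, outscale=4)
cv2.imwrite('results/0014_py_out.jpg', output)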

Original Image:

Enhanced image:

3.2 Animation Image Enhancement

python inference_realesrgan.py -n RealESRGAN_x4plus_anime_6B -i inputs/0014.jpg

Original Image:

Enhanced effect:

4 Summary

Real-ESRGAN is a deep-learning-based image super-resolution method that achieves high-quality image reconstruction through a generative adversarial network. It performs well at preserving details and enhancing image fidelity, and can be widely used in image processing and enhancement. When building an AI digital human, Real-ESRGAN is mainly used to enhance the frames produced by voice-driven facial animation, and this enhancement is the basis for producing high-definition digital-human video.
