1 Introduction to Real-ESRGAN
1.1 What is Real-ESRGAN?
Real-ESRGAN (Real-World Enhanced Super-Resolution Generative Adversarial Network) is a blind image super-resolution model released by Tencent ARC Lab. Its goal is a practical algorithm for general image/video restoration. Real-ESRGAN builds on ESRGAN and is trained purely on synthetic data. Think of it as an image/video restoration and upscaling tool.
GitHub: Real-ESRGAN
Paper: Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Real-ESRGAN currently provides five models, namely realesrgan-x4plus (default), realesrnet-x4plus, realesrgan-x4plus-anime (optimized for anime/illustration images, with a smaller model size), realesr-animevideov3 (for anime videos) and realesrgan-x4plus-anime-6B. Choose the model that matches the image or video you want to process.
1.2 Principle of Real-ESRGAN
(1) Generator:
The same generator (SR network) as ESRGAN [50] is adopted, namely a deep network with multiple residual-in-residual dense blocks (RRDB), as shown in Fig. 4. We also extend the original ×4 ESRGAN architecture to perform super-resolution with ×2 and ×1 scaling factors. Since ESRGAN is a heavy network, we first use pixel unshuffle (the inverse of pixel shuffle [42]) to reduce the spatial size and enlarge the channel size before feeding the input into the main ESRGAN architecture. Most of the computation is therefore performed in a smaller-resolution space, which reduces GPU memory and compute consumption.
Real-ESRGAN adopts the same generator network as ESRGAN. For scaling factors of ×2 and ×1, it first applies a pixel-unshuffle operation to reduce the spatial size and rearrange the information into the channel dimension.
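The pixel-unshuffle rearrangement described above can be sketched in a few lines. This is a minimal NumPy illustration of the operation only (the real model uses batched PyTorch tensors and `torch.nn.PixelUnshuffle`):

```python
import numpy as np

def pixel_unshuffle(x, scale):
    """Inverse of pixel shuffle: move scale x scale spatial blocks into channels.
    (C, H, W) -> (C * scale^2, H // scale, W // scale)."""
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0
    x = x.reshape(c, h // scale, scale, w // scale, scale)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, s, s, H/s, W/s)
    return x.reshape(c * scale * scale, h // scale, w // scale)

img = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
out = pixel_unshuffle(img, 2)
print(out.shape)  # (12, 4, 4)
```

No information is lost: the spatial resolution shrinks by `scale` in each dimension while the channel count grows by `scale²`, so the main RRDB trunk can run at the reduced resolution.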
(2) Discriminator: U-Net discriminator with spectral normalization (SN)
Since Real-ESRGAN aims to handle a much larger degradation space than ESRGAN, the original discriminator design in ESRGAN is no longer suitable. Specifically, the discriminator in Real-ESRGAN requires greater discriminative power for complex training outputs. Besides distinguishing global styles, it also needs to produce accurate gradient feedback for local textures. Inspired by [41, 45], we improve the VGG-style discriminator in ESRGAN to a U-Net design with skip connections (Fig. 6). The U-Net outputs a realness value for each pixel and can provide detailed per-pixel feedback to the generator.
At the same time, the U-Net structure and complex degradations also increase training instability. We employ spectral normalization regularization [37] to stabilize the training dynamics. Moreover, we observe that spectral normalization also helps alleviate the oversharp and annoying artifacts introduced by GAN training. With these adjustments, we are able to train Real-ESRGAN easily and achieve a good balance between local detail enhancement and artifact suppression.
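Spectral normalization constrains each layer by dividing its weight by its largest singular value, estimated with power iteration. The NumPy sketch below shows just that estimate (in practice one would wrap the discriminator's layers with `torch.nn.utils.spectral_norm` rather than implement this by hand):

```python
import numpy as np

def largest_singular_value(W, n_iter=100):
    """Power-iteration estimate of sigma_max(W), the quantity that
    spectral normalization divides the weight matrix by."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    v = None
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)

W = np.random.default_rng(1).normal(size=(16, 32))
sigma = largest_singular_value(W)
W_sn = W / sigma  # spectrally normalized weight: sigma_max(W_sn) ~= 1
```

Capping the spectral norm bounds the Lipschitz constant of the discriminator, which is what damps the training oscillations and oversharp artifacts mentioned above.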
(3) The training process
is divided into two stages. First, we train a PSNR-oriented model with L1 loss; the resulting model is named Real-ESRNet. Then, we use the trained PSNR-oriented model to initialize the generator and train Real-ESRGAN with a combination of L1 loss, perceptual loss [20] and GAN loss [14, 26, 4].
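The second-stage generator objective is a weighted sum of the three losses above. The sketch below shows the structure of that sum with NumPy stand-ins; the weights and the feature extractor are illustrative placeholders, not the paper's exact settings:

```python
import numpy as np

def generator_loss(sr, hr, feat_sr, feat_hr, d_logits,
                   w_l1=1.0, w_percep=1.0, w_gan=0.1):
    """Weighted sum of L1, perceptual (feature-space L1) and GAN losses.
    feat_* stand in for VGG feature maps; weights are illustrative."""
    l1 = np.abs(sr - hr).mean()
    percep = np.abs(feat_sr - feat_hr).mean()
    # non-saturating GAN loss: -log sigmoid(D(sr)), written stably
    gan = np.mean(np.logaddexp(0.0, -d_logits))
    return w_l1 * l1 + w_percep * percep + w_gan * gan

rng = np.random.default_rng(0)
sr, hr = rng.random((3, 16, 16)), rng.random((3, 16, 16))
feat_sr, feat_hr = rng.random((64, 4, 4)), rng.random((64, 4, 4))
loss = generator_loss(sr, hr, feat_sr, feat_hr,
                      d_logits=rng.normal(size=(1, 16, 16)))
```

With a per-pixel U-Net discriminator, `d_logits` has one realness logit per output pixel, so the GAN term averages detailed local feedback rather than a single image-level score.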
Ablation experiments
Second-order degradation model. We conduct the ablation study of degradation on Real-ESRNet because Real-ESRNet is more controllable and better reflects the impact of degradation. We replace the second-order process in Real-ESRNet with a classical degradation model to generate training pairs. As shown in Fig. 8 (Top), the model trained with the classic first-order degradation model cannot effectively remove the noise on the wall or the blur in the wheat field, while Real-ESRNet can handle these cases.
- Top: Real-ESRNet results w/ and w/o the second-order degradation process.
- Bottom: Real-ESRNet results w/ and w/o the sinc filter. Zoom in for the best view.
Sinc filters. If sinc filters are not used during training, the restored results amplify the ringing and overshoot artifacts present in the input image, as shown in Fig. 8 (bottom), especially around text and lines. In contrast, models trained with sinc filters can remove these artifacts.
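A sinc filter is an ideal low-pass filter whose truncation produces exactly the ringing/overshoot pattern the model must learn to remove. The snippet below builds a simplified separable windowed-sinc kernel; note the paper actually uses a circular 2-D sinc filter with a randomly sampled cutoff frequency, so this is only a stand-in for intuition:

```python
import numpy as np

def windowed_sinc_kernel(size=21, cutoff=0.25):
    """Separable windowed-sinc low-pass kernel (simplified stand-in for
    the circular 2-D sinc filter used in the paper's degradation model)."""
    n = np.arange(size) - size // 2
    h = np.sinc(2 * cutoff * n) * np.hamming(size)  # Hamming window tames truncation
    h /= h.sum()
    k = np.outer(h, h)
    return k / k.sum()  # normalize so brightness is preserved

k = windowed_sinc_kernel()
```

Convolving training inputs with such kernels injects ringing-like artifacts into the synthetic LR images, so the network sees (and learns to undo) them.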
(4) SN regularized U-Net discriminator
We first use the VGG-style discriminator and its loss weights following the ESRGAN setting. But as can be seen from Fig. 9, that model cannot restore detailed textures (bricks and bushes) and even brings unpleasant artifacts on the branches of the bushes. The U-Net design improves local details; however, it introduces unnatural textures and also increases training instability. SN regularization improves the restored textures while stabilizing the training dynamics.
(5) More complex blur kernel
We remove the generalized Gaussian and plateau-shaped kernels from the blur synthesis. As shown in Fig. 10, on some real samples the resulting model cannot deblur and restore sharp edges as well as Real-ESRGAN. On most samples, however, the differences are limited, indicating that widely used Gaussian kernels combined with the high-order degradation process can already cover a large real blur space. Since we can still observe slightly better performance, we adopt the more complex blur kernels in Real-ESRGAN.
1.3 Innovation points
- A new approach to dataset construction is proposed: a high-order degradation process that better models the complex degradations of real low-resolution images.
- A sinc filter is introduced when constructing the dataset, which mitigates ringing and overshoot artifacts in the results.
- The VGG-style discriminator of the original ESRGAN is replaced with a U-Net discriminator, strengthening adversarial learning of local image details.
- Spectral normalization is introduced to stabilize the training instability caused by the complex dataset and the U-Net discriminator.
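The first innovation above, the high-order degradation process, simply means running the classical degradation pipeline more than once. A minimal NumPy sketch of the idea (the real pipeline randomly samples blur kernels, resize modes, noise types and JPEG quality; the concrete operations below are simplified stand-ins):

```python
import numpy as np

def degrade_once(img, rng):
    """One classical degradation round: blur -> downsample -> noise.
    (JPEG compression, the fourth step, is omitted for brevity.)"""
    # 3x3 box blur as a stand-in for the randomly sampled blur kernel
    h, w = img.shape
    pad = np.pad(img, 1, mode='edge')
    blurred = sum(pad[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    small = blurred[::2, ::2]  # stand-in for a random 2x resize
    noisy = small + rng.normal(0.0, 0.01, small.shape)  # additive Gaussian noise
    return np.clip(noisy, 0.0, 1.0)

def degrade_second_order(img, rng):
    """Real-ESRGAN's high-order idea: run the classical pipeline twice."""
    return degrade_once(degrade_once(img, rng), rng)

rng = np.random.default_rng(0)
hr = rng.random((64, 64))
lr = degrade_second_order(hr, rng)
```

Applying the pipeline twice compounds blur, resampling and noise in a way a single pass cannot reproduce, which is why the second-order model generalizes better to real-world inputs.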
2 Real-ESRGAN deployment and operation
2.1 Conda installation
For details on installing and using Anaconda, see: Anaconda environment setup
2.2 Construction of the operating environment
git clone https://github.com/xinntao/Real-ESRGAN.git
cd Real-ESRGAN
conda create -n realesrgan python=3.9
conda activate realesrgan
pip install basicsr==1.4.2
pip install facexlib==0.3.0
pip install gfpgan==1.3.8
pip install -r requirements.txt
python setup.py develop
2.3 Model download
General model address: RealESRGAN_x4plus.pth
Anime model address: RealESRGAN_x4plus_anime_6B.pth
After the download is complete, move the files into the weights directory and verify with:
[root@localhost Real-ESRGAN]# ll weights/
total 82996
-rw-r--r-- 1 root root 54 Jul 6 19:46 README.md
-rw-r--r-- 1 root root 17938799 Jul 19 11:15 RealESRGAN_x4plus_anime_6B.pth
-rw-r--r-- 1 root root 67040989 Jul 7 15:26 RealESRGAN_x4plus.pth
Create the gfpgan model storage directory:
mkdir -p gfpgan/weights
Download the gfpgan model file and store it in the directory gfpgan/weights created above:
When done, verify with:
[root@localhost Real-ESRGAN]# ll gfpgan/weights/
total 530728
-rw-r--r-- 1 root root 109497761 Jul 7 15:33 detection_Resnet50_Final.pth
-rw-r--r-- 1 root root 348632874 Jul 7 15:33 GFPGANv1.3.pth
-rw-r--r-- 1 root root 85331193 Jul 7 15:33 parsing_parsenet.pth
2.4 Modify the code (to avoid downloading the model over the network, which is slow and often fails)
vi Real-ESRGAN/inference_realesrgan.py
if args.face_enhance:  # Use GFPGAN for face enhancement
    from gfpgan import GFPGANer
    face_enhancer = GFPGANer(
        model_path='https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth',
        upscale=args.outscale,
        arch='clean',
        channel_multiplier=2,
        bg_upsampler=upsampler)

Change to:

if args.face_enhance:  # Use GFPGAN for face enhancement
    from gfpgan import GFPGANer
    face_enhancer = GFPGANer(
        model_path='./gfpgan/weights/GFPGANv1.3.pth',
        upscale=args.outscale,
        arch='clean',
        channel_multiplier=2,
        bg_upsampler=upsampler)
3 Real-ESRGAN effect display
The processed images are stored in the results directory
[root@localhost Real-ESRGAN]# ll results/
total 7908
-rw-r--r-- 1 root root 3029489 Jul 19 11:20 00003_out.png
-rw-r--r-- 1 root root 133649 Jul 19 11:17 0014_out.jpg
-rw-r--r-- 1 root root 4928934 Jul 19 11:22 children-alpha_out.png
3.1 General Image Enhancement
python inference_realesrgan.py -n RealESRGAN_x4plus -i inputs/0014.jpg --face_enhance
Original Image:
Enhanced image:
3.2 Animation Image Enhancement
python inference_realesrgan.py -n RealESRGAN_x4plus_anime_6B -i inputs/0014.jpg
Original Image:
Enhanced image:
4 Summary
Real-ESRGAN is a deep-learning-based image super-resolution method that achieves high-quality reconstruction through a generative adversarial network. It performs well at preserving detail and enhancing image fidelity, and is widely applicable in image processing and enhancement. When building an AI digital human, Real-ESRGAN is mainly used to enhance the face frames produced by the voice-driven stage, and this enhancement is the basis for producing high-definition digital human video.