ICCV2021 SDR2HDR paper notes: A New Journey from SDRTV to HDRTV

code: https://github.com/chxy95/HDRTVNet

This article contains my reading notes on the ICCV 2021 paper "A New Journey from SDRTV to HDRTV". I personally consider it an important paper in the SDR2HDR field: it defines the video SDR2HDR problem, analyzes the problem's characteristics, proposes a method, releases a dataset (HDRTV1K), and selects five evaluation metrics. Below I record the main points of the paper.

1. Introduction

  • Why is the sdr2hdr algorithm needed?

    (1) Video content is developing from standard definition and high definition to ultra high definition, and high dynamic range is an important feature of ultra high definition content

    (2) The content presented by high dynamic range video is closer to the perception of human eyes in natural scenes

    (3) Although devices supporting high dynamic range are increasingly popular, most video content is still in SDR format

  • Why is it so important, but there is so little research on it?

    The author believes there are two reasons:

    (1) HDR standards such as hdr10 and hlg have only recently been defined;

    (2) Lack of large-scale data sets for training and testing;

  • Analyzed the relationship between sdr2hdr and related topics

    (1) sdr2hdr is a highly ill-posed problem: SDR and HDR content differ in dynamic range, color gamut, and bit depth

    In actual production, SDRTV and HDRTV contents are derived from the same Raw file but are processed under different standards. Thus, they have different dynamic ranges, color gamuts, and bit depths.

    (2) To some extent, it is similar to image-to-image translation methods such as Pix2Pix [11] and CycleGAN

    (3) ldr2hdr is similar to sdr2hdr in name, but it is a different task. LDR-to-HDR methods [21, 26, 10, 24, 5] aim to predict HDR scene luminance in the linear domain, which is essentially closer to a Raw file.

    (4) Some works perform super-resolution and sdr2hdr jointly, e.g. Deep SR-ITM and JSI-GAN
    
  • Introduction to evaluation metrics

    PSNR: mapping accuracy
    SSIM, SR-SIM [36]: structural similarity
    ∆E_ITP [17]: color difference
    HDR-VDP3: visual quality
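Of these metrics, PSNR is simple enough to sketch; the others (SSIM, SR-SIM, ∆E_ITP, HDR-VDP3) need dedicated implementations. A minimal version, assuming float images normalized to [0, 1]:

```python
import numpy as np

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio between two float images in [0, peak]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)
print(round(psnr(a, b), 2))  # MSE = 0.01 -> 20.0 dB
```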
  • Contributions

    (1) Modeling and analysis of the sdr2hdr problem

    (2) We propose a three-step SDRTV-to-HDRTV solution method, which performs best in quantitative and qualitative comparisons

    (3) We propose a global color mapping network that achieves the best performance with only 35K parameters

    (4) A dataset is proposed and five performance metrics are selected for this task

2. Preliminaries

(1) sdr format definition

ITU-R. Parameter values for the HDTV standards for production and international programme exchange. Technical report, ITU-R Rec. BT.709-6, 2015.

ITU-R. Reference electro-optical transfer function for flat panel displays used in HDTV studio production. Technical report, ITU-R Rec. BT.1886, 2011.

(2) hdr format definition

wide color gamut

ITU-R. Parameter values for ultra-high definition television systems for production and international programme exchange. Technical report, ITU-R Rec. BT.2020-2, 2015.

PQ or HLG OETF

ITU-R. Image parameter values for high dynamic range television for use in production and international programme exchange. Technical report, ITU-R Rec. BT.2100-2, 2018.

Difference from ldr2hdr problem

ldr2hdr usually refers to the concept from photography and performs the mapping in the linear light domain; sdr2hdr refers to mapping SDR video pixel values to the corresponding pixel values conforming to an HDR standard.

3. Analysis

Next, the author gives a series of logical analyses: sorting out the SDR/HDR video production pipeline -> modeling the sdr2hdr task -> offering insights into the characteristics of the sdr2hdr task -> deriving a deep-learning method from these insights.

(1)SDRTV/HDRTV Formation Pipeline

[Figure: SDRTV/HDRTV formation pipeline]

Four operations are considered: tone mapping, gamut mapping, the opto-electronic transfer function (OETF), and quantization.
Many operations are still not considered here, such as denoising and white balance in the camera pipeline, and color grading in HDR content production.

  • tone mapping

Tone mapping transforms high dynamic range signals into low dynamic range signals in order to adapt to display devices with different display capabilities.

Tone mapping can be divided into global tone mapping and local tone mapping. Global tone mapping applies the same function to all pixels; the function's parameters are usually related to global statistics (such as average brightness). Local tone mapping can adapt dynamically to local content, but it is usually computationally intensive and prone to introducing artifacts.

It is noteworthy that S-shaped curves are commonly used for global tone mapping, and clipping operations often exist in the actual tone-mapping process.
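As an illustration (not the paper's curve), a logistic function gives the characteristic S-shape, and a final clip models the highlight loss; the steepness parameter `a` here is an arbitrary choice:

```python
import numpy as np

def global_tone_map(x, a=4.0):
    """Illustrative S-shaped global tone curve with clipping.
    Every pixel passes through the same function (a global operation)."""
    y = 1.0 / (1.0 + np.exp(-a * (x - 0.5)))  # logistic S-curve
    # Rescale so the curve maps [0, 1] onto [0, 1] exactly.
    lo = 1.0 / (1.0 + np.exp(a * 0.5))
    hi = 1.0 / (1.0 + np.exp(-a * 0.5))
    y = (y - lo) / (hi - lo)
    return np.clip(y, 0.0, 1.0)  # clipping discards out-of-range detail

x = np.array([0.0, 0.5, 1.0])
print(global_tone_map(x))  # endpoints map to 0 and 1, midpoint to 0.5
```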

  • gamut mapping

Gamut mapping converts colors from the source gamut to the target gamut while preserving the overall look of the scene.
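For linear RGB, the BT.709 to BT.2020 conversion is a fixed 3x3 matrix (the coefficients below are from ITU-R BT.2087); note how the white point is preserved because each row sums to 1:

```python
import numpy as np

# BT.709 -> BT.2020 conversion matrix for linear RGB (ITU-R BT.2087)
M_709_TO_2020 = np.array([
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
])

def gamut_709_to_2020(rgb_linear):
    """Map linear BT.709 RGB pixels of shape (..., 3) into the BT.2020 gamut."""
    return rgb_linear @ M_709_TO_2020.T

white = np.array([1.0, 1.0, 1.0])
print(np.round(gamut_709_to_2020(white), 4))  # white stays white
```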

  • Opto-electronic transfer function (OETF)

Converts the linear optical signal into a nonlinear electrical signal.
SDR: gamma
HDR: PQ, HLG
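The PQ OETF below follows SMPTE ST 2084 / BT.2100; the SDR curve is simplified to a pure power law (the real BT.709 OETF is piecewise, with a linear segment near black):

```python
import numpy as np

def oetf_gamma(x, gamma=2.2):
    """Simplified SDR OETF: a pure power law (BT.709 actually uses a
    piecewise curve with a linear toe near black)."""
    return np.power(np.clip(x, 0.0, 1.0), 1.0 / gamma)

def oetf_pq(y):
    """HDR PQ OETF (SMPTE ST 2084 / BT.2100); y = luminance / 10000 cd/m^2."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    yp = np.power(np.clip(y, 0.0, 1.0), m1)
    return np.power((c1 + c2 * yp) / (1.0 + c3 * yp), m2)

print(round(float(oetf_pq(1.0)), 4))    # peak luminance encodes to 1.0
print(round(float(oetf_gamma(0.5)), 4)) # ~0.7297
```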

  • Quantization

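Quantization rounds the continuous signal to the nearest code value; SDR typically uses 8 bits and HDR 10 bits, so HDR loses less. A uniform round-trip sketch:

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize a [0, 1] signal to the given bit depth and back."""
    levels = (1 << bits) - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

x = np.linspace(0.0, 1.0, 1000)
err8 = np.max(np.abs(quantize(x, 8) - x))    # 8-bit (SDR) error
err10 = np.max(np.abs(quantize(x, 10) - x))  # 10-bit (HDR) error
print(err8 > err10)  # True: 10-bit quantization loses less information
```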

(2) Modeling sdr2hdr problem:


(3) Based on the above model, the author puts forward some observations and insights:

(1) Many key operations in the pipeline are global operations, such as global tone mapping, gamut mapping, and the OETF, and the inverses of these operations can also be approximated as global operations;

(2) Some operations rely on local spatial information, such as local tone mapping and dequantization; these require operations with spatial context (e.g., convolutions);

(3) There is serious information compression/loss; for example, highlight areas may lose information due to the clipping operation during tone mapping;

(4) Based on these insights, the author proposes a three-step sdr2hdr scheme:

A three-step solution pipeline: adaptive global color mapping, local enhancement, and highlight generation.
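As a skeleton only (all three stages below are placeholders, not the paper's actual networks), the composition of the three steps might be sketched as:

```python
import numpy as np

def agcm(sdr):
    """Step 1 (placeholder): per-pixel adaptive global color mapping."""
    return np.clip(sdr ** 0.9, 0.0, 1.0)

def local_enhance(x):
    """Step 2 (placeholder): local refinement using spatial context."""
    return x  # a real implementation would be a small ResNet

def highlight_gen(x):
    """Step 3 (placeholder): detail generation in highlight regions."""
    return x  # a real implementation would be a GAN generator

def sdr2hdr(sdr):
    # The ordering matters: the paper observes that local enhancement
    # before global mapping tends to produce artifacts.
    return highlight_gen(local_enhance(agcm(sdr)))

sdr = np.random.default_rng(2).random((4, 4, 3))
print(sdr2hdr(sdr).shape)  # (4, 4, 3)
```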

(5) Comparison with Existing Solutions


  • End-to-end solution

The first type is the end-to-end solution, shown in Figure (b), including image-to-image translation methods and existing joint super-resolution + sdr2hdr methods, all of which learn the RGB-to-RGB mapping directly with a CNN. Since this class of methods does not consider how SDR and HDR content is produced, they usually show obvious local artifacts and unnatural colors.

  • LDR-to-HDR based solution

As shown in Figure (c), LDR-to-HDR based methods convert in the linear luminance space; after the conversion, the result must still be gamut-mapped to BT.2020, passed through the PQ/HLG OETF, and quantized to obtain the desired HDR video.

4. Method

[Figure: overview of the proposed three-step network]

(1) Adaptive Global Color Mapping

The first step completes the global operations in sdr2hdr, such as global tone mapping, gamut mapping, and the OETF. To accomplish this, the author proposes two networks:

  • Base network

Global operations act on each pixel independently. According to the conclusion of CSRNet, a network with only 1x1 convolutions and activation functions can realize such operations; the base network therefore takes this form.


CSRNet: Conditional sequential modulation for efficient global image retouching


Although the base network can only learn a one-to-one color mapping, it still achieves considerable performance, as shown in the paper's tables.

It is noteworthy that the base network can perform like a 3D lookup table (3D LUT) with fewer parameters rather than learning 3D LUT directly, and please refer to supplementary material for more results.
This part can be implemented as a 3D LUT.

Disrupting the order of pixels still gives good results, showing that the network's operation does not require local information; training this way "performs like adding a multiplicative Bernoulli noise on features, which has an effect similar to data augmentation."
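A 1x1 convolution is just a per-pixel linear map over channels, so it is invariant to pixel shuffling; a small numpy sketch (not the paper's exact architecture) can demonstrate this:

```python
import numpy as np

def conv1x1(feat, weight, bias):
    """A 1x1 convolution: a per-pixel linear map over channels only,
    so it can learn nothing but a global (position-independent) mapping.
    feat: (H, W, C_in) or (N, C_in); weight: (C_out, C_in); bias: (C_out,)."""
    return feat @ weight.T + bias

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
w, b = rng.standard_normal((16, 3)), np.zeros(16)

out = np.maximum(conv1x1(img, w, b), 0.0)  # 1x1 conv + ReLU

# Shuffling pixels before the mapping gives the same (shuffled) output,
# confirming the operation uses no spatial context.
flat = img.reshape(-1, 3)
perm = rng.permutation(flat.shape[0])
out_shuffled = np.maximum(conv1x1(flat[perm], w, b), 0.0)
print(np.allclose(out.reshape(-1, 16)[perm], out_shuffled))  # True
```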

  • AGCM : Base Network + Global Feature Modulation

    The network inserts a GFM (Global Feature Modulation) module into the base network to linearly adjust the results of the 1x1 convolutional mapping, with the linear parameters learned by the model. Intuitively, the base network learns a single 3D LUT, while AGCM learns a separate 3D LUT for each image.

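The modulation itself reduces to a per-channel scale and shift predicted from a global view of the image; a toy sketch (the condition branch here is a stand-in for the learned heads):

```python
import numpy as np

def gfm(feat, scale, shift):
    """Global Feature Modulation: one (scale, shift) pair per channel,
    broadcast over all spatial positions, so the adjustment stays global."""
    return feat * scale + shift

rng = np.random.default_rng(1)
feat = rng.random((4, 4, 8))           # features from a 1x1-conv layer
cond = feat.mean(axis=(0, 1))          # toy global statistic of the image
scale = 1.0 + 0.1 * cond               # stand-in for a learned scale head
shift = 0.01 * cond                    # stand-in for a learned shift head

out = gfm(feat, scale, shift)
print(out.shape)  # (4, 4, 8)
```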

(2) Local Enhancement

Although AGCM alone already performs very well, local enhancement is still indispensable; here the author uses a ResNet for local enhancement. The author also reports an important observation:
if local enhancement is performed before global mapping, obvious artifacts usually appear, which may explain the artifacts seen in previous methods; the author does not elaborate further.

(3) Highlight Generation

This part is clear from the algorithm block diagram: a classic GAN is used to generate further detail in the highlight regions. From my own tests, the effect of this step seems limited.

5. HDRTV1K dataset

22 HDR10 videos
SDR obtained by YouTube down-conversion
Training set: 1235 image pairs
Test set: 117 image pairs

The download link is on the code home page https://github.com/chxy95/HDRTVNet

6. Results

[Figures: quantitative and qualitative comparison results]


Origin blog.csdn.net/BigerBang/article/details/124043708