Understand ISP pipeline in one article

1. Overview

1. What is an ISP?

Mainstream CMOS and CCD sensors generally output RAW data in the Bayer mosaic format, which must be converted to RGB or YUV before mainstream image processing software can use it. For a camera, it is usually converted further to JPEG for convenient storage: preview typically uses YUV, while captured photos are saved as JPEG. ISP stands for Image Signal Processing. In the broad sense, an ISP also covers JPEG and H.264/H.265 compression; in the narrow sense, it only covers the conversion from RAW to RGB or YUV.

2. ISP implementation solution

Camera type:

  • RAW camera: no built-in ISP
  • YUV camera: has a built-in ISP

ISP is generally implemented in hardware, either as an independent chip or integrated into the camera. Because a large volume of image data must be processed, a dedicated chip offers better real-time performance; an ISP module integrated into the camera generally supports only basic ISP functions.

The ISP firmware contains three parts: the ISP control unit plus a basic algorithm library, the AE/AWB/AF (3A) algorithm library, and the sensor library. The basic idea of the firmware design is to keep the 3A algorithm library separate and let the ISP control unit schedule both the basic algorithm library and the 3A library. Meanwhile, the sensor library registers callback functions with the basic algorithm library and the 3A library respectively, so that different sensors can be adapted in a differentiated way.

2. ISP Pipeline

A typical pipeline is as follows, driven by a clock of several hundred MHz:
[Figure: typical ISP pipeline]

For more complete terminology, please visit my blog:
Link: Camera HAL/ISP professional terminology collection

Two color space conversions:

  • Demosaic: RAW to RGB
  • CSC: RGB to YUV (see the matrix sketch below)
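To make the second conversion concrete, here is a minimal sketch of RGB-to-YUV (YCbCr) conversion using the common BT.601 full-range coefficients; the actual matrix an ISP uses depends on the configured standard and range, and numpy is used only for illustration:

```python
import numpy as np

# BT.601 full-range RGB -> YCbCr matrix (one common CSC choice; a real ISP
# uses whatever standard/range it has been configured for).
CSC_BT601 = np.array([[ 0.299,  0.587,  0.114],
                      [-0.169, -0.331,  0.500],
                      [ 0.500, -0.419, -0.081]])

def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB image in [0, 255] to YCbCr (chroma offset by 128)."""
    yuv = rgb.astype(np.float32) @ CSC_BT601.T
    yuv[..., 1:] += 128.0            # center the Cb/Cr channels
    return np.clip(yuv, 0, 255)
```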

3. ISP input

The input of the ISP is basically RAW data in Bayer mosaic format. The Bayer pattern was invented by Eastman Kodak scientist Bryce Bayer (1929–2012), and Bayer arrays are widely used in digital image processing.

Commonly used Bayer formats include GBRG, GRBG, RGGB, etc., and common bit depths include 8/10/12/14 bit.
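As a small illustration of how the mosaic is laid out, the sketch below (assuming an RGGB pattern and the RAW frame held in a numpy array) slices out the four color planes:

```python
import numpy as np

def split_bayer_rggb(raw: np.ndarray):
    """Split an RGGB Bayer mosaic into its four color planes.

    raw: 2-D array of RAW samples (8/10/12/14-bit values stored as uint16).
    Returns (R, Gr, Gb, B), each half the original width and height.
    """
    r  = raw[0::2, 0::2]   # red samples:   even rows, even columns
    gr = raw[0::2, 1::2]   # green samples on red rows
    gb = raw[1::2, 0::2]   # green samples on blue rows
    b  = raw[1::2, 1::2]   # blue samples:  odd rows, odd columns
    return r, gr, gb, b

# Example: a fake 4x4 10-bit RAW frame
raw = np.random.randint(0, 1024, size=(4, 4), dtype=np.uint16)
r, gr, gb, b = split_bayer_rggb(raw)
```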
In addition, many camera and camcorder products insert an infrared cut filter into the lens optical path. Its function is to pass light with wavelengths below the cutoff and block light with wavelengths above it. Commonly used cutoff wavelengths are 630 nm and 650 nm, which admit different amounts of red light into the imaging system and therefore affect the white balance and color correction algorithms. The ISP parameters must therefore be configured for the cutoff wavelength actually selected.

Data access methods:

  • Online mode: the real-time data and timing control signals generated by the sensor are sent line by line directly to the ISP for processing.
  • Offline mode: the image to be processed is stored in system memory in units of frames; when processing is needed, a control logic reads the data from memory via DMA, adds timing control signals that mimic sensor behavior, and then sends the data to the ISP for processing.

Advantages of online mode: low latency, data travels through dedicated hardware channels without going through the memory bus, saving memory bandwidth.

In online mode the first pixel of a frame enters the ISP pipeline as soon as it flows out of the sensor, while in offline mode the ISP usually has to wait until the last pixel of the frame is available before it can start processing.

To give a typical example: assume a 1080p sensor with a master clock of 76 MHz, i.e. one clock cycle is about 13.15 ns. Assume the sensor is configured with 2000 cycles per line, a typical configuration consisting of 1920 active pixels plus 80 cycles of horizontal blanking. Each line of output then takes 13.15 ns × 2000 = 26.3 µs, and all 1080 active rows take 26.3 µs × 1080 = 28.4 ms. How long does a typical ISP need to process this 1080p image? Only about 3 ms. In other words, the ISP can keep its read pointer from overtaking the sensor's write pointer only if it starts processing no earlier than roughly 25 ms into the frame (28.4 ms - 3 ms).
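The timing arithmetic above can be written out directly; the snippet below simply restates the 76 MHz / 2000-cycles-per-line / 3 ms assumptions from the example:

```python
# Sensor readout timing vs. ISP processing time (numbers from the example above)
clk_hz          = 76e6            # sensor master clock
cycles_per_line = 2000            # 1920 active pixels + 80 cycles of horizontal blanking
active_lines    = 1080

line_time_s  = cycles_per_line / clk_hz          # ~26.3 us per line
frame_time_s = line_time_s * active_lines        # ~28.4 ms to read out the frame

isp_time_s = 3e-3                                # assumed ISP processing time
# Latest moment the ISP may start and still avoid its read pointer
# overtaking the sensor's write pointer:
latest_start_s = frame_time_s - isp_time_s       # ~25.4 ms
print(f"line: {line_time_s*1e6:.1f} us, frame: {frame_time_s*1e3:.1f} ms, "
      f"latest ISP start: {latest_start_s*1e3:.1f} ms")
```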

Some HiSilicon ISP chips support a special "low-latency" offline mode: the ISP starts processing once the Xth line of a frame is available, without waiting for the last pixel. A hardware mechanism guarantees that the read pointer never crosses the write pointer, so the user can choose to start ISP processing earlier; when the ISP read pointer catches up with the sensor write pointer, the hardware automatically inserts wait cycles so that the ISP idles, ensuring data integrity.

1. Line buffering

Whether in online mode or offline mode, the ISP processes images line by line, so the ISP module contains a line buffer that caches several lines of the image. The size of the line buffer usually determines the maximum resolution the ISP supports: for example, if the line buffer can hold at most 2048 pixels per line, the ISP cannot support resolutions beyond 2K/1080p.

2. Data alignment

ISP, CODEC and other hardware units usually have a granularity requirement when processing images: pixels must be handled in groups of 8/16/32/64/128 so that the hardware can parallelize and improve throughput. This requirement is called the ISP's data alignment requirement. Most sensors support a linesize attribute to ensure that the width of each output line meets the ISP's alignment requirement. Terms equivalent to linesize include stride and pitch; all of them denote the actual storage space occupied by one row of data. When the linesize does not equal the actual image width, the sensor fills the padding portion with data, which can be a fixed value of 0 (zero-padding) or the value of the last pixel in the row (copy-padding).
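A sketch of the usual round-up calculation is shown below; the 32-pixel group size and 16-bit sample depth are illustrative assumptions, not fixed ISP parameters:

```python
def aligned_linesize(width_px: int, align_px: int = 32, bytes_per_px: int = 2):
    """Round the image width up to the alignment granularity.

    Returns (linesize in pixels, stride in bytes). The 32-pixel group and
    16-bit RAW samples are illustrative assumptions.
    """
    linesize = (width_px + align_px - 1) // align_px * align_px
    return linesize, linesize * bytes_per_px

# A 1080-pixel-wide image padded to a 32-pixel boundary:
linesize, stride = aligned_linesize(1080)   # linesize = 1088 px, stride = 2176 bytes
```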

4. Key algorithms that are difficult to understand

Some algorithms in the ISP pipeline are self-explanatory from their names and need no further discussion. A few, however, deserve special explanation because non-specialists generally do not understand them. Once you learn them, you are well on the way to becoming an expert.

1. WDR

Wide dynamic range, also called HDR.
Due to material and process limitations, ordinary sensors can generally provide only 50–70 dB of dynamic range, so the fusion of long- and short-exposure images (frame stitching, or WDR fusion) is generally performed by the ISP.

If the camera output frame rate is to remain at 30 fps, two-frame WDR fusion requires the sensor to output 60 fps, three-frame fusion requires 90 fps, and four-frame fusion requires 120 fps. This is why mainstream products mainly use two-frame WDR.
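A toy sketch of two-frame fusion follows; a real WDR block works on RAW data with per-pixel weights and motion compensation, so this only shows the basic idea, and the exposure ratio and saturation level are assumed parameters:

```python
import numpy as np

def fuse_two_frames(long_exp: np.ndarray, short_exp: np.ndarray,
                    exposure_ratio: float = 16.0,
                    sat_level: int = 4095) -> np.ndarray:
    """Naive two-frame WDR fusion of 12-bit long/short exposure frames.

    Near-saturated pixels in the long exposure are replaced by the short
    exposure scaled up by the exposure ratio; everything else keeps the
    (less noisy) long-exposure value.
    """
    long_f  = long_exp.astype(np.float32)
    short_f = short_exp.astype(np.float32) * exposure_ratio   # bring to the same scale
    weight  = np.clip((long_f - 0.9 * sat_level) / (0.1 * sat_level), 0.0, 1.0)
    return (1.0 - weight) * long_f + weight * short_f          # HDR result, > 12 bits
```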

To reduce motion artifacts, Staggered WDR is used: the sensor no longer outputs in units of frames but in units of rows, which alleviates the motion-artifact problem. As shown in the figure below, the sensor first outputs a row of long-exposure pixels, then a row of short-exposure pixels, and then moves on to the next row.
[Figure: Staggered WDR row-interleaved readout]
SONY's implementation of WDR is called DOL (Digital OverLap) and supports up to three exposure frames. The difference from generic Staggered WDR is that its output format carries special flag data, so dedicated logic circuits are required to parse it. It supports two pixel output methods: method 1 uses a single stream in which long-exposure and short-exposure lines alternate; method 2 uses two streams output in parallel, as shown in the figure below.
[Figure: DOL WDR single-stream and dual-stream output]

2. Image compression

To reduce the pressure on transmission bandwidth and storage, chips supporting 4K and above place a compression algorithm on the DMA path: when DMA writes data to memory, compressed data actually enters memory, and when DMA reads data from memory, the user gets decompressed data.

The image compression technology licensed by Arm is called AFBC (Arm Frame Buffer Compression). It is a lossless compression technology based on pulse-code modulation (PCM); under typical conditions it achieves a compression ratio of about 50%, saving storage space and transmission bandwidth. Reportedly, Huawei has implemented a wavelet-transform-based lossy compression technology in its mobile phone chips, whose compression efficiency should be higher.

3. Tone Mapping

After the WDR module completes multi-frame synthesis (frame stitching), the data bit width needs to be compressed to save computing resources in the subsequent stages. A reasonable approach is a step-by-step compression strategy: for example, the WDR module first compresses to 12-bit precision, the data is compressed further to 10-bit precision after color processing such as CCM and Gamma, and the final compression to 8-bit output takes place after the CSC module.

The process of compressing from 16/20-bit precision down to 12-bit precision is called tone mapping. The main task of this step is to compress the dynamic range of the image, mapping the HDR image to an LDR image while preserving as much image detail as possible.

It is divided into global algorithm (Global Tone Mapping, GTM) and local algorithm (Local Tone Mapping, LTM).
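As a minimal illustration of a global curve (GTM), the sketch below uses a simple gamma-style operator, not any particular vendor's algorithm, to compress 16-bit data down to 12 bits:

```python
import numpy as np

def global_tone_map(hdr: np.ndarray, in_bits: int = 16, out_bits: int = 12) -> np.ndarray:
    """Compress dynamic range with a simple gamma curve (global tone mapping).

    hdr: linear HDR image with `in_bits` of precision.
    Returns an image quantized to `out_bits`, lifting shadow detail.
    """
    in_max, out_max = (1 << in_bits) - 1, (1 << out_bits) - 1
    norm = hdr.astype(np.float32) / in_max
    mapped = np.power(norm, 1.0 / 2.2)          # gamma-style curve brightens dark regions
    return np.round(mapped * out_max).astype(np.uint16)
```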

4. RAW domain processing

4.1 Lens shading correction (LSC)

Lens shading takes two forms:

  • Luma shading, also known as vignetting: the brightness at the edges of the picture becomes darker because the light passing through the lens attenuates gradually from the center to the edges (a toy correction sketch follows this list).
  • Chroma shading: false colors appear in the image because the lens refracts different wavelengths of light differently, which separates their focal-plane positions.
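The toy luma-shading correction below assumes a purely radial falloff model; real ISPs usually apply a calibrated mesh of per-channel gains, so this is only meant to show the idea:

```python
import numpy as np

def lsc_radial_gain(height: int, width: int, max_gain: float = 2.0) -> np.ndarray:
    """Build a radial gain map: 1.0 at the image center, `max_gain` at the corners."""
    y, x = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    r = np.hypot(y - cy, x - cx) / np.hypot(cy, cx)   # normalized distance, 0..1
    return 1.0 + (max_gain - 1.0) * r**2              # quadratic falloff compensation

# Apply to one Bayer plane (gains are multiplied in the linear RAW domain):
# corrected = np.clip(raw_plane * lsc_radial_gain(*raw_plane.shape), 0, 4095)
```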

4.2 Noise reduction: spatial filtering

The main method of image denoising is to filter the image with a spatial filter. Filtering operates on a window centered on a given pixel; the window size depends on the specific algorithm, with 3x3, 5x5 and 7x7 being common choices. Mathematically, the filtering operation is a convolution: it uses a convolution kernel of the same size as the filter window, each element of which is a weight. Each weight is multiplied by the image pixel value at the corresponding position, and all the products are summed to give the filtered result.
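The sketch below spells out this window/convolution operation with a 3x3 Gaussian-like kernel; the kernel is only illustrative, not a production denoiser:

```python
import numpy as np

KERNEL_3X3 = np.array([[1, 2, 1],
                       [2, 4, 2],
                       [1, 2, 1]], dtype=np.float32) / 16.0   # weights sum to 1

def spatial_filter(img: np.ndarray, kernel: np.ndarray = KERNEL_3X3) -> np.ndarray:
    """Filter an image by sliding the kernel window over every pixel."""
    kh, kw = kernel.shape
    pad_y, pad_x = kh // 2, kw // 2
    padded = np.pad(img.astype(np.float32), ((pad_y, pad_y), (pad_x, pad_x)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    for dy in range(kh):               # accumulate weight * shifted image
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out
```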

4.3 Bayer Demosaic

Each photosite in the RAW domain holds only one real sample value; the other two values that make up the (R, G, B) pixel must be predicted from the surrounding photosites.

The last step of processing in the RAW domain is demosaic, which converts pixels from the RAW domain to the RGB domain for the next stage of processing. The main challenge for a demosaic algorithm is to maximize accuracy while minimizing the loss of edge detail and color errors. If demosaicing is not handled well, artifacts such as the zipper effect, blurred edges and false colors appear.
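As a minimal example, the sketch below does bilinear interpolation of the green channel of an RGGB mosaic; practical demosaic algorithms are edge-aware precisely to avoid the zipper and false-color artifacts mentioned above:

```python
import numpy as np

def demosaic_green_bilinear(raw: np.ndarray) -> np.ndarray:
    """Reconstruct the full green plane of an RGGB mosaic by averaging
    the four nearest green neighbors at every red/blue photosite."""
    h, w = raw.shape
    green = raw.astype(np.float32).copy()
    padded = np.pad(green, 1, mode="edge")
    # In an RGGB layout, green samples sit where (row + col) is odd;
    # red/blue sites (row + col even) must be interpolated.
    yy, xx = np.mgrid[0:h, 0:w]
    missing = ((yy + xx) % 2 == 0)
    neighbors = (padded[0:h, 1:w+1] + padded[2:h+2, 1:w+1] +
                 padded[1:h+1, 0:w] + padded[1:h+1, 2:w+2]) / 4.0
    green[missing] = neighbors[missing]
    return green
```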

To be continued…

Recommended reference blogs:
Link: Understanding ISP Pipeline
Link: ISP Pipeline for ISP algorithm learning
Link: ISP algorithm introduction
