AI-ISP: How does the mobile phone camera turn the landscape into a picture?

insert image description here

foreword

  This article does only one thing: it explains AI-ISP, i.e. how the mobile phone camera turns scenery into a picture.

1. ISP chip

  A small SoC chip (10mm × 10mm) integrates more than 10 billion transistors (the Kirin 9000 has 15.3 billion). Its integrated modules jointly support the phone's functions: the CPU is responsible for smooth switching between apps, the GPU supports fast loading of game graphics, the NPU is dedicated to AI computation and AI applications, and the ISP handles taking pictures. Based on the introduction in the book, a rough SoC overview is drawn:

insert image description here

  ISP (Image Signal Processor), i.e. the image signal processor, is mainly used to process the signal output by the front-end image sensor (which converts optical signals into electrical signals). It is the core device of image processing, so its performance directly affects the camera and video quality of the phone.
  When the phone camera is opened, the lens (Lens) first projects the subject onto the image sensor (Sensor). At the same time, the image processor (ISP) calculates appropriate parameters through light metering and distance measurement, and instructs the lens to focus. When the shutter button is pressed, the Sensor completes an exposure, and the ISP turns the captured data into a picture through a series of processing steps. The general flow is as follows:

insert image description here
  So this article focuses on how the ISP converts the Sensor's output data into an RGB image.

2. Sensor knowledge

  Reference article: CMOS image sensor

  CMOS (Complementary Metal Oxide Semiconductor), i.e. complementary metal oxide semiconductor, is a kind of semiconductor that can record changes in light. It is made mainly of silicon and germanium, so that N-type and P-type semiconductors coexist on the CMOS; the currents generated by these two complementary effects can be recorded and interpreted as an image by a processing chip (this is also introduced in digital-circuit textbooks). A CMOS image sensor (CMOS Image Sensor, CIS for short) is an image sensing element manufactured with CMOS technology using the photoelectric principle.
  Ideally, the Sensor would output raw RGB data, with each pixel sensing all 3 RGB components, which would be the most accurate. But that would require 3 photosensitive plates, and the 3 sets of RGB data would also need time synchronization and alignment, which is costly and difficult. Therefore, in practice, a filter called a Bayer color filter array (Bayer Color Filter Array, CFA) is usually placed on a single photosensitive plate. The CFA (Color Filter Array) is needed because the CIS itself cannot perceive the wavelength of light, that is, the CIS cannot perceive color. Common Bayer patterns (Bayer Pattern) include BGGR, GBRG, GRBG and RGGB, for example:

insert image description here

  A pixel (Pixel) is the smallest photosensitive unit of an image sensor; an array of pixels arranged together forms the photosensitive area of the image sensor.
  Pixel size (Pixel Size) refers to the size of a single photosensitive element of an image sensor. Generally, we see two notations, such as 1.12μm or 1.12μm×1.12μm. The larger the pixel, the more photons it receives, and the greater the charge generated under the same lighting conditions and exposure time.

3. RAW data

  As the blogger, I exported a .jpg and the corresponding .dng data from my phone:

insert image description here
  The left side is the original RAW data (76.2MB), and the right side is the RGB data (4.74MB) automatically processed by the phone camera's ISP.
  Let's use the rawpy library to read the RAW data and look at the detailed information:

import rawpy


raw_file = '../raws/IMG_20230418_143110.dng'
raw_data = rawpy.imread(raw_file)

# Check whether a thumbnail is present
try:
    thumb = raw_data.extract_thumb()
except rawpy.LibRawNoThumbnailError:
    print('no thumbnail found')

# 返回给定坐标相对于完整 RAW 大小的颜色索引
color_index = raw_data.raw_color(0, 0)
print(color_index)

# 返回相对于完整RAW图像的给定位置的RAW值
raw_value = raw_data.raw_value(0, 0)
print(raw_value)

# 返回相对于图像可见区域的给定位置的RAW值
raw_value = raw_data.raw_value_visible(0, 0)
print(raw_value)

insert image description here
  According to the two attributes color_desc and raw_pattern, the Bayer mode of this RAW data is BGGR; the reasoning is as follows:

  color_desc is RGBG, with corresponding indices 0, 1, 2, 3, and raw_pattern is [[2, 3], [1, 0]]; converting the indices (read row by row: 2, 3, 1, 0) into color letters gives BGGR.
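As a sanity check, this index-to-letter conversion can be reproduced in a few lines (the color_desc and raw_pattern values below are the ones read from the file above):

```python
import numpy as np

# Values read from the DNG above: color_desc = 'RGBG', raw_pattern = [[2, 3], [1, 0]]
color_desc = 'RGBG'
raw_pattern = np.array([[2, 3], [1, 0]])

# Replace each index in the 2x2 pattern with its letter from color_desc
pattern = ''.join(color_desc[i] for i in raw_pattern.flatten())
print(pattern)  # BGGR
```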

  Partially enlarging the RAW image, the code and results are as follows:

import rawpy
import matplotlib.pyplot as plt
import seaborn as sns


raw_file = '../raws/IMG_20230418_143110.dng'
raw_data = rawpy.imread(raw_file)

# plt.imshow(raw_data.raw_image_visible)
sns.heatmap(raw_data.raw_image_visible[:10, :10], cmap=plt.cm.binary)
# plt.savefig('../raws/ori_raw_10x10.png')
plt.show()

insert image description here
  The following are all the properties of the rawpy.RawPy object. For more details, please refer to the official documentation:

Attribute: meaning
black_level_per_channel: Per-channel black level correction
camera_white_level_per_channel: Per-channel saturation level read from the raw file metadata; None if not present
camera_whitebalance: White balance coefficients (as shot)
color_desc: A string describing the colors numbered from 0 to 3 (RGBG, RGBE, GMCY or GBTG); note that the same letter may indicate different colors
color_matrix: Color matrix, read from the file for some cameras, computed for others; its shape is [3, 4]
daylight_whitebalance: Daylight white balance coefficients, either read from the file, computed from file data, or taken from a hard-coded constant
num_colors: Number of colors; note that for RGBG, for example, this could be 3 or 4 depending on the camera model, as some use two different shades of green
raw_colors: An array of color indices for each pixel in the RAW image; equivalent to calling raw_color(y, x) for each pixel
raw_colors_visible: Like raw_colors but without the margin
raw_image: The RAW image including margin; for Bayer images a 2D ndarray is returned, for Foveon and other RGB-type images a 3D ndarray; note that there may be 4 color channels, in which case the 4th channel can be 0
raw_image_visible: Like raw_image but without the margin
raw_pattern: The Bayer pattern of the array
raw_type: The type of the RAW data, such as RawType.Flat
rgb_xyz_matrix: Camera RGB - XYZ conversion matrix; constant for a given model (different for different models). The last row is zero for RGB cameras and non-zero for cameras with other color models (such as CMYG); the matrix shape is [4, 3]
sizes: Information about the RAW image and the postprocessed image, contained in a rawpy.ImageSizes object
tone_curve: Camera tone curve, read from the file for Nikon, Sony and some other cameras; its length is 65536
white_level: The level at which raw pixel values are considered saturated

4. ISP Pipeline

  After reading the relevant reference materials: the situation here is complicated and there is a lot of content. As the blogger, I have simply sorted it out as follows, for reference only:

insert image description here
  Here is an arrangement based on some blogs on the Internet:
  1. BLC (Black Level Correction)
  Black level correction: due to the Sensor's leakage current, even when the lens is placed in a completely dark environment, the Sensor's raw output is not 0, while we want the raw data in complete darkness to be 0.
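A minimal sketch of black level correction (the 10-bit black level of 64 and white level of 1023 below are illustrative assumptions, not values from any particular sensor):

```python
import numpy as np

def black_level_correct(raw, black_level, white_level):
    # Subtract the black level and clip so that dark pixels land at 0
    corrected = raw.astype(np.int32) - black_level
    return np.clip(corrected, 0, white_level - black_level).astype(np.uint16)

# A "completely dark" 10-bit frame that still reads around 64 due to leakage current
dark = np.array([[64, 66], [63, 70]], dtype=np.uint16)
print(black_level_correct(dark, black_level=64, white_level=1023))
# [[0 2]
#  [0 6]]
```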
  2. LSC (Lens Shading Correction)
  Lens shading correction: as the field of view gradually increases, fewer oblique light beams can pass through the camera lens, so the image captured by the Sensor is bright in the middle and darker toward the edges.
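A simplified radial gain map illustrates the idea: multiply each pixel by a gain that grows toward the corners. The quadratic falloff model and the 0.5 corner boost below are assumptions for illustration, not a calibrated shading profile:

```python
import numpy as np

def lens_shading_gain(h, w, falloff=0.5):
    # Radial gain map: 1.0 at the image center, rising toward the corners
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r = np.hypot(yy - cy, xx - cx) / np.hypot(cy, cx)  # 0 at center, 1 at corners
    return 1.0 + falloff * r ** 2

gain = lens_shading_gain(5, 5)
print(gain[2, 2], gain[0, 0])  # center gain 1.0, corner gain 1.5
```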
  3. BPC (Bad Pixel Correction)
  Bad pixel correction, also called Defect Pixel Correction (DPC): since the Sensor is a physical device, dead pixels are inevitable, and their number increases with long-term use. By observing the colored and bright dots output in a completely dark environment, or the colored and black dots output against a white object, you can see dead pixels scattered irregularly everywhere.
  4. Demosaic
  Demosaicing, also called color interpolation: interpolation is performed on the blank positions of each color plane so that every pixel ends up with all three color components.
  5. NR (Noise Reduction)
  Noise removal; there are many concrete variants, such as 2DNR, 3DNR, etc. The Sensor's photosensitive circuitry contains an analog part, so noise in the signal is hard to avoid, and the ADC device itself also introduces noise. In addition, in low light the whole system needs to amplify the signal, which amplifies the noise as well.
  6. AWB (Automatic White Balance)
  Automatic white balance: the human visual system has a degree of color constancy and is not affected by the color of the light source. In real life, whether in sunlight, on a cloudy day, or under indoor incandescent or fluorescent light, the white objects people see always look white; this is the result of visual correction. The human brain has prior knowledge of object colors and can identify objects and correct color casts. The Sensor has no such ability: a piece of white paper produces different Sensor outputs under different light, yellowish at low color temperatures and bluish at high color temperatures. For example, photos taken under incandescent light tend to be yellowish, while outdoors in strong sunlight the scene appears bluish. We need to convert the Sensor's output for white objects under different color temperatures to be closer to white.
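The classic gray-world algorithm is one simple way to do this: assume the scene averages to neutral gray and scale R and B so their means match the G mean. This is a textbook method for illustration, not necessarily what any particular phone ISP uses:

```python
import numpy as np

def gray_world_awb(rgb):
    # Per-channel means over the whole image
    avg = rgb.reshape(-1, 3).mean(axis=0)
    # Gains relative to green (G gain = 1)
    gains = avg[1] / avg
    balanced = rgb * gains
    return np.clip(balanced, 0, 255).astype(np.uint8)

# A yellowish cast: too much R, too little B
img = np.full((2, 2, 3), (200.0, 150.0, 100.0))
print(gray_world_awb(img)[0, 0])  # [150 150 150]
```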
  7. CCM (Color Correction Matrix)
  Color correction: the image acquired by the Sensor still deviates from the expected colors and must be corrected. AWB has already calibrated white; the CCM is used to calibrate the accuracy of colors other than white.
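Applying a CCM is just a 3×3 matrix multiply per pixel. The matrix below is a made-up example whose rows each sum to 1, the usual constraint so that neutral gray passes through unchanged:

```python
import numpy as np

# Hypothetical color correction matrix; each row sums to 1
ccm = np.array([[ 1.50, -0.30, -0.20],
                [-0.25,  1.60, -0.35],
                [-0.10, -0.40,  1.50]])

def apply_ccm(rgb, ccm):
    # Each output pixel is CCM @ [R, G, B]^T
    shape = rgb.shape
    out = rgb.reshape(-1, 3) @ ccm.T
    return np.clip(out, 0, 255).reshape(shape)

gray = np.full((1, 1, 3), 128.0)
print(apply_ccm(gray, ccm))  # neutral gray stays 128, 128, 128
```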
  8. RGB Gamma
  Gamma correction: the human eye, unlike a camera, does not perceive received photons linearly. For example, each time a lamp is added in a small dark room, the camera registers a linear increase in brightness, but the human eye clearly notices the first lamp added in the dark and barely notices additional ones as their number grows (the human eye is non-linear).
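In practice, gamma encoding is usually implemented with a lookup table. A sketch for 8-bit data with the common gamma of 2.2 (an illustrative choice; real pipelines often use an sRGB-style curve):

```python
import numpy as np

def gamma_correct(img, gamma=2.2):
    # 256-entry LUT: v_out = (v_in / 255) ** (1 / gamma) * 255
    lut = (np.power(np.arange(256) / 255.0, 1.0 / gamma) * 255.0).astype(np.uint8)
    return lut[img]

# Linear mid-gray (~21% of full scale) is brightened considerably by the encoding
linear = np.array([[54]], dtype=np.uint8)
print(gamma_correct(linear))  # [[125]]
```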
  9. CSC (Color Space Conversion)
  Color space conversion: it is more convenient to perform chroma denoising, edge enhancement, etc. in the YUV color space, and YUV also saves bandwidth during storage and transmission. An example is the RGB-to-YUV conversion.
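For instance, the BT.601 RGB-to-YUV conversion is a fixed matrix multiply (full-range coefficients shown; real pipelines may use limited-range or BT.709 variants):

```python
import numpy as np

# BT.601 full-range RGB -> YUV coefficients
M = np.array([[ 0.299,    0.587,    0.114  ],
              [-0.14713, -0.28886,  0.436  ],
              [ 0.615,   -0.51499, -0.10001]])

def rgb_to_yuv(rgb):
    # Works for a single pixel (3,) or an image (..., 3)
    return rgb @ M.T

print(rgb_to_yuv(np.array([255.0, 255.0, 255.0])))  # white: Y = 255, U and V ~ 0
```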
  10. HDR (High Dynamic Range)
  High dynamic range: the range of light intensity in nature is very wide, but the human eye's ability to distinguish details in very bright and very dark environments is comparatively narrow, and the camera's recording range is narrower still. Real HDR technology records the highlight and dark-area detail within the visible range. To make the brightness range the human eye sees match (or even exceed) what the display or camera captures, tone mapping is needed to reproduce the dark and bright details; this is a purely visual process, not real HDR. In short, wide-dynamic-range technology makes both the particularly bright and the particularly dark areas of a scene visible in the final image at the same time.
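As a tiny illustration of tone mapping, the global Reinhard operator compresses an unbounded luminance range into [0, 1) while keeping dark values nearly linear. It is just one classic operator; real ISPs use far more elaborate, locally adaptive methods:

```python
import numpy as np

def reinhard_tonemap(luminance):
    # L / (1 + L): ~linear for small L, asymptotically 1 for huge L
    return luminance / (1.0 + luminance)

L = np.array([0.01, 1.0, 10.0, 1000.0])
print(reinhard_tonemap(L))  # bright values are compressed toward 1
```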
  11. Color denoise / Sharpness
  Mainly denoising in the YUV domain; and to compensate for the loss of image detail during denoising, the image needs to be sharpened to restore the relevant details. These operations are more convenient in the YUV color space.
  12. AEC (Automatic Exposure Control)
  Automatic exposure: light intensity varies greatly across scenes. The human eye is self-adaptive and adjusts quickly to perceive a suitable brightness, but the image sensor has no such self-adaptation, so an automatic exposure function is needed to ensure that photos are accurately exposed and have suitable brightness.
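A toy feedback loop conveys the idea behind AE (the 8-bit target of 118, the tolerance band, and the 10% step are arbitrary illustrative numbers; real AE uses metering regions, histograms and exposure tables):

```python
def adjust_exposure(mean_luma, exposure, target=118.0, tol=8.0, step=0.1):
    """Nudge exposure until the frame's mean brightness lands near the target."""
    if mean_luma < target - tol:
        return exposure * (1.0 + step)   # too dark: expose longer
    if mean_luma > target + tol:
        return exposure * (1.0 - step)   # too bright: expose shorter
    return exposure                      # within tolerance: hold

print(adjust_exposure(60, 10.0))   # underexposed -> longer exposure
print(adjust_exposure(200, 10.0))  # overexposed  -> shorter exposure
print(adjust_exposure(120, 10.0))  # in band      -> unchanged
```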

  Space is limited, so let's just look at how Demosaic works. As mentioned above, more specifically, it converts a colorless RAW image into an RGB three-channel color image. An example picture is as follows:

insert image description here
  The code is implemented as follows:

import rawpy
import cv2
import numpy as np
import matplotlib.pyplot as plt


raw_file = '../raws/IMG_20230418_143110.dng'
raw_data = rawpy.imread(raw_file)
# rawpy's built-in demosaic algorithms (1 = VNG)
rgb_data = raw_data.postprocess(demosaic_algorithm=rawpy.DemosaicAlgorithm(1))
# print(rgb_data.shape)
plt.imshow(rgb_data)
plt.savefig('../raws/isp_rgb.png')


with rawpy.imread(raw_file) as raw:
    bayer_img = raw.raw_image.copy()
bayer_img = np.uint8(bayer_img)		# uint16 --> uint8 (simple cast; high bits are discarded)
# OpenCV's built-in VNG demosaic algorithm
rgb_img = cv2.demosaicing(bayer_img, cv2.COLOR_BAYER_BG2RGB_VNG)
# print(rgb_img.shape)
plt.imshow(rgb_img)
plt.savefig('../raws/isp_rgb_cv2.png')

  The effect is as follows:

insert image description here

insert image description here

  The reference blog is as follows:
  1. Introduction to the whole process of ISP: https://www.ngui.cc/article/show-954304.html
  2. Several Zhihu articles I have collected: https://www.zhihu.com/collection/865188946

5. AI-ISP

  AI ISP is a new technical concept that has emerged only in recent years. Facing ever-higher scene complexity and special image-quality requirements, traditional ISPs are challenged by increasingly large parameter libraries, difficult tuning, and gradually lengthening development cycles. With the help of AI technology, strengthening ISP functions with machine-learning methods has become an important direction of technical development, and thus AI ISP was born.
  In Intel Labs' 2018 paper "Learning to See in the Dark", it was proposed that all ISP functions can be realized by a single neural network: the network takes a RAW image as input and outputs an RGB or YUV image. So far, however, I have not heard of any ISP product that is implemented entirely by a neural network.

insert image description here
  In Aixin's pipeline, only the important modules are optimized with AI, concentrating the limited AI computing power on the most critical AI ISP functions that are most visible to the human eye, such as HDR, 3DNR denoising, tone mapping and demosaic, in order to achieve the best overall AI ISP effect.

insert image description here
  In Aixin's overall ISP there is an NPU designed specifically for the ISP, which is not exactly the same as the NPU in other traditional chips. Besides emphasizing the computing units, it also includes Pre-Process, Post-Process, a larger Shared Memory, and some processing operations for AI ISP stream processing and CV, which allow its AI computing power to perform better in image processing.

  Reference article: AI vision chip dry goods sharing: an in-depth explanation of the technical principles of AI ISP (https://zhuanlan.zhihu.com/p/467137601)

conclusion

  The content of this blog is continuously updated and supplemented as I keep learning. If there are any mistakes, please feel free to message me privately! Let's make progress together, thank you~ Thanks♪(・ω・)ノ


Origin blog.csdn.net/qq_42730750/article/details/130224132