Foreword
This article does only one thing: AI-ISP. How does a mobile phone camera turn the scenery into a picture?
1. ISP chip
A SoC is a small chip (about 10mm x 10mm) that integrates more than 10 billion transistors (the Kirin 9000 has 15.3 billion). Its integrated modules jointly support the phone's functions: the CPU is responsible for smooth switching between apps, the GPU supports fast loading of game scenes, the NPU is dedicated to AI computation and AI applications, and the ISP handles photography. Based on the introduction in the book, a rough SoC overview is drawn:
ISP (Image Signal Processor) is short for image signal processor. It mainly processes the front-end image signal, i.e. the electrical signal that the sensor produces from incoming light, and is the core device of image processing; its performance therefore directly affects the phone's photo and video quality.
When the phone camera is opened, the lens (Lens) first projects the subject onto the image sensor (Sensor). At the same time, the image signal processor (ISP) calculates appropriate parameters through light metering and distance measurement and instructs the lens to focus. When the shutter button is pressed, the Sensor completes an exposure, and the ISP turns the output into a picture through a series of processing steps. The general flow is as follows:
So this article focuses on how the ISP converts the Sensor's output data into an RGB image.
2. Sensor knowledge
Reference article: CMOS image sensor
CMOS (Complementary Metal Oxide Semiconductor), i.e. complementary metal oxide semiconductor, is a kind of semiconductor that can record light changes. It is mainly manufactured from silicon and germanium, so that N-type and P-type semiconductors coexist on the CMOS; the currents generated by these two complementary effects can be recorded and interpreted as an image by the processing chip (this is also introduced in digital-circuits textbooks). A CMOS image sensor (CMOS Image Sensor), CIS for short, is an image sensor using CMOS technology, i.e. an image sensing element manufactured on the principle of the photoelectric effect.
Ideally, the Sensor would output RGB raw data directly, with each pixel sensing all 3 RGB components; this is the most accurate. But it requires 3 photosensitive plates, and the 3 sets of RGB data also need time synchronization and alignment, which is costly and difficult. Therefore, in practice, a filter called a Bayer color filter array (Bayer Color Filter Array, CFA) is placed over a single photosensitive plate. The CFA (Color Filter Array) is needed because the CIS itself cannot perceive the wavelength of light, that is, the CIS cannot perceive color. A Bayer pattern commonly comes in one of four arrangements:
RGGB
BGGR
GBRG
GRBG
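To make the patterns concrete, here is a small sketch (my own illustration, not code from this article) that maps a pixel coordinate to the color each Bayer arrangement senses at that position, using the index convention 0=R, 1=G, 2=B:

```python
import numpy as np

# Color indices: 0 = R, 1 = G, 2 = B (a second, separate green would be 3)
BAYER_PATTERNS = {
    'RGGB': np.array([[0, 1], [1, 2]]),
    'BGGR': np.array([[2, 1], [1, 0]]),
    'GBRG': np.array([[1, 2], [0, 1]]),
    'GRBG': np.array([[1, 0], [2, 1]]),
}

def bayer_color_index(pattern, y, x):
    """Return the color index sensed at pixel (y, x): the 2x2 tile repeats."""
    tile = BAYER_PATTERNS[pattern]
    return int(tile[y % 2, x % 2])

# In a BGGR sensor the top-left pixel senses blue, and (1, 1) senses red
print(bayer_color_index('BGGR', 0, 0))  # 2 (blue)
print(bayer_color_index('BGGR', 1, 1))  # 0 (red)
```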
A pixel (Pixel) is the smallest photosensitive unit of an image sensor; an array of pixels arranged together forms the sensor's photosensitive area.
Pixel size (Pixel Size) refers to the size of a single photosensitive element of an image sensor. It is usually written in one of two ways, such as 1.12μm or 1.12μm×1.12μm. The larger the pixel, the more photons it receives, and the greater the charge generated under the same lighting conditions and exposure time.
3. RAW data
As the blogger, I exported from my phone the .dng data corresponding to a .jpg:
On the left is the original RAW data (76.2MB); on the right is the RGB data (4.74MB) automatically processed by the phone camera's ISP. Let's use the rawpy library to read the RAW data and look at the details:
import rawpy
raw_file = '../raws/IMG_20230418_143110.dng'
raw_data = rawpy.imread(raw_file)
# Check whether a thumbnail is embedded
try:
    thumb = raw_data.extract_thumb()
except rawpy.LibRawNoThumbnailError:
    print('no thumbnail found')
# Return the color index at the given coordinates, relative to the full RAW size
color_index = raw_data.raw_color(0, 0)
print(color_index)
# Return the RAW value at the given position, relative to the full RAW image
raw_value = raw_data.raw_value(0, 0)
print(raw_value)
# Return the RAW value at the given position, relative to the visible area of the image
raw_value = raw_data.raw_value_visible(0, 0)
print(raw_value)
According to the two attributes color_desc and raw_pattern, the Bayer mode of this RAW data is BGGR. The reasoning is as follows: color_desc is RGBG, with corresponding indices 0, 1, 2, 3; raw_pattern is 2310; mapping these indices back to color letters gives BGGR.
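The decoding step can be reproduced without a camera file. The values below are hard-coded to mimic this file's metadata (rawpy itself returns color_desc as bytes and raw_pattern as a 2x2 array):

```python
import numpy as np

color_desc = b'RGBG'            # index 0 -> R, 1 -> G, 2 -> B, 3 -> G (second green)
raw_pattern = np.array([[2, 3],
                        [1, 0]], dtype=np.uint8)  # "2310", as reported for this .dng

# Map each index in the 2x2 pattern to its color letter
bayer = ''.join(chr(color_desc[i]) for i in raw_pattern.flatten())
print(bayer)  # BGGR
```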
To look at a partially enlarged patch of the RAW image, the code and result are as follows:
import rawpy
import matplotlib.pyplot as plt
import seaborn as sns
raw_file = '../raws/IMG_20230418_143110.dng'
raw_data = rawpy.imread(raw_file)
# plt.imshow(raw_data.raw_image_visible)
sns.heatmap(raw_data.raw_image_visible[:10, :10], cmap=plt.cm.binary)
# plt.savefig('../raws/ori_raw_10x10.png')
plt.show()
The following are all the properties of the rawpy.RawPy object. For more details, please refer to the official documentation:
| Attribute | Meaning |
|---|---|
| black_level_per_channel | Per-channel black level correction |
| camera_white_level_per_channel | Per-channel saturation level read from the raw file metadata; None if not present |
| camera_whitebalance | White balance factor as shot |
| color_desc | String description of the colors numbered 0 to 3 (RGBG, RGBE, GMCY or GBTG). Note that the same letter may denote different colors |
| color_matrix | Color matrix, read from the file for some cameras, computed for others; its shape is [3, 4] |
| daylight_whitebalance | Daylight white balance factor (daylight balance), read from the file, computed from file data, or taken from a hard-coded constant |
| num_colors | Number of colors. Note that for RGBG, for example, this can be 3 or 4 depending on the camera model, as some use two different shades of green |
| raw_colors | Array of color indices for each pixel of the RAW image; equivalent to calling raw_color(y, x) for each pixel |
| raw_colors_visible | Like raw_colors, but without the margin |
| raw_image | RAW image including the margin. For Bayer images a 2D ndarray is returned; for Foveon and other RGB-type images a 3D ndarray. Note that there may be 4 color channels, in which case the 4th channel can be all 0 |
| raw_image_visible | Like raw_image, but without the margin |
| raw_pattern | Bayer pattern of the pixel array |
| raw_type | Type of the returned RAW data, e.g. RawType.Flat |
| rgb_xyz_matrix | Camera RGB - XYZ conversion matrix; this matrix is constant (but differs between models). Its last row is zero for RGB cameras and non-zero for other color models (such as CMYG); the shape is [4, 3] |
| sizes | Information about the returned RAW image and the postprocessed image, contained in a rawpy.ImageSizes object |
| tone_curve | Camera tone curve, read from the file for Nikon, Sony and some other cameras; its length is 65536 |
| white_level | Level at which raw pixel values are considered saturated |
4. ISP Pipeline
After reading the relevant reference materials (the topic is complicated and there is a lot of content), I sorted out the following, based on some blogs on the Internet, for reference only:
1. BLC (Black Level Correction)
Black level correction. Due to leakage current in the Sensor, even when the lens is placed in a completely dark environment, the raw data output by the Sensor is not 0, whereas we expect the raw data to be 0 in complete darkness.
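A minimal sketch of the idea, assuming a per-channel black level of 64 and a 10-bit white level of 1023 (illustrative numbers, not this sensor's real calibration):

```python
import numpy as np

def black_level_correction(raw, black_level, white_level):
    """Subtract the per-channel black level and rescale back to full range.
    Assumes a Bayer mosaic whose channels tile in 2x2 blocks."""
    raw = raw.astype(np.float32)
    out = np.empty_like(raw)
    bl = np.array(black_level, dtype=np.float32).reshape(2, 2)
    for dy in range(2):
        for dx in range(2):
            ch = raw[dy::2, dx::2] - bl[dy, dx]
            out[dy::2, dx::2] = ch * white_level / (white_level - bl[dy, dx])
    return np.clip(out, 0, white_level)

raw = np.full((4, 4), 64, dtype=np.uint16)  # a pure-black frame still reads 64
corrected = black_level_correction(raw, [64, 64, 64, 64], white_level=1023)
print(corrected.max())  # 0.0 after correction
```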
2. LSC (Lens Shading Correction)
Lens shading correction. As the field of view increases, the oblique light beams that can pass through the camera lens gradually decrease, so the image captured by the Sensor is brighter in the middle and darker toward the edges.
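A simple sketch of the compensation idea: build a radial gain map that is 1.0 at the image center and larger toward the corners. The quadratic falloff model and the 0.5 strength are illustrative assumptions; real LSC uses per-channel calibrated gain grids:

```python
import numpy as np

def lens_shading_gain(h, w, falloff=0.5):
    """Radial gain map: 1.0 at the centre, rising to 1 + falloff at the
    corners, to compensate the brightness falloff toward the edges."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(ys - cy, xs - cx)
    r_norm = r / r.max()
    return 1.0 + falloff * r_norm ** 2

gain = lens_shading_gain(5, 5)
print(gain[2, 2], gain[0, 0])  # 1.0 at the centre, 1.5 at a corner
```

Multiplying the captured image element-wise by this gain map flattens the brightness profile.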
3. BPC (Bad Pixel Correction)
Bad pixel correction, also called Defect Pixel Correction (DPC). Since the Sensor is a physical device, dead pixels are inevitable, and their number increases after long use. By observing color or bright dots in the output in a completely dark environment, or color or black dots in the output when shooting a white object, the dead pixels scattered irregularly everywhere can be found.
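A naive sketch of the detect-and-replace idea on a single color plane (real BPC compares same-color Bayer neighbors and also uses static defect tables; the threshold here is arbitrary):

```python
import numpy as np

def correct_dead_pixels(plane, threshold=200):
    """Replace pixels that deviate strongly from the median of their
    3x3 neighbourhood (a single color plane is assumed)."""
    out = plane.astype(np.float32).copy()
    h, w = plane.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            med = np.median(out[y - 1:y + 2, x - 1:x + 2])
            if abs(out[y, x] - med) > threshold:
                out[y, x] = med      # stuck pixel: use the neighbourhood median
    return out

plane = np.full((5, 5), 100, dtype=np.uint16)
plane[2, 2] = 1023                   # simulate a stuck bright pixel
fixed = correct_dead_pixels(plane)
print(fixed[2, 2])  # 100.0
```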
4. Demosaic
Demosaicing, also called color interpolation: interpolation fills in the missing color components of each pixel so that every pixel has all three colors.
5. NR (Denoise)
Noise removal. There are many variants, such as 2DNR, 3DNR, etc. The Sensor's photosensitive device contains an analog part, so noise in the signal is hard to avoid, and the ADC itself also introduces noise. In addition, in low light the whole system must amplify the signal, which amplifies the noise as well.
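The simplest possible spatial (2D) noise reduction is an averaging filter. Real 2DNR/3DNR are edge-aware and/or temporal, but this sketch shows why averaging suppresses noise:

```python
import numpy as np

def box_denoise(y):
    """3x3 box filter: each output pixel is the mean of its neighbourhood,
    which averages out zero-mean noise."""
    pad = np.pad(y.astype(np.float32), 1, mode='edge')
    return sum(pad[dy:dy + y.shape[0], dx:dx + y.shape[1]]
               for dy in range(3) for dx in range(3)) / 9.0

rng = np.random.default_rng(0)
flat = np.full((64, 64), 128.0)
noisy = flat + rng.normal(0, 10, flat.shape)   # additive sensor/ADC noise model
print(noisy.std() > box_denoise(noisy).std())  # True: noise is reduced
```

The price is blurred detail, which is why the later sharpening stage exists.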
6. AWB (Automatic White Balance)
Automatic white balance. The human visual system has a degree of color constancy and is not affected by the color of the light source: whether under sunlight, clouds, indoor incandescent or fluorescent light, the white objects people see are always white. This is the result of visual correction; the human brain has prior knowledge of object colors and can identify objects and correct the color cast. The Sensor has no such ability: a piece of white paper outputs different colors under different light, yellowish at low color temperatures and bluish at high color temperatures. For example, photos taken under incandescent light tend to be yellowish, while outdoor scenes under strong sunlight look bluish. We need to convert the Sensor's output so that white objects appear close to white under light of different color temperatures.
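A classic baseline for this correction is the gray-world algorithm: assume the scene averages to gray and scale the R and B channels so their means match the green mean. A minimal sketch:

```python
import numpy as np

def gray_world_awb(rgb):
    """Gray-world white balance: scale each channel so its mean equals
    the green channel's mean."""
    rgb = rgb.astype(np.float32)
    means = rgb.reshape(-1, 3).mean(axis=0)   # [mean_R, mean_G, mean_B]
    gains = means[1] / means                  # normalize to green
    return np.clip(rgb * gains, 0, 255)

# A yellowish image (too much red, too little blue), as under incandescent light
img = np.zeros((2, 2, 3), dtype=np.float32)
img[..., 0], img[..., 1], img[..., 2] = 200.0, 100.0, 50.0
balanced = gray_world_awb(img)
print(balanced[0, 0])  # [100. 100. 100.]
```

Production AWB is far more elaborate (color-temperature estimation, scene statistics), but the channel-gain mechanism is the same.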
7. CCM (Color Correction Matrix)
Color correction. The image acquired by the Sensor still differs from the expected colors and must be corrected. AWB has already calibrated white; CCM is used to calibrate the accuracy of colors other than white.
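Applying a CCM is a 3x3 matrix multiply per pixel; rows are normalized to sum to 1 so that neutral gray is preserved. The matrix below is a made-up example, not a calibrated one:

```python
import numpy as np

def apply_ccm(rgb, ccm):
    """Apply a 3x3 color correction matrix to an HxWx3 image."""
    shape = rgb.shape
    out = rgb.reshape(-1, 3).astype(np.float32) @ ccm.T
    return np.clip(out, 0, 255).reshape(shape)

# A hypothetical saturation-boosting CCM; each row sums to 1
ccm = np.array([[ 1.5, -0.3, -0.2],
                [-0.2,  1.4, -0.2],
                [-0.1, -0.4,  1.5]], dtype=np.float32)

gray = np.full((1, 1, 3), 128.0, dtype=np.float32)
print(apply_ccm(gray, ccm)[0, 0])  # gray stays gray: [128. 128. 128.]
```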
8. RGB Gamma
Gamma correction. The human eye, unlike a camera, does not respond linearly to the photons it receives. For example: each time a lamp is added in a dark room, the camera records a linear increase in brightness, but the human eye clearly notices the first lamp added in the dark and barely notices additional lamps later (the eye is non-linear).
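A sketch of the encoding: a power curve with gamma 2.2 allocates more output code values to dark tones, matching the eye's greater sensitivity in the dark:

```python
import numpy as np

def gamma_correct(linear, gamma=2.2):
    """Encode linear light with a 1/gamma power curve (8-bit range assumed)."""
    normalized = np.clip(np.asarray(linear, dtype=np.float64) / 255.0, 0.0, 1.0)
    return 255.0 * normalized ** (1.0 / gamma)

# A dark linear value (16) is lifted considerably, while white stays white
print(float(gamma_correct(16.0)))   # roughly 72
print(float(gamma_correct(255.0)))  # 255.0
```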
9. CSC (Color Space Conversion)
Color space conversion. In the YUV color space it is more convenient to perform color denoising, edge enhancement, etc., and YUV saves bandwidth during storage and transmission. A typical example is RGB to YUV.
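RGB to YUV is a fixed linear transform; the sketch below uses the BT.601 full-range coefficients:

```python
import numpy as np

def rgb_to_yuv(rgb):
    """BT.601 full-range RGB -> YUV: Y carries luma, U/V carry chroma."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]], dtype=np.float32)
    return np.asarray(rgb, dtype=np.float32) @ m.T

# Pure white has full luma and zero chroma
print(rgb_to_yuv([255.0, 255.0, 255.0]))  # approximately [255, 0, 0]
```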
10. HDR (High Dynamic Range)
High dynamic range. The range of light intensities in nature is very wide, the human eye's ability to resolve detail in very bright and very dark environments is narrower, and the recording range of a camera is narrower still. Real HDR technology records detail in both the highlights and the dark regions of the visible range. To make the brightness range captured by the camera and shown on a display match, or even exceed, what the human eye sees, tone mapping is used to reproduce the dark and bright details; this is a purely visual process, not real HDR. In short, wide dynamic range technology makes particularly bright and particularly dark areas of a scene visible in the final image at the same time.
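A common global tone-mapping curve is the Reinhard operator L/(1+L), which compresses highlights toward 1 while leaving shadows nearly linear. A minimal sketch:

```python
import numpy as np

def reinhard_tonemap(luminance):
    """Reinhard global operator: maps [0, inf) luminance into [0, 1)."""
    L = np.asarray(luminance, dtype=np.float32)
    return L / (1.0 + L)

# Dark values stay almost linear; very bright values are compressed near 1
print(reinhard_tonemap([0.01, 1.0, 100.0]))
```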
11. Color denoise / Sharpness
Sharpening. This stage mainly performs noise reduction on YUV, and, to compensate for the image detail lost during denoising, the image is sharpened to restore the relevant details. These operations are more convenient in the YUV color space.
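A standard sharpening sketch is the unsharp mask on the luma plane: subtract a blurred copy to isolate the detail layer, then add it back scaled by a strength factor:

```python
import numpy as np

def unsharp_mask(y, amount=1.0):
    """Sharpen a luma plane: detail = y - box_blur(y); out = y + amount * detail."""
    y = y.astype(np.float32)
    pad = np.pad(y, 1, mode='edge')
    blur = sum(pad[dy:dy + y.shape[0], dx:dx + y.shape[1]]
               for dy in range(3) for dx in range(3)) / 9.0
    return np.clip(y + amount * (y - blur), 0, 255)

# A vertical step edge gains overshoot on both sides, which reads as "sharper"
step = np.tile(np.array([50.0] * 3 + [200.0] * 3), (3, 1))
sharp = unsharp_mask(step)
print(sharp[1])
```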
12. AEC (Automatic Exposure Control)
Automatic exposure. Light intensity varies greatly across scenes. The human eye adapts quickly to sense an appropriate brightness, but the image sensor has no such self-adaptation, so automatic exposure is needed to ensure that photos are exposed accurately and have suitable brightness.
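A sketch of the control-loop idea: each frame, nudge the exposure so the measured mean brightness approaches a target. The target of 118 (roughly 18% gray in 8-bit) and the damping exponent are illustrative choices, not any vendor's algorithm:

```python
def aec_step(current_exposure, measured_mean, target_mean=118.0, k=0.5):
    """One step of a simple AE loop: scale exposure toward the target mean.
    k < 1 damps the update to avoid oscillating between frames."""
    ratio = target_mean / max(measured_mean, 1e-6)
    return current_exposure * ratio ** k

exp = 10.0                              # e.g. exposure time in milliseconds
exp = aec_step(exp, measured_mean=30)   # scene too dark -> exposure increases
print(exp > 10.0)  # True
```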
Space is limited, so let's just look at how Demosaic works. As mentioned above, it converts a single-channel RAW image into a three-channel RGB color image. The example picture is as follows:
The code is implemented as follows:
import rawpy
import cv2
import numpy as np
import matplotlib.pyplot as plt
raw_file = '../raws/IMG_20230418_143110.dng'
raw_data = rawpy.imread(raw_file)
# rawpy's built-in VNG demosaic algorithm
rgb_data = raw_data.postprocess(demosaic_algorithm=rawpy.DemosaicAlgorithm.VNG)
# print(rgb_data.shape)
plt.imshow(rgb_data)
plt.savefig('../raws/isp_rgb.png')
with rawpy.imread(raw_file) as raw:
    bayer_img = raw.raw_image.copy()
# Scale uint16 data into the uint8 range (a plain cast would wrap values > 255)
bayer_img = (bayer_img / bayer_img.max() * 255).astype(np.uint8)
# OpenCV's built-in VNG demosaic algorithm
rgb_img = cv2.demosaicing(bayer_img, cv2.COLOR_BAYER_BG2RGB_VNG)
# print(rgb_img.shape)
plt.imshow(rgb_img)
plt.savefig('../raws/isp_rgb_cv2.png')
The effect is as follows:
The reference blogs are as follows:
1. Introduction to the whole ISP process: https://www.ngui.cc/article/show-954304.html
2. Several Zhihu articles I have collected: https://www.zhihu.com/collection/865188946
5. AI-ISP
AI ISP is a new technical concept that has emerged only in recent years. Facing ever-higher scene complexity and special image quality requirements, traditional ISPs struggle with increasingly large parameter libraries, difficult tuning, and gradually lengthening development cycles. With the help of AI technology, strengthening ISP functions with machine learning methods has become an important direction of development, and thus AI ISP was born.
In the 2018 Intel Labs paper "Learning to See in the Dark", it was proposed that all ISP functions can be realized by a single neural network: the network takes a RAW image as input and outputs an RGB or YUV image. So far, however, I have not heard of any ISP product implemented entirely as a neural network.
In Aixin's AI ISP pipeline, only the important modules are optimized with AI; the limited computing power is concentrated on the most critical functions, those most visible to the human eye, such as HDR, 3DNR denoising, tone mapping, and demosaic, in order to achieve the best overall AI ISP effect.
In Aixin's whole ISP there is an NPU designed specifically for the ISP, which is not exactly the same as the NPU in other traditional chips. Besides the emphasized computing units, it also contains Pre-Process, Post-Process, and a larger Shared Memory, as well as some stream-processing and CV operations for the AI ISP, so that its AI computing power performs better in image processing.
Reference article: AI vision chip dry goods sharing: in-depth explanation of the technical principles of AI ISP (https://zhuanlan.zhihu.com/p/467137601)
Conclusion
The content of this blog is continuously updated and supplemented as I keep learning. If there are any mistakes, please feel free to message me privately! Let's make progress together, thank you ♪(・ω・)ノ