A Camera Engineer on Cameras: Detailed Explanation of the YUV Data Format (2)

Camera data format: YUV explained in detail

Overview

In the previous section, we talked about several image color representation methods commonly used in Camera projects, focusing on the commonly used RAW, RGB, and RGBA formats. With the rapid development of the video industry, the YUV format has spawned many rather complex variants, such as YUV444, YUV422, YUV420, YUV420SP, and YUV422P. The author has worked in the industry for many years and has often been confused by these YUV format definitions, so these additional notes are offered to help clear up the confusion.

Welcome to the video world: a first look at the YUV format

The world of RGB is rich and colorful, but the cost is a large amount of data. In the RGB888 format, for example, one pixel corresponds to 24 bits of data, so 1 million pixels require 1 million x 24 bits of data. On the other hand, sometimes the color information is not important and we only need the outline of the object, for example on a traditional black-and-white TV, or for grayscale images in some image processing:
[Figure: example of a grayscale image]
Considering the need to reduce the amount of data, and that sometimes a grayscale image is enough, the relevant video protocol organizations defined the YUV color format, in which:

  • "Y" represents brightness (Luminance, Luma), which specifies how bright the pixel is perceived.
  • "U" and "V" are chrominance and concentration (Chrominance, Chroma), which are used to describe the color and saturation of the image and are used to specify the color of the corresponding pixel. The grayscale image above is the image when there is only Y pixel value. Without U and V, color information cannot be reflected. U+V can be collectively called Chroma.
    The corresponding Y, U, and V components can be separated from the original YUV image. The picture below comes from wiki encyclopedia. From top to bottom, it shows the original picture, and the three components of Y, Cb and Cr.
    [Figure: original image and its separated Y, Cb, and Cr components (from Wikipedia)]

Data in YUV format and RGB format can be converted to each other

[Figure: conversion between YUV and RGB data]

The YUV format we usually use follows the ITU-R standard and is also called YCbCr (this is not always the case and needs to be confirmed with the support staff of the specific product; the author's view is that YCbCr is one particular YUV format, while YUV itself just denotes a color space). Cb reflects the difference between the blue part of the RGB input signal and the luminance value Y; Cr reflects the difference between the red part of the RGB input signal and the luminance value Y. There are many standards in the video industry (behind them lies the competition between display manufacturers), and the formulas for converting RGB to YUV differ between standards. Under the ITU-R BT.601 standard, the formula for converting RGB to YCbCr is:

Y  =  0.299 x R + 0.587 x G + 0.114 x B
Cb = -0.169 x R - 0.331 x G + 0.499 x B + 128
Cr =  0.499 x R - 0.418 x G - 0.0813 x B + 128
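
As an illustration, here is a minimal sketch of the formulas above in Python (assuming NumPy is available and 8-bit RGB input; the function name is made up for this example):

import numpy as np

def rgb_to_ycbcr_bt601(rgb):
    # rgb: array of shape (..., 3), 8-bit values in [0, 255]
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.499 * b + 128.0
    cr =  0.499 * r - 0.418 * g - 0.0813 * b + 128.0
    return np.clip(np.stack([y, cb, cr], axis=-1), 0, 255).astype(np.uint8)

# A pure red pixel gives a high Cr (red-difference) value:
print(rgb_to_ycbcr_bt601(np.array([[255, 0, 0]], dtype=np.uint8)))  # ~[[76, 84, 255]]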

What are the benefits of introducing the YUV color space?

  • YUV is compatible with both black-and-white and color TVs. A black-and-white TV extracts only the luminance signal Y and displays a black-and-white picture, while a color TV uses the complete YUV data to display a color picture.
  • Research shows that the human eye is less sensitive to chrominance (UV) than to luminance, so the amount of UV data can be reduced appropriately without affecting human perception. This is why YUV comes in multiple formats, such as YUV420, YUV422, and YUV444.
  • The YUV and RGB formats can be converted into each other, but YUV can be stored with less data than RGB, which saves storage space and transmission bandwidth.
  • Some video processors can only input data in YUV format.

How the YUV format reduces the data size of an image

Similar to the RGB format, for a color image you need the Y, U, and V values of a pixel to know its actual color. Assuming each value is represented by 8 bits, each pixel corresponds to 24 bits of data, i.e. 24 bits/pixel. Suppose you want to transmit a 1-million-pixel video; that amount of data is too large for HDTV or webcam video transmission.

When generating YUV data from RGB data, subsampling can be used to reduce the average amount of data per pixel. Since the human eye is more sensitive to luminance than to color, we can keep all the Y values that record brightness while reducing the U and V values that record color. This way of reducing the amount of U and V data is subsampling. Depending on the subsampling scheme, there are usually YUV444, YUV422, YUV411, and YUV420. Put simply, subsampling keeps only part of the U and V data when the data is generated, and fills in the required U and V values from the color information of adjacent pixels, according to agreed rules, when the data is used. Don't worry, we will go through this in detail next.

YUV subsampling naming rules

The naming rules for YUV[j][a][b] are:

  • j, the number of reference pixels in the first row, usually 4
  • a, how many chroma (i.e. U+V, written UV below) components are contained in the data generated by the j reference pixels in the first row
  • b, how many chroma (UV) components are contained in the data generated by the j reference pixels in the second row, directly below the reference pixels in the first row
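
To make the naming rule concrete, here is a small sketch (an illustration only, not any standard's definition) that computes the average bits per pixel implied by a j:a:b scheme, assuming 8 bits per sample:

def avg_bits_per_pixel(j, a, b, bits=8):
    # In a block of j pixels over two rows (2*j pixels) there are
    # 2*j Y samples plus (a + b) U samples and (a + b) V samples.
    samples = 2 * j + 2 * (a + b)
    return samples * bits / (2 * j)

for name, (j, a, b) in {"4:4:4": (4, 4, 4), "4:2:2": (4, 2, 2),
                        "4:1:1": (4, 1, 1), "4:2:0": (4, 2, 0)}.items():
    print(name, avg_bits_per_pixel(j, a, b))  # 24.0, 16.0, 12.0, 12.0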

YUV444 (8bit)

YUV444 means that when the 4 reference pixels in the first row generate YUV data, those four pixels correspond to 4 Y + 4 UV components, and when the 4 reference pixels in the second row generate data, those four pixels also correspond to 4 Y + 4 UV components. As shown below, there are eight pixels numbered 1 to 8, and each of them keeps its own independent Y, U, and V components when the corresponding data is generated.
[Figure: YUV444 sampling, each of pixels 1-8 keeps its own Y, U, and V components]

  • YUV444 (8bit) means that each Y, U, and V component value is 8 bits, so each pixel corresponds to 3 x 8 = 24 bits (in the figure, each colored circle represents an 8-bit value).
  • YUV444 (10bit) means that each Y, U, and V component value is 10 bits, so each pixel corresponds to 3 x 10 = 30 bits. Although each value carries 2 bits more information than the 8-bit format, this does not significantly improve image quality, so it is not commonly used. The rest of this article assumes the 8-bit representation.
  • In the YUV444 format, every pixel generates its own Y, U, and V data. This format is mainly used inside video processing equipment and is rarely used externally, to avoid degrading picture quality during processing.
    When the corresponding YUV444 data is used:
  • When the value of pixel 1 is used, the actual YUV value used is Y1+U1+V1, where the subscript is the index of the corresponding pixel.
  • When the value of pixel 2 is used, the actual value used is Y2+U2+V2.

YUV422

YUV422 means that when the 4 reference pixels in the first row generate YUV data, those four pixels correspond to 4 Y + 2 UV components, and when the 4 reference pixels in the second row generate data, those four pixels also correspond to 4 Y + 2 UV components. As shown below, there are eight pixels numbered 1 to 8. Each pixel keeps its own Y component when the corresponding data is generated, but part of the chroma is discarded in the horizontal direction: every 2 adjacent pixels in a row share one set of U and V components.
[Figure: YUV422 sampling, each horizontal pair of pixels shares one set of UV components]
When the corresponding YUV422 data is used, pixels 1 and 2 share one set of UV components, and pixels 3 and 4 share another set:

  • When the value of pixel 1 is used, the actual YUV value used is Y1+U1+V1, where the subscript is the index of the corresponding pixel.
  • When the value of pixel 2 is used, the actual value used is Y2+U1+V1.

Compared with YUV444 above, this will undoubtedly introduce some color error, but in most cases two adjacent pixels show no visible difference in color. At the same time, compared with the 24-bit average per-pixel data size of YUV444, the average per-pixel data size of YUV422 drops to 16 bits. The smaller data size is very beneficial for the transmission and storage of video signals.
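As a rough sketch of this first sharing rule (assuming NumPy and full-resolution Y, U, V planes as input; the helper names are made up for illustration):

import numpy as np

def subsample_422(y, u, v):
    # Keep every Y, but keep only the U/V of the first pixel in each
    # horizontal pair (y, u, v are planes of shape (H, W), W even).
    return y, u[:, ::2], v[:, ::2]

def reconstruct_422(y, u_sub, v_sub):
    # Each stored U/V value is reused by both pixels of its pair.
    return y, np.repeat(u_sub, 2, axis=1), np.repeat(v_sub, 2, axis=1)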
Of course, some cameras may generate YUV422 data in the following way instead. It does not matter which pixels the UV values are taken from; what matters is the idea of sharing the UV:
[Figure: alternative YUV422 sampling, U kept on the odd pixel and V kept on the even pixel of each pair]
When the YUV422 data above is used:

  • When the value of pixel 1 is used, the actual YUV value used is Y1+U1+V2, where the subscript is the index of the corresponding pixel.
  • When the value of pixel 2 is used, the actual value used is Y2+U1+V2.
  • When the value of pixel 3 is used, the actual value used is Y3+U3+V4.
  • When the value of pixel 4 is used, the actual value used is Y4+U3+V4.

Error analysis: what are the similarities and differences between the two YUV422 subsampling methods?
In the first method, the data corresponding to pixel 1 is Y1+U1+V1 and the data corresponding to pixel 2 is Y2+U1+V1, so the error falls on pixel 2. In the second method, the data corresponding to pixel 1 is Y1+U1+V2 and the data corresponding to pixel 2 is Y2+U1+V2, so the error falls on both pixel 1 and pixel 2. The two methods produce the same amount of data, but their errors are distributed differently. In terms of software and hardware implementation, the first method is actually more common, i.e. pixel 1 generates a full Y+UV set; usually the UV components are generated on the same pixel, and a pixel's U and V are not recorded separately on different pixels. The actual YUV422 sampling arrangement differs between standards and displays; the first method is the default sampling of the BT.601 standard.

No matter how the data is recorded, the idea is the same: when generating the data, discard the U and V values of some adjacent pixels in the same row, and when using the data, restore the color of each pixel by borrowing the U and V values of its neighbors according to the agreed rules.

YUV411

YUV411 means that when the 4 reference pixels in the first row generate YUV data, those four pixels correspond to 4 Y + 1 UV component, and when the 4 reference pixels in the second row generate data, those four pixels also correspond to 4 Y + 1 UV component. As shown below, there are eight pixels numbered 1 to 8. Each pixel keeps its own Y component when the corresponding data is generated, but chroma is discarded in the horizontal direction: every 4 adjacent pixels in a row share one set of U and V components.

[Figure: YUV411 sampling, pixels 1-4 share one set of UV components and pixels 5-8 share another]
When the corresponding data is used, pixels 1, 2, 3, and 4 share one set of UV components, and pixels 5, 6, 7, and 8 share another set:

  • When the value of pixel 1 is used, the actual YUV value used is Y1+U1+V1, where the subscript is the index of the corresponding pixel.
  • When the value of pixel 2 is used, the actual value used is Y2+U1+V1.
  • When the value of pixel 3 is used, the actual value used is Y3+U1+V1.
  • When the value of pixel 4 is used, the actual value used is Y4+U1+V1.

Similar to YUV422, YUV411 can also be sampled as follows when generating data:
[Figure: alternative YUV411 sampling, U taken from pixel 3 and V taken from pixel 1 of each group of four]
When the corresponding data is used:

  • When the value of pixel 1 is used, the actual YUV value used is Y1+U3+V1, where the subscript is the index of the corresponding pixel.
  • When the value of pixel 2 is used, the actual value used is Y2+U3+V1.
  • When the value of pixel 3 is used, the actual value used is Y3+U3+V1.
  • When the value of pixel 4 is used, the actual value used is Y4+U3+V1.

Compared with the YUV422 above, this YUV411 subsampling mode undoubtedly introduces more error into the generated data, but in most cases several adjacent pixels show no visible difference in color. At the same time, compared with the 16-bit average per-pixel data size of YUV422, the average per-pixel data size of YUV411 drops to 12 bits. The smaller data size is very beneficial for the transmission and storage of video signals.

YUV420

As TV screens and consumer electronics screens become larger and larger, the number of pixels in an image increases dramatically, and the average data size per pixel needs to be reduced further.
YUV420 means that when the 4 reference pixels in the first row generate YUV data, those four pixels correspond to 4 Y + 2 UV components, while the 4 reference pixels in the second row correspond to 4 Y + 0 UV components. As shown below, each of the eight pixels 1 to 8 keeps its own Y component when the corresponding data is generated, but chroma is discarded in both the horizontal and vertical directions: in the horizontal direction every two pixels share adjacent U and V components, and in the vertical direction rows that generate UV components alternate with rows that generate none.
[Figure: YUV420 sampling, one set of UV components per 2x2 block of pixels]
When the corresponding data is used, pixels 1, 2, 5, and 6 share one set of UV components, and pixels 3, 4, 7, and 8 share another set:

  • When the value of pixel 1 is used, the actual YUV value used is Y1+U1+V1, where the subscript is the index of the corresponding pixel.
  • When the value of pixel 2 is used, the actual value used is Y2+U1+V1.
  • When the value of pixel 5 is used, the actual value used is Y5+U1+V1.
  • When the value of pixel 6 is used, the actual value used is Y6+U1+V1.

This subsampling method of course introduces error as well, but in most cases several adjacent pixels show no visible difference in color. At the same time, compared with the 16-bit average per-pixel data size of YUV422, the average per-pixel data size of YUV420 drops to 12 bits. The smaller data size is very beneficial for the transmission and storage of video signals.
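
Continuing the same illustrative sketch for 4:2:0 (again assuming NumPy and full-resolution planes; the helper names are made up): one U and one V are kept per 2x2 block of pixels and reused by all four pixels, matching the sharing of pixels 1, 2, 5, and 6 described above.

import numpy as np

def subsample_420(y, u, v):
    # Keep every Y, but keep one U and one V per 2x2 block of pixels
    # (here the top-left sample; y, u, v have shape (H, W), H and W even).
    return y, u[::2, ::2], v[::2, ::2]

def reconstruct_420(y, u_sub, v_sub):
    # Each stored U/V value is reused by all four pixels of its 2x2 block.
    u = np.repeat(np.repeat(u_sub, 2, axis=0), 2, axis=1)
    v = np.repeat(np.repeat(v_sub, 2, axis=0), 2, axis=1)
    return y, u, v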

On the other hand, although YUV420 and YUV411 both average 12 bits per pixel, YUV420 loses less color in the horizontal direction, because YUV420 generates two UV components per row of four pixels while YUV411 generates only one. Today's displays usually use large 16:9 screens, i.e. the aspect ratio of the screen is 16:9. This screen design, wider horizontally than vertically, has promoted the adoption of the YUV420 format in the video field.

YUV440

After working through the examples above, I believe that when you see this picture you will know its name: YUV440.
[Figure: YUV440 sampling, each pixel keeps its own Y; U and V are shared between vertically adjacent pixels]

Thinking questions

  1. Among YUV444, YUV422, and YUV420, which one has the largest color error in the worst case?
    Of course, it is YUV420, which has the smallest amount of data.

  2. For data in the YUV422 (10bit) format, how many bits does each pixel take on average?
    You are welcome to work it out and discuss it with me.

  3. Is the following way of generating data YUV420?
    [Figure: an alternative arrangement of Y and UV samples]
    Yes. This definition of YUV only fixes the ratio of Y to UV components; it does not define which pixels the U and V components are generated from. You only need to count how many Y samples there are in a group to know how many pixels there are, and then the relationship between those pixels and the UV components tells you which sampling mode it is.

  4. YUV422, YUV420, and YUV411 all cause color loss. In the worst case, what might the image we see look like?

No matter which kind of subsampling is used, the Y component (also called the Y channel or Y plane) is always accurate. In the worst case, we can still see the outline of the object clearly, but its color will be wrong. This is exactly the purpose of introducing the YUV format: keep the outlines of objects clear while reducing the data spent on the color information that the human eye is less sensitive to.

  5. Why is the YUV format preferred over RGB in image compression and video projects?
    Precisely because the YUV format always preserves the outline of the object (that is, it keeps the Y component), and appropriately reducing the UV components yields a smaller amount of data, image compression and video transmission projects prefer the YUV format.

  6. Similar to the RGB domain described in the previous section, YUV also has a limited range and a full range. How do we choose?
    1) In limited range, the component ranges are Y ∈ [16, 235], Cb ∈ [16, 240], Cr ∈ [16, 240]. It is also called TV range or video range.
    Limited range is often used in the video and audio field because movies and TV series usually have special-effects requirements, and limited range makes effects such as hiding wire details easier to implement. In the BT.601 standard, some formats mark this limited range with I, such as I420.
    2) In full range, each component ranges over 0-255. PC graphics cards output in full-range mode. Some formats mark full range with F, such as F420.

  7. What should we pay attention to when converting between YUV and RGB?
    There are two issues to consider (a small sketch is given after this list of questions):

  • The conversion matrix: BT.601, BT.709, or BT.2020 in the 4K era. Different matrices cover different color ranges and use different coefficient values;
  • The range: the YUV value ranges used by TVs and PC monitors differ. With an 8-bit depth, the TV range is 16-235 for Y and 16-240 for UV, while the PC range is 0-255. RGB has no such split; all values are 0-255!
  8. How do RGB and YUV coexist in a video transmission chain?
    The camera usually generates RGB data, which is converted into YUV data either inside or outside the camera. The YUV data is encoded into a specified video format such as H.264 and transmitted to the TV. The TV decodes it back into YUV data, converts it into RGB data, and displays it on the LCD screen.

  9. What does the YUV400 format mean?
    There is no UV component; it is actually Y only, which can also be described as luma only.

  10. Does JPEG compress YUV or RGB data?
    YUV. YUV is a representation designed to preserve image outlines while reducing the amount of data, and reducing the amount of data is exactly JPEG's goal, so JPEG typically works on YUV data internally. For example, JPEG 4:2:2 and JPEG 4:2:0 are obtained by compressing YUV422 and YUV420 respectively. Of course, YUV422 contains more color information, so at the same compression level the color distortion of YUV422 is smaller.
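
As a rough illustration of the matrix-plus-range point from question 7 above, here is a sketch of a BT.601 limited-range (TV range) YCbCr to full-range RGB conversion; the coefficients shown are the commonly quoted BT.601 values, and the exact numbers depend on the standard your pipeline actually uses:

import numpy as np

def ycbcr_limited_to_rgb_bt601(ycbcr):
    # The offsets (16, 128) and the 1.164 = 255/219 scale handle the
    # limited range; the other coefficients are the BT.601 matrix and
    # would change for BT.709 or BT.2020.
    y  = ycbcr[..., 0].astype(np.float32) - 16.0
    cb = ycbcr[..., 1].astype(np.float32) - 128.0
    cr = ycbcr[..., 2].astype(np.float32) - 128.0
    r = 1.164 * y + 1.596 * cr
    g = 1.164 * y - 0.392 * cb - 0.813 * cr
    b = 1.164 * y + 2.017 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)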

Summary

  1. The development of the video industry gave rise to the YUV data format. Compared with RGB data, the YUV format makes it easier to reduce the data size while maintaining image information.
  2. In the YUV format, "Y" represents luminance (Luminance, Luma), which specifies the perceived brightness of a pixel. "U" and "V" represent chrominance (Chrominance, Chroma), which describe the color and saturation of the image and specify the color of the corresponding pixel.
  3. YUV reduces the data size through subsampling. Depending on the subsampling scheme, it can be divided into YUV444, YUV422, YUV411, YUV420, YUV440, and so on.
    Next section: A Camera Engineer on Cameras - how YUV data is stored
    (Thank you for your likes and favorites)

Origin blog.csdn.net/wangyx1234/article/details/133102196