YUV data and format

1. Introduction to YUV

1. YUV description

 YUV is a color encoding method, distinct from RGB (red-green-blue).

        <1> Y is the luma (brightness) component, also called the grayscale value: displaying only Y yields a black-and-white picture.

        <2> U (Cb) is a chroma component: the blue-difference signal, i.e. the blue channel with the brightness removed.

        <3> V (Cr) is a chroma component: the red-difference signal, i.e. the red channel with the brightness removed.
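To make the relationship concrete, here is a minimal sketch of converting one RGB pixel to YUV. The BT.601 full-range coefficients and the class and method names are my own illustration; the article itself does not pick a conversion standard.

```java
// Sketch: converting one RGB pixel to YUV using full-range BT.601
// coefficients (an assumption - the article does not pick a standard).
public class Rgb2Yuv {
    // Returns {Y, U, V}, each clamped to 0..255.
    public static int[] rgbToYuv(int r, int g, int b) {
        int y = (int) Math.round( 0.299 * r + 0.587 * g + 0.114 * b);
        int u = (int) Math.round(-0.169 * r - 0.331 * g + 0.500 * b + 128);
        int v = (int) Math.round( 0.500 * r - 0.419 * g - 0.081 * b + 128);
        return new int[] { clamp(y), clamp(u), clamp(v) };
    }

    static int clamp(int c) { return Math.max(0, Math.min(255, c)); }

    public static void main(String[] args) {
        int[] yuv = rgbToYuv(255, 0, 0); // pure red
        System.out.println("Y=" + yuv[0] + " U=" + yuv[1] + " V=" + yuv[2]);
    }
}
```

Note that a pure gray pixel maps to U = V = 128, which is why "removing the brightness" leaves only the chroma information.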

2. YUV storage format

YUV has two storage layouts: the packed format and the planar format.

2.1. Compressed format (packed)

        The packed format stores the Y, U, and V values as an array of macro pixels (the YUV samples are interleaved), similar to how RGB is stored. Packed storage is common for subsampled formats such as 4:2:2, giving UYVY, YUYV, etc.

2.2. Planar format (planar)

        Planar formats store the Y, U, and V components in three separate arrays (planes): the Y plane comes first, followed by the two chroma planes. Planar formats include I420 (4:2:0), YV12, IYUV, etc.

Further reading:

wiki provided by VLC

Microsoft Video Rendering with 8-Bit YUV Formats

3. Scan lines

        A scan line describes how a CRT television draws the picture, as the wiki explains:

        The electron gun emits electrons that are deflected by a magnetic field and strike the screen, making it glow. Each frame is therefore drawn by the electron gun's scan lines, which move sequentially from the upper-left pixel to the lower-right pixel, emitting electrons to form the image.

(Figure: CRT electron-gun scan lines)

2. Commonly used YUV formats

        To save bandwidth, most YUV formats average fewer than 24 bits per pixel. The main subsampling formats are YCbCr 4:2:0, YCbCr 4:2:2, YCbCr 4:1:1 and YCbCr 4:4:4.

        The figures below illustrate the sampling patterns: the Y component of a pixel is drawn as a black dot, and the UV components as a hollow circle.

(Figure: sampling patterns - black dots are Y samples, hollow circles are UV samples)

YUV subsampling is written in A:B:C notation:

  • 4:4:4 means full sampling: each Y has its own pair of UV components
  • 4:2:2 means 2:1 horizontal subsampling with full vertical sampling: every 2 Y samples share one pair of UV components
  • 4:2:0 means 2:1 horizontal and 2:1 vertical subsampling: every 4 Y samples share one pair of UV components
  • 4:1:1 means 4:1 horizontal subsampling with full vertical sampling: every 4 Y samples share one pair of UV components
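The bits-per-pixel figures implied by this notation can be sketched as a small calculation (class and method names are illustrative, and 8 bits per sample is assumed):

```java
// Sketch: average bits per pixel for each subsampling scheme, derived
// from how many Y samples share one UV pair (8 bits per sample assumed).
public class BitsPerPixel {
    // sharing = number of Y samples that share one U and one V sample.
    public static int bitsPerPixel(int sharing) {
        return 8 + (8 + 8) / sharing; // own Y plus a share of U and V
    }

    public static void main(String[] args) {
        System.out.println("4:4:4 -> " + bitsPerPixel(1) + " bpp"); // 24
        System.out.println("4:2:2 -> " + bitsPerPixel(2) + " bpp"); // 16
        System.out.println("4:2:0 -> " + bitsPerPixel(4) + " bpp"); // 12
        System.out.println("4:1:1 -> " + bitsPerPixel(4) + " bpp"); // 12
    }
}
```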

 

1. YUV4:4:4

In YUV4:4:4 sampling, no component is subsampled: every pixel gets its own Y, U, and V samples when the image is scanned:

(Figure: YUV4:4:4 sampling pattern)

        Four pixels are drawn because, in YUV formats with the sparsest chroma, four pixels share one pair of UV components. The pixels that share a UV pair are spatially close to that UV sample, so the four pixels are not all on one scan line but are spread over two scan lines.

        A macro pixel contains at most four pixels, and in the YUV4:X:X notation the leading 4 expresses exactly this.

        As the figure shows, YUV4:4:4 fully samples the Y, U, and V components for every pixel.

        Regarding memory usage: each YUV component is stored in one byte (8 bits), so for four pixels the YUV4:4:4 format needs 4×8 + 4×8 + 4×8 = 96 bits, i.e. a pixel depth of 24 bits.

2. YUV4:2:2

        In YUV4:2:2 sampling, the UV components are sampled at 2:1 relative to Y in the horizontal direction, with no subsampling in the vertical direction. That is:

(Figure: YUV4:2:2 sampling pattern)

        Two horizontally adjacent pixels form a macro pixel and share one pair of UV components.

        For four pixels, the YUV4:2:2 format requires 4×8 + 2×8 + 2×8 = 64 bits, i.e. a pixel depth of 16 bits.

3. YUV4:2:0

        There are two variants of YUV4:2:0. One is used by the MPEG-1 standard, as shown below:

(Figure: YUV4:2:0 sampling, MPEG-1 variant)

        The other is used by the MPEG-2 standard; the 4:2:0 we usually encounter is this one. As shown below:

(Figure: YUV4:2:0 sampling, MPEG-2 variant)

        For four pixels, the YUV4:2:0 format requires 4×8 + 8 + 8 = 48 bits, i.e. a pixel depth of 12 bits.

4. YUV storage format

        The storage format of YUV is divided into packed formats and planar formats.

In the packed format, the Y, U, and V components are stored in a single array, and the YUV components are sequentially interleaved. Pixels are organized into macropixel groups, the layout of which depends on the sampling format.

In the planar format, the Y, U, and V components are stored in three separate planes (arrays).

4:4:4, 24-bit pixel depth
YUV4:4:4 here means data sampled in 4:4:4 mode and stored in packed form. It is laid out as shown in the figure:

(Figure: packed YUV4:4:4 byte layout)

A small square represents a byte, and a group of consecutive small squares represents a pixel.

4:2:2, 16-bit pixel depth
The 4:2:2 sampling format has two common storage methods:

  • YUY2
  • UYVY

Both are packed formats in which each macro pixel covers two pixels, encoded as four consecutive bytes.

YUY2
In the YUY2 format, the first byte contains the first Y sample, the second byte the first U (Cb) sample, the third byte the second Y sample, and the fourth byte the first V (Cr) sample, as shown:

(Figure: YUY2 byte layout)

UYVY
This format is the same as YUY2, but with the byte order swapped - that is, the chroma and luma bytes trade places, as shown in the figure:

(Figure: UYVY byte layout)
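As an illustration of the two byte orders, here is a minimal sketch of unpacking one 4-byte macro pixel into the YUV values of its two pixels; the class and method names are invented for this example:

```java
// Sketch: unpacking one 4-byte macro pixel of YUY2 or UYVY into the YUV
// values of its two pixels. Names are illustrative, not a real API.
public class MacroPixel {
    // Returns {Y0, U, Y1, V} for the two pixels of one macro pixel.
    public static int[] unpackYuy2(byte b0, byte b1, byte b2, byte b3) {
        // YUY2 byte order: Y0 U Y1 V
        return new int[] { b0 & 0xFF, b1 & 0xFF, b2 & 0xFF, b3 & 0xFF };
    }

    public static int[] unpackUyvy(byte b0, byte b1, byte b2, byte b3) {
        // UYVY byte order: U Y0 V Y1 - same data, luma/chroma swapped
        return new int[] { b1 & 0xFF, b0 & 0xFF, b3 & 0xFF, b2 & 0xFF };
    }

    public static void main(String[] args) {
        int[] p = unpackYuy2((byte) 81, (byte) 90, (byte) 81, (byte) 240);
        System.out.println("Y0=" + p[0] + " U=" + p[1] + " Y1=" + p[2] + " V=" + p[3]);
    }
}
```

Both pixels of the macro pixel reuse the same U and V values, which is exactly the 4:2:2 sharing rule described above.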

4:2:0, 12-bit pixel depth
The 4:2:0 formats introduced below all use planar storage. There are four of them:

  • IMC2
  • IMC4
  • YV12
  • NV12

In all of these 4:2:0 modes, the chroma components are sampled at half the rate of the luma component in both the horizontal and vertical directions, giving 1/4 as many chroma samples overall.

IMC2
The storage method of IMC2 format is shown in the figure:

(Figure: IMC2 plane layout)

Each component is stored in one byte. Planar storage means that all Y components of the video frame are stored first; the chroma components follow. In the IMC2 format the order is: all Y components first, then all V components, and finally all U components.

For ease of processing, code usually uses three arrays to hold the three components.

In addition, in the IMC2 format the memory stride for the chroma planes is half the stride of the Y plane. And because the chroma components have only 1/4 as many samples as Y, even though the chroma region occupies half the space of the luma region, some of that memory is left unused.

IMC4

(Figure: IMC4 plane layout)


It is similar to the IMC2 format, except that the storage order of the U and V chrominance components is reversed.

YV12&I420

(Figure: YV12/I420 plane layout)


The YV12 format stores things differently again. The memory stride for the chroma planes is half that of the luma plane. The Y data is stored first as an unsigned char array, followed by the V plane, and finally the U plane.

I420 is stored like YV12 except that the chroma order is swapped: after the Y plane comes the U plane, with the V plane last. I420 is also called YUV420P.

The terms YV12, I420, and YUV420p come up frequently in multimedia development and are worth remembering.
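Since YV12 and I420 differ only in chroma order, converting between them is just a matter of swapping the two chroma planes. A minimal sketch (the class name is illustrative, and a contiguous, unpadded buffer is assumed):

```java
// Sketch: converting YV12 to I420 by swapping the V and U planes.
// Assumes a contiguous buffer with no stride padding.
public class Yv12ToI420 {
    public static byte[] convert(byte[] yv12, int width, int height) {
        int ySize = width * height;
        int cSize = ySize / 4;                 // each chroma plane is 1/4 of Y
        byte[] i420 = new byte[ySize + 2 * cSize];
        // Y plane is identical in both formats
        System.arraycopy(yv12, 0, i420, 0, ySize);
        // YV12: Y..V..U  ->  I420: Y..U..V
        System.arraycopy(yv12, ySize + cSize, i420, ySize, cSize);  // U plane
        System.arraycopy(yv12, ySize, i420, ySize + cSize, cSize);  // V plane
        return i420;
    }

    public static void main(String[] args) {
        // Tiny 2x2 frame: 4 Y bytes, 1 V byte (9), 1 U byte (5)
        byte[] i420 = convert(new byte[]{1, 2, 3, 4, 9, 5}, 2, 2);
        System.out.println("U=" + i420[4] + " V=" + i420[5]);
    }
}
```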

NV12

(Figure: NV12 plane layout)
The NV12 format first stores the Y plane as an array of unsigned char values with an even number of rows. Immediately following the Y plane is an array of unsigned char values containing interleaved U (Cb) and V (Cr) samples.

Here is a detailed explanation of the commonly used YUV420 types.

3.1) The difference between YUV420p and YUV420sp

        Because YUV420 is the most commonly used, we focus on it here. YUV420 comes in two flavors: YUV420p and YUV420sp.

The YUV420sp data layout:

(Figure: YUV420sp - a Y plane followed by an interleaved UV plane)

The YUV420p data layout:

(Figure: YUV420p - a Y plane followed by separate U and V planes)

3.2) Specific classification and details of YUV420p and YUV420sp

YUV420p: also called planar (plane) mode; Y, U, and V occupy three separate planes.

YUV420p has two variants; they differ only in the order in which U and V are stored:

I420: also known as YU12; the Android default. Storage order: all Y first, then all U, then all V. YYYYYYYY UU VV

YV12: storage order is all Y first, then all V, then all U. YYYYYYYY VV UU

YUV420sp: also known as bi-planar (two-plane) mode; Y is one plane, and the U and V samples are interleaved on a second plane.

YUV420sp has two variants; they differ only in the order in which U and V are interleaved:

NV12: the only mode iOS supports. Storage order: all Y first, then U and V interleaved. YYYYYYYY UVUV

NV21: the Android camera default. Storage order: all Y first, then V and U interleaved. YYYYYYYY VUVU

The official documentation is as follows:

YV12

All of the Y samples appear first in memory as an array of unsigned char values. This array is followed immediately by all of the V (Cr) samples. The stride of the V plane is half the stride of the Y plane, and the V plane contains half as many lines as the Y plane. The V plane is followed immediately by all of the U (Cb) samples, with the same stride and number of lines as the V plane (Figure 12).


The gist: all Y samples are stored first, followed by the V plane; the stride (row width in memory) of the V plane is half that of Y, and the V plane has half as many rows as Y. The U plane follows immediately, with the same stride and row count as V, i.e. half those of Y.

Figure 12:

NV12

All of the Y samples are found first in memory as an array of unsigned char values with an even number of lines. The Y plane is followed immediately by an array of unsigned char values that contains packed U (Cb) and V (Cr) samples, as shown in Figure 13. When the combined U-V array is addressed as an array of little-endian WORD values, the LSBs contain the U values, and the MSBs contain the V values. NV12 is the preferred 4:2:0 pixel format for DirectX VA. It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:0 video.


Figure 13:

NV21

  NV21 has the same memory layout as NV12, but the interleaving order of the U and V components is reversed: NV21 stores the chroma samples as VU pairs, as shown in the figure.
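Because NV12 and NV21 differ from I420 only in how the chroma samples are arranged, de-interleaving the UV plane converts NV12 to I420. A minimal sketch (the class name is illustrative, and a contiguous, unpadded buffer is assumed):

```java
// Sketch: de-interleaving the UV plane of NV12 into the separate U and V
// planes of I420. Assumes a contiguous buffer with no stride padding.
public class Nv12ToI420 {
    public static byte[] convert(byte[] nv12, int width, int height) {
        int ySize = width * height;
        int cSize = ySize / 4;
        byte[] i420 = new byte[ySize + 2 * cSize];
        System.arraycopy(nv12, 0, i420, 0, ySize);  // Y plane is unchanged
        for (int i = 0; i < cSize; i++) {
            i420[ySize + i]         = nv12[ySize + 2 * i];      // U sample
            i420[ySize + cSize + i] = nv12[ySize + 2 * i + 1];  // V sample
        }
        return i420;
    }

    public static void main(String[] args) {
        // Tiny 2x2 frame: 4 Y bytes, then one interleaved UV pair (10, 20)
        byte[] i420 = convert(new byte[]{1, 2, 3, 4, 10, 20}, 2, 2);
        System.out.println("U=" + i420[4] + " V=" + i420[5]);
    }
}
```

For NV21, the same loop applies with the two assignments swapped, since the pairs are stored as VU.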

3.3) Memory calculation of YUV420

Y size = width × height

U size = width × height / 4, and V size = width × height / 4

So a YUV420 frame occupies width × height × 3/2 bytes in memory (i.e. 1.5 bytes per pixel), and the size of the captured data is width × height × 1.5 × framerate × time.
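The capture-size formula can be sketched directly (class and method names are illustrative):

```java
// Sketch of the formula above: raw YUV420 data volume for a capture.
public class CaptureSize {
    public static long bytesForClip(int width, int height, int fps, int seconds) {
        long frame = (long) width * height * 3 / 2; // 1.5 bytes per pixel
        return frame * fps * seconds;
    }

    public static void main(String[] args) {
        // One second of raw 720p at 30 fps
        System.out.println(CaptureSize.bytesForClip(1280, 720, 30, 1) + " bytes");
    }
}
```

A single second of raw 720p30 is already about 40 MB, which is why the encoding pipeline described later is necessary.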

Take a 720×480 YUV420 planar image as an example.

Its total size is 720×480×1.5 bytes,

divided into three parts: Y, U and V

Y component: (720×480) bytes

U(Cb) component: (720×480×1/4) bytes

V(Cr) component: (720×480×1/4) bytes

All three parts are stored row by row (row-major), in the order Y, U, V.

That is, bytes 0 to 720×480 of the YUV data hold the Y component,

bytes 720×480 to 720×480×5/4 hold the U component,

and bytes 720×480×5/4 to 720×480×3/2 hold the V component.
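These byte ranges can be computed as plane offsets; a minimal sketch assuming a contiguous, unpadded I420 buffer (names are illustrative):

```java
// Sketch: byte offsets of the Y, U, and V planes in an I420 (YUV420
// planar) buffer, matching the byte ranges above. No stride padding.
public class I420Offsets {
    // Returns {yOffset, uOffset, vOffset, totalSize}.
    public static int[] planeOffsets(int width, int height) {
        int yOffset = 0;
        int uOffset = width * height;           // U starts after Y
        int vOffset = width * height * 5 / 4;   // V starts after U
        int total   = width * height * 3 / 2;   // whole frame
        return new int[] { yOffset, uOffset, vOffset, total };
    }

    public static void main(String[] args) {
        int[] o = planeOffsets(720, 480);
        System.out.println("Y at " + o[0] + ", U at " + o[1]
                + ", V at " + o[2] + ", total " + o[3] + " bytes");
    }
}
```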

Generally, directly captured video data is RGB24. One RGB24 frame is size = width×height×3 bytes, RGB32 is width×height×4 bytes, while a frame in the standard YUV 4:2:0 format is size = width×height×1.5 bytes.

After the RGB24 data is captured, it needs a first round of "compression": an RGB2YUV color-space conversion, because x264 expects standard YUV (4:2:0) input when encoding.

This first conversion, RGB24 -> YUV (I420), halves the data volume; x264 encoding then reduces it much further. The encoded data is packed and transmitted in real time over RTP. At the destination the data is extracted and decoded; the decoded output is still YUV, so one more conversion, YUV2RGB24, is needed.

3.4) About iOS

Anyone who has done iOS hardware decoding knows that you must specify a PixelFormatType when creating the decoder. iOS only supports NV12, one of the YUV420 variants. Searching the format types for "420" turns up four:

kCVPixelFormatType_420YpCbCr8Planar

kCVPixelFormatType_420YpCbCr8PlanarFullRange

kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange

kCVPixelFormatType_420YpCbCr8BiPlanarFullRange

From the names alone, they split into two groups: Planar (420p) and BiPlanar (two planes).

There is another way to distinguish them: CVPixelBufferGetPlaneCount(pixelBuffer) returns the number of planes. kCVPixelFormatType_420YpCbCr8Planar and kCVPixelFormatType_420YpCbCr8PlanarFullRange have three planes and are 420p, which iOS does not support; kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and kCVPixelFormatType_420YpCbCr8BiPlanarFullRange have two planes. So which of the two should you use?

I checked the official website information, the explanation is as follows:

kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange = '420v', /* Bi-Planar Component Y'CbCr 8-bit 4:2:0, video-range (luma=[16,235] chroma=[16,240]).  baseAddr points to a big-endian CVPlanarPixelBufferInfo_YCbCrBiPlanar struct */

kCVPixelFormatType_420YpCbCr8BiPlanarFullRange  = '420f', /* Bi-Planar Component Y'CbCr 8-bit 4:2:0, full-range (luma=[0,255] chroma=[1,255]).  baseAddr points to a big-endian CVPlanarPixelBufferInfo_YCbCrBiPlanar struct */

Apart from the difference in luma and chroma ranges, I found no other difference. Still unsure, I searched around; someone pointed to a WWDC video, URL: https://developer.apple.com/videos/play/wwdc2011/419/?time=1527 (around 25:30).

The explanation is as follows:

But I still don't know for sure; if anyone does, please let me know.

I then created the decoder with kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and kCVPixelFormatType_420YpCbCr8BiPlanarFullRange in turn and saw no difference during playback. The only difference was the computed stride.

For example, for a 480×640 frame:

With kCVPixelFormatType_420YpCbCr8BiPlanarFullRange:

the stride of the Y and UV planes is 512 (padded up to 64-byte alignment); the Y plane has 640 rows and the UV plane has 320 rows.

With kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange:

the stride of the Y and UV planes is 480 (the actual row width, unpadded); the Y plane has 640 rows and the UV plane has 320 rows.
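The 512 figure comes from rounding the 480-pixel row width up to a 64-byte boundary; a one-line sketch of that arithmetic (my own helper, not an Apple API):

```java
// Sketch of the stride arithmetic implied above: a row width padded up
// to the next 64-byte boundary. This is my own helper, not an Apple API.
public class Stride {
    public static int align64(int width) {
        return (width + 63) / 64 * 64; // round up to a multiple of 64
    }

    public static void main(String[] args) {
        System.out.println(align64(480)); // 512, matching the example above
    }
}
```

Code reading such a buffer must step through rows by the stride, not by the row width, or the image will appear skewed.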

I had called setPreset during capture, so following the notes above (which I still don't fully understand), I finally chose kCVPixelFormatType_420YpCbCr8BiPlanarFullRange in my project.

4. Storage method

Below, the storage layout of each common YUV stream is shown as a figure, followed by how each pixel's YUV values are sampled. Here Cb and Cr are equivalent to U and V.

(1) YUYV format (belongs to YUV422)

(Figure: YUYV byte layout)

YUYV is one of the storage formats for YUV422 sampling. Two adjacent Y samples share the adjacent Cb and Cr: for pixels Y'00 and Y'01, the Cb and Cr values are Cb00 and Cr00, and the YUV values of the other pixels follow by analogy.

(2) UYVY format (belongs to YUV422)

(Figure: UYVY byte layout)

The UYVY format is also a YUV422 storage format; it differs from YUYV only in the order of the U and V bytes. The YUV values of each pixel are recovered the same way as above.

(3) YUV422P (belongs to YUV422)

(Figure: YUV422P plane layout)

YUV422P is also a YUV422 format, but in Plane (planar) mode: rather than interleaving the YUV data, it stores all Y components first, then all U (Cb) components, and finally all V (Cr) components, as shown above. Each pixel's YUV values are recovered with the basic YUV422 rule: two Y samples share one UV pair. For example, pixels Y'00 and Y'01 both use Cb00 and Cr00.

(4) YV12, YU12 format (belonging to YUV420)

(Figure: YV12/YU12 plane layout)

YU12 and YV12 belong to the YUV420 format and are also Plane (planar) modes: the Y, U, and V components are each stored contiguously, one plane after another. Each pixel's YUV data follows the YUV420 rule: four Y samples share one UV pair. In the figure above, Y'00, Y'01, Y'10, and Y'11 share Cr00 and Cb00, and so on.
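Following this sharing rule, the byte positions of a pixel's Y, U, and V samples in an I420 buffer can be computed directly; a minimal sketch assuming a contiguous, unpadded buffer (names are illustrative):

```java
// Sketch: locating the Y, U, and V bytes of pixel (x, y) in an I420
// buffer, following the "4 Y share one UV pair" rule described above.
public class I420Index {
    // Returns {yIndex, uIndex, vIndex} into the I420 byte array.
    public static int[] indices(int x, int y, int width, int height) {
        int ySize = width * height;
        int cSize = ySize / 4;
        int yIndex = y * width + x;
        // Chroma planes are subsampled 2:1 in both directions
        int cIndex = (y / 2) * (width / 2) + (x / 2);
        return new int[] { yIndex, ySize + cIndex, ySize + cSize + cIndex };
    }

    public static void main(String[] args) {
        int[] idx = indices(1, 1, 4, 4);
        System.out.println("Y@" + idx[0] + " U@" + idx[1] + " V@" + idx[2]);
    }
}
```

Note that pixels (0,0), (1,0), (0,1), and (1,1) all map to the same cIndex, i.e. they share one UV pair. For YV12, swap the last two entries.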

(5) NV12, NV21 (belonging to YUV420)

(Figure: NV12/NV21 plane layout)

NV12 and NV21 belong to the YUV420 format and use a two-plane mode: Y occupies one plane and the interleaved UV (CbCr) samples occupy the second, rather than being split into three planes. Extraction works as before: Y'00, Y'01, Y'10, and Y'11 share Cr00 and Cb00.


I420: YYYYYYYY UU VV =>YUV420P
YV12: YYYYYYYY VV UU =>YUV420P
NV12: YYYYYYYY UVUV =>YUV420SP
NV21: YYYYYYYY VUVU =>YUV420SP

YUV video viewing tool: Android YUV file viewing tool_Alex_designer's blog-CSDN blog_yuv viewer

Note: the material above comes from many sources, some of which I have forgotten. It is recorded mainly to understand NV12/NV21 conversion; once you have a general grasp of YUV, the code becomes easy to follow!

References:

Detailed explanation of YUV data format - yooooooo - 博客园

Image and streaming media - detailed YUV data format - Programmer Sought

5. Case

1. yuv generates jpg pictures

YUV video stream generation and conversion into jpg pictures_Alex_designer's blog-CSDN blog_yuv to picture

2. NV12 to NV21: the Y plane is identical; only the UV interleaving order is reversed.

 
 
// Captured NV12 video stream data
byte[] mdata = new byte[width * height * 3 / 2];
// mdata layout (NV12):   YYYY
//                        UVUV
// target layout (NV21):  YYYY
//                        VUVU
// Swap each interleaved UV pair in place:
int uvLen = width * height / 2;  // size of the interleaved UV plane
int uvPos = width * height;      // UV plane starts right after the Y plane
for (int i = 0; i < uvLen; i += 2) {
    byte swap = mdata[uvPos + i];
    mdata[uvPos + i] = mdata[uvPos + i + 1];
    mdata[uvPos + i + 1] = swap;
}

.......to be updated


Origin blog.csdn.net/weixin_35804181/article/details/124347496