Basic knowledge of RGB and YUV pixels and pixel data processing

Table of Contents

  1. RGB
  2. YUV
  3. RGB, YUV pixel data processing

Reprinted from Gemfield’s Zhihu article: From YUV to RGB
Reprinted from Thor’s blog: Introduction to audiovisual data processing: RGB, YUV pixel data processing


1. RGB

1. Basic overview of RGB

  1. The RGB color model is an industry color standard. It produces a wide range of colors by varying the three color channels red (R), green (G), and blue (B) and superimposing them on each other; RGB stands for the red, green, and blue channels. This standard covers almost all colors that human vision can perceive and is one of the most widely used color systems.

  2. Normally each of R, G, and B has 256 levels of brightness, represented by the integers 0 through 255, so each channel can be stored in the computer as exactly 1 byte (8 bits). The 256-level RGB channels can therefore combine into 256×256×256 = 16,777,216 colors, often referred to as 16 million colors or tens of millions of colors, also known as 24-bit color (2 to the 24th power).

  3. For example, a 1080p picture has 1920 × 1080 pixels. With RGB encoding, each pixel has the three primary colors red, green, and blue; each primary color occupies 1 byte, so each pixel occupies 3 bytes, and the picture occupies 1920 × 1080 × 3 / 1024 / 1024 ≈ 5.93 MB of storage (a small calculation sketch follows this list). The well-known BMP bitmap saves pictures this way (the so-called RGB888 format, or 24-bit bitmap format).

  4. The roughly 5.93 MB of image information does not mean that the corresponding image file will be 5.93 MB, for two reasons:

    1. The image information is usually compressed, which reduces the file size;
    2. Metadata is added to the image file, which slightly increases the file size;
  5. In practice the picture file almost always ends up smaller. There are two main types of compression:

    1. Lossless compression, as in BMP bitmap files: at most, run-length encoding (RLE) is applied as light lossless compression (the segmentation masks in the COCO dataset also use RLE), so the roughly 5.93 MB of pixel data ends up as a file of about 6 MB;
    2. Lossy compression, the most popular being JPEG: depending on the image content, its compression algorithms reduce the file to somewhere between tens and hundreds of KB.
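To make the storage arithmetic above concrete, here is a small Python sketch (Python is the language used later in this article) that reproduces the raw-size and color-count calculations for a 1080p RGB888 frame; these are the numbers derived above, not measurements of any particular file.

```python
# Raw size of an uncompressed 1080p RGB888 frame (3 bytes per pixel,
# no metadata, no compression), plus the 24-bit color count.
width, height, bytes_per_pixel = 1920, 1080, 3

raw_bytes = width * height * bytes_per_pixel      # 6,220,800 bytes
raw_mib = raw_bytes / 1024 / 1024                 # ~5.93 MB

total_colors = 256 ** 3                           # 16,777,216 (24-bit color)

print(f"raw size: {raw_bytes} bytes = {raw_mib:.2f} MB")
print(f"24-bit color count: {total_colors}")
```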

2. RGB format

In simple terms, RGB representations in a computer fall into two categories: index formats and pixel formats.

  1. Index formats:
    1. Such as RGB1, RGB4, and RGB8, where each pixel is represented by 1, 4, or 8 bits respectively. These bits do not store the actual R, G, B values; they store the pixel's index into a palette (color table).
  2. Pixel formats:
    1. Such as RGB565, RGB555, RGB24, RGB32, and ARGB32, which store the R, G, B values of each pixel directly. For example, RGB24 uses 8 bits each for R, G, and B (a small packing sketch follows this list).
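As a quick illustration of how a pixel format packs its bits, here is a minimal sketch (the function name is made up for this example) that packs an RGB888 pixel into the 16-bit RGB565 layout: 5 bits of R, 6 bits of G, 5 bits of B.

```python
def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B values into one 16-bit RGB565 value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

# 0xFC00: the top 5 bits carry R=31, the middle 6 bits carry G=32, and B=0.
print(hex(rgb888_to_rgb565(255, 128, 0)))
```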

3. CHW and BGR concepts in RGB images

  1. Taking the well-known image library OpenCV as an example, we often hear questions like: is the image an API accepts or returns laid out as CHW or HWC, and is it RGB or BGR? What does that actually mean?
  2. An example makes it concrete: in Python, use OpenCV to read a picture gemfield.jpg (640x640); a runnable sketch of all the steps below follows this list.
  3. The resulting shape (640, 640, 3) is the HWC layout (HWC stands for Height, Width, Channel). In memory, img is a three-dimensional array whose dimensions, from outermost to innermost, are height, width, and channel: img consists of 640 rows, each row consists of 640 columns, and each column position holds 3 channel values. At each position the pixel is [B, G, R] if the data is in BGR order, or [R, G, B] if it is in RGB order. OpenCV's imread returns BGR, so each innermost triple of img is [B, G, R].
  4. To convert from the HWC layout to the CHW layout, use numpy's transpose.
  5. After the transpose, img is in CHW layout. It is still BGR and still a 3-dimensional array, but it is now composed of 3 channels, each channel composed of 640 rows, and each row composed of 640 columns.
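Here is a runnable sketch of the steps described above, assuming OpenCV (cv2) and numpy are installed and that gemfield.jpg is a 640x640 image in the working directory; the printed shapes illustrate what the text describes, not output captured from the original article.

```python
import cv2

# cv2.imread returns an HWC ndarray in BGR channel order.
img = cv2.imread("gemfield.jpg")
print(img.shape)        # (640, 640, 3) -> (Height, Width, Channel)
print(img[0, 0])        # one pixel as [B, G, R]

# Reorder channels to RGB if needed (still HWC).
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# HWC -> CHW with numpy's transpose, as described in item 4.
img_chw = img.transpose(2, 0, 1)
print(img_chw.shape)    # (3, 640, 640) -> (Channel, Height, Width)
```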

2. YUV

1. Basic overview of YUV

  1. YUV encoding represents the color of each pixel with luminance and chrominance. Y stands for luminance (Luminance, Luma), i.e. the grayscale value, while U and V stand for chrominance (Chrominance or Chroma). Y'UV was invented when engineers wanted to add color television on top of a black-and-white infrastructure: they needed a signal that remained compatible with black-and-white (B&W) televisions while also carrying color. The luminance component already existed as the black-and-white signal, so they added the U and V signals on top of it.

  2. The Y'UV model defines a color space in terms of one luminance component (Y') and two chrominance components, U (blue projection) and V (red projection). The Y'UV color model is used in the PAL composite color video standard (excluding PAL-N). Earlier black-and-white systems used only the luminance (Y') information; the color information (U and V) is added separately on a subcarrier, so a black-and-white receiver can still receive and display the transmission in its native black-and-white format.

  3. Y' stands for the luminance (luma) component, and U and V stand for the chrominance (color) components. The terms Y'UV, YUV, YCbCr, YPbPr, etc. are sometimes ambiguous and overlapping. Historically, YUV and Y'UV referred to specific analog encodings of color information in television systems, while YCbCr referred to digital encodings of color information suitable for video and still-image compression and transmission, such as MPEG and JPEG. Today, the computer industry commonly uses the term YUV to describe file formats that are actually encoded as YCbCr.

  4. YUV is not a single format but a family of sub-formats. They are introduced below along two dimensions: storage format and sampling format.

2. YUV storage format

  1. First, by storage layout, YUV can be divided into packed formats and planar formats. In a packed format the Y, U, and V components are interleaved, similar to HWC in the RGB world; in a planar format the Y, U, and V components are stored as three separate planes that are not mixed together, similar to CHW in the RGB world (a small illustration follows).
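The difference is easier to see with concrete byte layouts. The sketch below uses YUYV (a packed 4:2:2 format) and I420 (a planar 4:2:0 format) purely as representative examples; the byte values are arbitrary placeholders.

```python
import numpy as np

w, h = 4, 2  # toy frame size, illustration only

# Packed 4:2:2 (YUYV): Y and U/V bytes interleaved in one array,
# with two horizontally adjacent pixels sharing one U and one V:
#   Y0 U0 Y1 V0 | Y2 U1 Y3 V1 | ...
yuyv = np.arange(w * h * 2, dtype=np.uint8)
y_packed, u_packed, v_packed = yuyv[0::2], yuyv[1::4], yuyv[3::4]

# Planar 4:2:0 (I420): all Y first, then the whole U plane, then the whole V plane.
i420 = np.arange(w * h * 3 // 2, dtype=np.uint8)
y_plane = i420[: w * h]
u_plane = i420[w * h : w * h + w * h // 4]
v_plane = i420[w * h + w * h // 4 :]

print(len(y_packed), len(u_packed), len(v_packed))  # 8 4 4
print(len(y_plane), len(u_plane), len(v_plane))     # 8 2 2
```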

3. YUV sampling format

  1. Second, YUV can be divided into several formats according to the sampling rate and sampling pattern.
  2. The main advantage YUV has over RGB is that it saves space. Why? Because the human eye is much less sensitive to the chrominance (U/V) components than to luminance, their sampling rate can be reduced without visibly affecting image quality. Take the YUV420 format below as an example:
    [figure: YUV420 sampling, each 2×2 block of Y samples shares one U and one V]
  3. It can be seen that the U and V components are each only a quarter of the Y component, so every 4 pixels take 6 bytes (versus 12 bytes for RGB), saving half the space compared to RGB. Besides YUV420, the mainstream formats include YUV444 (the same size as RGB) and YUV422 (a size comparison sketch follows this list).
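The space saving follows directly from the sampling ratios. A small sketch of the per-frame sizes for a 1920x1080 frame, using the "6 bytes per 4 pixels" reasoning above:

```python
w, h = 1920, 1080

rgb24  = w * h * 3        # 12 bytes per 4 pixels
yuv444 = w * h * 3        # same size as RGB24
yuv422 = w * h * 2        # 8 bytes per 4 pixels
yuv420 = w * h * 3 // 2   # 6 bytes per 4 pixels -> half the size of RGB24

print(rgb24, yuv444, yuv422, yuv420)
# 6220800 6220800 4147200 3110400
```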

4. Mainstream YUV420 format

  1. Among the mainstream YUV formats, YUV420 is the most common (because it saves the most space). Combining the YUV420 sampling format with different storage layouts yields many sub-formats; see http://www.fourcc.org/yuv.php for a full list.
  2. When YUV420 sampling is combined with planar storage, the YUV420P and YUV420SP families result. In YUV420P, all Y is stored first, then all U, then all V (analogous to CHW in RGB); it subdivides into the I420 (also called YU12) and YV12 formats. In YUV420SP, the Y plane is followed by interleaved U/V pairs; it subdivides into the NV12 and NV21 formats. The four sub-formats are as follows (a plane-slicing sketch follows this list):
    1. NV12, FourCC 0x3231564E, 12 bits per pixel: an 8-bit Y plane followed by an interleaved U/V plane, with U/V sampled at 2x2 (each is a quarter the size of Y).
    2. NV21, FourCC 0x3132564E, 12 bits per pixel: the same as NV12 except that in the interleaved plane V comes before U (V/U instead of U/V order); this is the standard format for camera images on Android.
    3. I420 (also called YU12), FourCC 0x30323449, 12 bits per pixel: an 8-bit Y plane, then the U plane, then the V plane.
    4. YV12, FourCC 0x32315659, 12 bits per pixel: an 8-bit Y plane, then the V plane, then the U plane.
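A minimal sketch of how the four layouts differ when slicing a raw YUV420 buffer into planes; the helper below is hypothetical (it just slices a numpy array) and ignores row strides and padding that real buffers may have.

```python
import numpy as np

def yuv420_planes(buf: np.ndarray, w: int, h: int, layout: str):
    """Return (Y, U, V) views into a raw w*h*3/2-byte YUV420 frame."""
    y = buf[: w * h]
    c = buf[w * h :]          # the remaining w*h/2 chroma bytes
    q = w * h // 4            # size of one chroma plane
    if layout == "I420":      # Y, then U plane, then V plane
        return y, c[:q], c[q:]
    if layout == "YV12":      # Y, then V plane, then U plane
        return y, c[q:], c[:q]
    if layout == "NV12":      # Y, then interleaved U/V pairs
        return y, c[0::2], c[1::2]
    if layout == "NV21":      # Y, then interleaved V/U pairs
        return y, c[1::2], c[0::2]
    raise ValueError(layout)

frame = np.zeros(4 * 4 * 3 // 2, dtype=np.uint8)   # toy 4x4 frame, 24 bytes
y, u, v = yuv420_planes(frame, 4, 4, "NV12")
print(len(y), len(u), len(v))                      # 16 4 4
```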

5. YUV to RGB conversion

  1. One side produces YUV and the other side consumes RGB, so how do we convert from YUV to RGB? As mentioned above, there are dozens of YUV sub-formats, and there is neither the energy nor the need to cover them one by one. Only the conversion from the most mainstream YUV formats to RGB is given here.
1. Conversion from Y'UV420p (and Y'V12 or YV12) to RGB888
  1. Y'UV420p is a planar format, meaning that the Y', U, and V values are grouped into separate blocks rather than interleaved. Grouping the U values and the V values together makes the image more compressible. Given an image array in Y'UV420p format, all Y' values appear first, then all U values, and finally all V values.

  2. The Y'V12 format is essentially the same as Y'UV420p except that the U and V data are swapped: the Y' values are followed by the V values and then the U values. As long as you extract the U and V values from the right locations, the same algorithm handles both Y'UV420p and Y'V12.

  3. As with most Y'UV formats, there are as many Y' values as pixels. If X is the number of pixels, the first X elements of the array are the Y' values, one per pixel. However, there are only a quarter as many U and V values, so each U and each V element applies to a block of four pixels.
    [figure: 6×4 example of the sequential Y', U, and V blocks in Y'UV420p]

  4. As shown in the figure above, the Y', U, and V components in Y'UV420p are stored as sequential blocks. One Y' value is stored per pixel, and one U value and one V value are stored per 2×2 block of pixels. Reading the byte stream row by row, the Y' block starts at position 0, the U block starts at x×y (here 6×4 = 24), and the V block starts at x×y + (x×y)/4 (here 6×4 + (6×4)/4 = 30). A conversion sketch follows.
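Putting the layout above together with a color matrix gives a basic conversion routine. This is only a sketch: it assumes the I420 (Y'UV420p) layout described above and uses common BT.601-style coefficients as an assumption, since the article does not specify a matrix; a real pipeline must use the matrix and range of its actual source.

```python
import numpy as np

def yuv420p_to_rgb888(frame: bytes, w: int, h: int) -> np.ndarray:
    """Convert one Y'UV420p (I420) frame into an (h, w, 3) RGB888 array."""
    buf = np.frombuffer(frame, dtype=np.uint8)
    y = buf[: w * h].reshape(h, w).astype(np.float32)
    u = buf[w * h : w * h + w * h // 4].reshape(h // 2, w // 2).astype(np.float32)
    v = buf[w * h + w * h // 4 :].reshape(h // 2, w // 2).astype(np.float32)

    # Each U/V value covers a 2x2 pixel block: upsample by simple repetition.
    u = u.repeat(2, axis=0).repeat(2, axis=1) - 128.0
    v = v.repeat(2, axis=0).repeat(2, axis=1) - 128.0

    # BT.601-style conversion (assumed here, not specified by the article).
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)

# The 6x4 example above: 24 Y bytes, then 6 U bytes, then 6 V bytes = 36 bytes.
rgb = yuv420p_to_rgb888(bytes(36), 6, 4)
print(rgb.shape)   # (4, 6, 3)
```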


3. RGB, YUV pixel data processing

This section is reprinted from Thor's blog: Introduction to video and audio data processing: RGB, YUV pixel data processing.


Source: blog.csdn.net/weixin_41910694/article/details/107639856