H.264 video coding principles

One, why video needs to be encoded

Video is made up of frames, much like an animated GIF. To avoid visible stutter, video generally needs at least 16 frames per second (30 is typical). For video at a resolution of 1280x720, the unencoded data for one second is about 1280x720x60 ~= 843M (the figure works out if one assumes 60 frames per second and 16 bits per pixel, which gives about 843 Mbit, or roughly 105 MB). Unencoded video is simply impractical to store or transmit. The encoding formats on the market today fall mainly into two families, H.264 and MPEG-2. The latter is used mainly for DVDs, set-top boxes, and similar equipment, while H.264 is the mainstream encoding format. H.265 also belongs to this group of standards; it is used, for example, for some cinema films and high-definition television.
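The raw-size estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the text does not state the frame rate or pixel depth behind the 843M figure, so the 60 frames per second and 16 bits per pixel used here are assumptions chosen because they reproduce it. The function name `raw_bitrate_bits` is made up for illustration.

```python
def raw_bitrate_bits(width, height, fps, bits_per_pixel):
    """Bits needed for one second of uncompressed video."""
    return width * height * fps * bits_per_pixel


# Assumed values: 60 frames per second, 16 bits per pixel.
bits = raw_bitrate_bits(1280, 720, 60, 16)
print("Mbit per second:", bits / 2**20)      # binary megabits
print("MB per second:", bits / 8 / 2**20)    # binary megabytes
```

With a different pixel format (for example 24-bit RGB or 12-bit YUV 4:2:0) the number changes, but the conclusion that raw video is too large to store or send stays the same.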

Two, H.264 encoding principles

  • In adjacent pictures, generally fewer than 10% of the pixels differ, the luminance difference changes by no more than 20%, and the chrominance difference changes by less than 1%. So for a stretch of pictures with little change, we can first encode one complete picture, frame A.
  • The following frame B then does not encode the whole image; it records only the difference from frame A, so frame B is 1/10 the size of a complete frame or less. If frame C, which follows B, also changes little, we can encode it the same way with reference to B, and so on.
  • We call such a run of images a sequence: a sequence is a stretch of data that shares the same characteristics. When an image differs so much from the previous one that it cannot be generated from it, we end the current sequence and start the next, that is, we generate a complete frame A1, and the frames after A1 record only their differences from A1. This sequence is the GOP (group of pictures). We can think of it as a scene: if scene A has a red background and scene B a green background, then A and B are two sequences. Each sequence starts with an I-frame, which is the only one in it, followed by B and P frames. In other words, a sequence is the run of frames between two I-frames.
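The idea of storing one complete frame and then only differences can be sketched with a toy encoder. This is not how H.264 actually predicts frames (it uses motion-compensated prediction on macroblocks); the functions `encode_sequence` and `decode_sequence`, the flat pixel lists, and the `scene_change_ratio` threshold are all hypothetical and exist only to illustrate the principle.

```python
def encode_sequence(frames, scene_change_ratio=0.5):
    """Encode equal-length frames (lists of pixel values).

    The first frame, and any frame in which more than scene_change_ratio of
    the pixels changed, is stored whole ("full"). Every other frame stores
    only the changed pixels as {index: new_value} ("diff").
    """
    encoded = []
    previous = None
    for frame in frames:
        changes = None
        if previous is not None:
            changes = {i: new for i, (old, new) in enumerate(zip(previous, frame))
                       if old != new}
        if changes is None or len(changes) > scene_change_ratio * len(frame):
            encoded.append(("full", list(frame)))   # starts a new sequence
        else:
            encoded.append(("diff", changes))       # difference from previous frame
        previous = frame
    return encoded


def decode_sequence(encoded):
    """Rebuild the original frames from full frames and diffs."""
    frames = []
    for kind, payload in encoded:
        if kind == "full":
            frame = list(payload)
        else:
            frame = list(frames[-1])                # start from the previous frame
            for index, value in payload.items():
                frame[index] = value
        frames.append(frame)
    return frames


frames = [[1, 1, 1, 1], [1, 1, 2, 1], [1, 1, 2, 3], [9, 9, 9, 9]]
encoded = encode_sequence(frames)
print([kind for kind, _ in encoded])
print(decode_sequence(encoded) == frames)
```

The last frame differs from its predecessor in every pixel, so the sketch stores it whole, which corresponds to starting a new sequence in the description above.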

Three, H.264 coding

How are I-frames, P-frames, and B-frames produced? As mentioned above, when two scenes differ greatly, a new sequence is started, and that sequence begins with an I-frame, also known as a key frame. Roughly speaking, frames that are very similar to the I-frame (more than 95%) are encoded as B-frames, and frames with about 70% similarity as P-frames. A P-frame is predicted from earlier frames, while a B-frame is predicted from both earlier and later frames. We do not need to implement the encoding of I, P, and B frames ourselves; the x264 encoder already does it for us.
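To make the relationship between frame types and sequences concrete, the following sketch groups a list of frame types into sequences that each start at an I-frame. The input string and the function name `split_into_gops` are hypothetical; in practice the frame types would come from the encoder or from parsed slice headers.

```python
def split_into_gops(frame_types):
    """Split a string of frame types (e.g. "IBBPBBPIBP") into GOPs.

    A new GOP starts at every "I". Frames before the first "I", if any,
    are kept together in a leading group.
    """
    gops = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append("")          # open a new group
        gops[-1] += frame_type
    return gops


print(split_into_gops("IBBPBBPIBP"))
```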

As already mentioned, the purpose of encoding is to make transfer easier (file transfer, network streaming, and so on). But we cannot send a whole frame at once: a frame of several tens of KB is too big and has to be split for transmission. We therefore need a smaller transmission unit, one that supports higher compression, fault tolerance, and real-time viewing. This is why the NALU is introduced.

Four, NAL units (NALU)

One frame of data (one picture) is made up of a number of NALUs, and each NALU is divided into two parts: the NAL header and the RBSP.

1, NAL header: identifies the type of RBSP data that the NAL unit carries. NAL units whose nal_unit_type is 1, 2, 3, 4, or 5 are called VCL NAL units; NAL units of all other types are non-VCL NAL units.

  • 0: Unspecified
  • 1: Coded slice of a non-IDR picture (no data partitioning)
  • 2: Coded slice data partition A of a non-IDR picture
  • 3: Coded slice data partition B of a non-IDR picture
  • 4: Coded slice data partition C of a non-IDR picture
  • 5: Coded slice of an IDR picture
  • 6: Supplemental enhancement information (SEI)
  • 7: Sequence parameter set (SPS)
  • 8: Picture parameter set (PPS)
  • 9: Access unit delimiter
  • 10: End of sequence
  • 11: End of stream
  • 12: Filler data
  • 13: Sequence parameter set extension
  • 14: Prefix NAL unit
  • 15: Subset sequence parameter set
  • 16 - 18: Reserved
  • 19: Coded slice of an auxiliary coded picture (no data partitioning)
  • 20: Coded slice extension
  • 21 - 23: Reserved
  • 24 - 31: Unspecified
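The NAL header is a single byte, and the type values listed above occupy its low 5 bits. The sketch below splits that byte into its fields. The two other field names, forbidden_zero_bit and nal_ref_idc, come from the H.264 specification rather than from this article, and the helper names `parse_nal_header` and `is_vcl` are made up for illustration.

```python
def parse_nal_header(header_byte):
    """Split the one-byte NAL header into its three fields."""
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,  # 1 bit, should be 0
        "nal_ref_idc": (header_byte >> 5) & 0x03,         # 2 bits
        "nal_unit_type": header_byte & 0x1F,              # 5 bits, see list above
    }


def is_vcl(nal_unit_type):
    """Types 1 to 5 carry coded slice data (VCL); everything else is non-VCL."""
    return 1 <= nal_unit_type <= 5


for byte in (0x67, 0x68, 0x65, 0x41):
    header = parse_nal_header(byte)
    print(hex(byte), header, "VCL" if is_vcl(header["nal_unit_type"]) else "non-VCL")
```

The header bytes 0x67, 0x68, and 0x65 are the ones most often seen at the start of a stream, because they correspond to an SPS, a PPS, and an IDR slice.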

2, RBSP: the payload is in turn divided into slices. Each slice consists of a slice header and slice data, and the slice data consists of a number of macroblocks. The RBSP of a non-VCL NAL unit can instead carry a sequence parameter set (SPS) or a picture parameter set (PPS).

3, SPS and PPS

Concept: the SPS and PPS contain the parameters needed to initialize an H.264 decoder, including the profile and level used by the encoder, the image width and height, and the deblocking filter settings.

SPS: sequence parameter set. Its parameters include the identifier seq_parameter_set_id, constraints on frame numbers and POC (picture order count), the number of reference frames, the decoded image size, the flag that selects frame or field coding mode, and so on.

PPS: picture parameter set. Its parameters include the identifier pic_parameter_set_id, the seq_parameter_set_id of the SPS it refers to, the entropy coding mode flag, the number of slice groups, the initial quantization parameter, the deblocking filter coefficient adjustment flag, and so on.

Every NAL unit is preceded by the start code "0x00 0x00 0x01" or "0x00 0x00 0x00 0x01". After finding a start code in the H.264 stream, take the low 5 bits of the first byte after it to determine whether the type is 7 (SPS) or 8 (PPS). With a 4-byte start code this is (data[4] & 0x1f) == 7 or (data[4] & 0x1f) == 8; with a 3-byte start code the byte to test is data[3]. The parentheses matter, because in C == binds more tightly than &. Then remove the start code and Base64-encode the NAL unit; the result can be used in SDP, with the SPS and PPS separated by a comma.
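The steps in the previous paragraph can be sketched as follows. The helper names `split_annexb` and `sprop_parameter_sets` are hypothetical, and the sample bytes are short placeholders rather than a real, decodable SPS and PPS. The sketch relies on two properties of the H.264 byte stream format: a start code pattern cannot occur inside a NAL unit, and a NAL unit never ends in 0x00, so trailing zero bytes can be treated as part of the next start code.

```python
import base64

START_CODE = b"\x00\x00\x01"


def split_annexb(data):
    """Return the NAL units of a start-code-delimited stream, start codes removed."""
    nal_units = []
    position = data.find(START_CODE)
    while position != -1:
        start = position + len(START_CODE)
        next_position = data.find(START_CODE, start)
        end = len(data) if next_position == -1 else next_position
        # Trailing zeros belong to a following 4-byte start code or to padding.
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            nal_units.append(nal)
        position = next_position
    return nal_units


def sprop_parameter_sets(data):
    """Base64-encode the first SPS and PPS found, joined by a comma, or None."""
    sps = pps = None
    for nal in split_annexb(data):
        nal_unit_type = nal[0] & 0x1F      # low 5 bits of the header byte
        if nal_unit_type == 7 and sps is None:
            sps = nal
        elif nal_unit_type == 8 and pps is None:
            pps = nal
    if sps is None or pps is None:
        return None
    return ",".join(base64.b64encode(n).decode("ascii") for n in (sps, pps))


# Placeholder stream: SPS and PPS with 4-byte start codes, IDR slice with a 3-byte one.
sample = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
          b"\x00\x00\x00\x01\x68\xce\x3c\x80"
          b"\x00\x00\x01\x65\x88")
print(sprop_parameter_sets(sample))
```

Searching only for the 3-byte pattern handles both start code lengths, since the 4-byte form is the 3-byte form with one extra leading zero.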

Origin blog.csdn.net/qinbin2015/article/details/90727938