iOS audio and video codec notes

Audio and video codec

1. Software encoding and hardware encoding

1) Difference between software and hardware encoding

  • Software encoding: encoding runs on the CPU.
  • Hardware encoding: encoding runs on hardware other than the CPU, such as a GPU, a dedicated DSP, an FPGA, or an ASIC.

2) Comparison of software and hardware encoding

  • Software encoding: direct and simple to implement, easy to tune and easy to upgrade, but it puts a heavy load on the CPU and performs worse than hardware encoding. At low bit rates its quality is usually better than that of hardware encoding.
  • Hardware encoding: high performance, but at low bit rates its quality is usually lower than that of software encoding. However, some products have ported excellent software encoding algorithms (such as x264) to GPU hardware platforms, and their quality is essentially the same as software encoding.

2. Overview

Use AVCaptureSession to capture audio and video in real time as CMSampleBuffers (PCM audio, YUV video). Real-time audio capture can also be done with AudioQueue or AudioUnit.

Audio codec: use AudioToolbox or FFmpeg to implement AAC encoding and decoding.

Video codec: use VideoToolbox for hardware H.264 encoding and decoding, or FFmpeg + x264 for software H.264 encoding and decoding.

1) Audio

  • Bit rate: the number of bits transmitted or decoded per unit of time, usually given in kbps.
  • The higher the bit rate, the higher the audio and video quality, but the higher the cost in transmission bandwidth, storage, and playback.
  • Uncompressed (PCM) audio bit rate = sample rate * channels * bit depth; for example, 44100 Hz * 2 channels * 16 bits = 1,411,200 bit/s (about 1411 kbps).
  • AudioToolbox encoding: PCM => AAC (CMBlockBuffer => CMBlockBuffer)
  • Before and after AAC encoding: CMBlockBuffer + CMTime + AudioStreamBasicDescription (wrapped in a CMAudioFormatDescription) => CMSampleBuffer
  • ADTS frame format: ADTS header (7 bytes, or 9 bytes when a CRC is present) + AAC ES (raw AAC data). Each raw AAC frame must get an ADTS header when written to a file; otherwise the .aac file cannot be played.

Note: AudioToolbox's AudioConverter works directly on raw data: the PCM (or AAC) bytes it is fed and the AAC (or PCM) bytes it outputs carry no container headers, so ADTS headers have to be added separately before writing a .aac file.
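A minimal C sketch of building the 7-byte ADTS header (no CRC) that goes in front of each raw AAC frame, for example each packet AudioConverter outputs. The function names adts_freq_index and adts_write_header are hypothetical; the bit layout follows the ADTS header fields (syncword, MPEG version, profile, sampling frequency index, channel configuration, frame length, buffer fullness). It assumes MPEG-4 signalling (ID = 0), one raw data block per frame, and the VBR buffer fullness value 0x7FF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ADTS sampling-frequency table; the index into this table is what the
   header's 4-bit sampling_frequency_index field stores. */
static const int kAdtsSampleRates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000,
    24000, 22050, 16000, 12000, 11025, 8000, 7350
};

/* Returns the sampling_frequency_index, or -1 if the rate is not in the table. */
int adts_freq_index(int sample_rate) {
    for (int i = 0; i < 13; i++) {
        if (kAdtsSampleRates[i] == sample_rate) return i;
    }
    return -1;
}

/* Writes a 7-byte ADTS header (protection_absent = 1, no CRC) for one raw
   AAC frame of payload_len bytes. profile is the MPEG-4 audio object type
   (2 = AAC LC); channel_config is 1..7. Returns 0 on success, -1 on bad input. */
int adts_write_header(uint8_t out[7], int profile, int sample_rate,
                      int channel_config, size_t payload_len) {
    assert(out != NULL);
    int freq_idx = adts_freq_index(sample_rate);
    size_t frame_len = payload_len + 7;          /* aac_frame_length includes the header */
    if (freq_idx < 0 || profile < 1 || profile > 4 ||
        channel_config < 1 || channel_config > 7 || frame_len > 0x1FFF) {
        return -1;
    }
    int obj = profile - 1;                       /* ADTS stores object type minus 1 */
    out[0] = 0xFF;                               /* syncword 0xFFF, high 8 bits */
    out[1] = 0xF1;                               /* syncword low 4 bits, ID=0, layer=00, protection_absent=1 */
    out[2] = (uint8_t)((obj << 6) | (freq_idx << 2) | (channel_config >> 2));
    out[3] = (uint8_t)(((channel_config & 0x3) << 6) | (frame_len >> 11));
    out[4] = (uint8_t)((frame_len >> 3) & 0xFF); /* frame length, middle 8 bits */
    out[5] = (uint8_t)(((frame_len & 0x7) << 5) | 0x1F); /* length low 3 bits + fullness high 5 bits */
    out[6] = 0xFC;                               /* fullness low 6 bits + raw blocks field 0 */
    return 0;
}
```

For AAC LC (profile 2) at 44100 Hz stereo, the header starts with FF F1 50 80; each frame is written to the .aac file as these 7 bytes followed by the raw AAC bytes.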

2) Video

Video bit rate is positively correlated with resolution: each resolution has its own typical bit rate range.

  • Smooth: bit rate 300~500 kbps, resolution 480*360 (360P)
  • Standard definition: bit rate 600~900 kbps, resolution 640*480 (480P)
  • HD: bit rate 1000~1900 kbps, resolution 1280*720 (720P)
  • Ultra-clear: bit rate 2000~4000 kbps, resolution 1920*1080 (1080P)
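The table above can be turned into a small lookup, for example to pick a starting average bit rate for the encoder (VideoToolbox's kVTCompressionPropertyKey_AverageBitRate is expressed in bits per second). This is only a sketch: the BitrateTier struct, matching on frame height, and taking the midpoint of each range are assumptions, not part of the original notes.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the guideline table (bit rates in kbps). */
typedef struct {
    const char *name;
    int width, height;
    int min_kbps, max_kbps;
} BitrateTier;

static const BitrateTier kTiers[] = {
    {"Smooth (360P)",        480,  360,  300,  500},
    {"Standard (480P)",      640,  480,  600,  900},
    {"HD (720P)",           1280,  720, 1000, 1900},
    {"Ultra-clear (1080P)", 1920, 1080, 2000, 4000},
};

/* Returns the smallest tier whose height is >= the requested height,
   falling back to the largest tier for anything above 1080P. */
const BitrateTier *bitrate_tier_for_height(int height) {
    assert(height > 0);
    size_t n = sizeof kTiers / sizeof kTiers[0];
    for (size_t i = 0; i < n; i++) {
        if (height <= kTiers[i].height) return &kTiers[i];
    }
    return &kTiers[n - 1];
}

/* Midpoint of the tier's range, converted to bits per second. */
int default_bitrate_bps(int height) {
    const BitrateTier *t = bitrate_tier_for_height(height);
    return (t->min_kbps + t->max_kbps) / 2 * 1000;
}
```

For 720P this gives 1,450,000 bit/s; the midpoint is just a neutral default inside the recommended range.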

  • VideoToolbox hardware encoding: YUV => H.264 (CVImageBuffer / CVPixelBuffer => CMBlockBuffer)
  • VideoToolbox encoding takes a CVPixelBuffer as input and outputs a CMSampleBuffer; decoding takes a CMSampleBuffer as input and outputs a CVPixelBuffer.

  • CMSampleBuffer structure before encoding: CVPixelBuffer + CMTime + CMVideoFormatDescription
  • Encoded CMSampleBuffer structure: CMBlockBuffer + CMTime + CMVideoFormatDescription

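  • H.264 stream: SPS - PPS - I frame - P frame - B frame - B frame - P frame - P frame. Annex B format separates NALUs with the start code 0x000001 or 0x00000001; AVCC format instead prefixes each NALU with its length (usually 4 bytes, big-endian). The VideoToolbox hardware codec only works with AVCC, so Annex B data must be converted to AVCC before decoding, and encoder output must be converted to Annex B before it is written as a raw .h264 stream.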
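A minimal C sketch of the AVCC to Annex B conversion, for example applied to the encoded bytes in a VideoToolbox CMBlockBuffer before writing a .h264 file. It assumes 4-byte length prefixes (the usual NAL unit length size for VideoToolbox output); the function name avcc_to_annexb is hypothetical. Because each 4-byte length is replaced by a 4-byte start code, the output is exactly as long as the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts AVCC (each NALU preceded by a 4-byte big-endian length) into
   Annex B (each NALU preceded by the start code 00 00 00 01).
   out must hold at least in_len bytes and must not overlap in.
   Returns the number of bytes written, or 0 if the input is malformed. */
size_t avcc_to_annexb(const uint8_t *in, size_t in_len, uint8_t *out) {
    static const uint8_t kStartCode[4] = {0x00, 0x00, 0x00, 0x01};
    assert(in != NULL && out != NULL);
    size_t pos = 0;
    while (pos < in_len) {
        if (in_len - pos < 4) return 0;             /* truncated length prefix */
        uint32_t nalu_len = ((uint32_t)in[pos] << 24) | ((uint32_t)in[pos + 1] << 16) |
                            ((uint32_t)in[pos + 2] << 8) | (uint32_t)in[pos + 3];
        if (nalu_len > in_len - pos - 4) return 0;  /* NALU runs past the buffer */
        memcpy(out + pos, kStartCode, 4);           /* length prefix -> start code */
        memcpy(out + pos + 4, in + pos + 4, nalu_len);
        pos += 4 + nalu_len;
    }
    return pos;
}
```

The reverse direction (Annex B to AVCC before decoding) replaces each start code with the NALU length instead; with 3-byte start codes the output grows by one byte per NALU.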

  • After H.264 encoding, the SPS and PPS are extracted from the CMVideoFormatDescription; before H.264 decoding, the SPS and PPS are used to create a CMVideoFormatDescription.

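  • I frame (key frame): intra-coded, so it decodes into a complete picture on its own without referring to other frames.
  • P frame (forward difference frame): stores the difference from the previous I or P frame, which it references to reconstruct a complete picture.
  • B frame (bidirectional difference frame): references the nearest I or P frame before it and the nearest I or P frame after it to reconstruct a complete picture. Because it depends on a later frame, frames are decoded in a different order from the one in which they are displayed, which is why decode and presentation timestamps can differ.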

  • nal_unit_type (the low 5 bits of the NALU header): 5 = IDR slice (an IDR picture is a special kind of I frame), 7 = SPS (sequence parameter set), 8 = PPS (picture parameter set)
  • NALU header (1 byte) = forbidden_zero_bit (1 bit) + nal_ref_idc (2 bits) + nal_unit_type (5 bits)
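A short C sketch of reading the 1-byte NALU header and scanning an Annex B stream for start codes, which is one way to locate the SPS (7), PPS (8), and IDR (5) NALUs, for example to collect the SPS and PPS needed to build a CMVideoFormatDescription. The names parse_nalu_header and annexb_nalu_types are hypothetical. The scan relies on H.264 emulation prevention bytes guaranteeing that 00 00 01 never appears inside a NALU payload; type 1 in the test data is a non-IDR slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the 1-byte H.264 NALU header. */
typedef struct {
    int forbidden_zero_bit;  /* 1 bit, must be 0 */
    int nal_ref_idc;         /* 2 bits */
    int nal_unit_type;       /* 5 bits: 5 = IDR slice, 7 = SPS, 8 = PPS, ... */
} NaluHeader;

NaluHeader parse_nalu_header(uint8_t b) {
    NaluHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01;
    h.nal_ref_idc        = (b >> 5) & 0x03;
    h.nal_unit_type      = b & 0x1F;
    return h;
}

/* Scans an Annex B buffer for 00 00 01 / 00 00 00 01 start codes and stores
   the nal_unit_type of each NALU found (at most max_types).
   Returns the number of NALUs recorded. */
size_t annexb_nalu_types(const uint8_t *buf, size_t len, int *types, size_t max_types) {
    assert(buf != NULL && types != NULL);
    size_t count = 0;
    for (size_t i = 0; i + 3 < len && count < max_types; i++) {
        /* A 4-byte start code 00 00 00 01 matches here one byte later as 00 00 01. */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            types[count++] = parse_nalu_header(buf[i + 3]).nal_unit_type;
            i += 3;  /* continue scanning after the header byte */
        }
    }
    return count;
}
```

A stream that starts with SPS and PPS followed by an IDR slice yields the types 7, 8, 5, matching the SPS - PPS - I frame order listed above.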

Audio and video transmission (pull streaming + push streaming)

RTMP transmits AAC audio and H.264 video streams for live streaming; HLS delivers AAC and H.264 streams for on-demand playback.



Origin blog.csdn.net/z119901214/article/details/80578833