Audio and Video Codec (1) - Decoding: Code Implementation

Foreword: For students who have never been exposed to audio and video codecs, the learning curve of FFmpeg may feel a bit steep. I only needed it for a project at work, so I have summarized the development process here. It is offered to interested students for reference and study.

Since FFmpeg is developed in C, all of its APIs and function calls are procedural. Based on my learning experience so far, I usually put all the code for a feature in the main function first. After testing and revising it until it works correctly, I gradually decompose and encapsulate the code in a C++ object-oriented way. In this series of guides I will follow the same steps: implement the code first, then encapsulate the functionality.

1. Preparations before development

The development tools are VS2013 + Qt5. The directory structure is as follows:

  • bin: working and test directory
  • doc: development documentation directory
  • include: FFmpeg header file directory
  • lib: FFmpeg static library directory
  • src: source code directory

Property page configuration:

  1. General - Output Directory: ..\..\bin
  2. Debugging - Working Directory: ..\..\bin
  3. C/C++ - General - Additional Include Directories: ..\..\include
  4. Linker - General - Additional Library Directories: ..\..\lib
  5. Linker - System - SubSystem: Console (/SUBSYSTEM:CONSOLE)

2. Basic knowledge of encoding and decoding

(1) Package format

The so-called package format refers to the container format that combines audio and video into a single file. The most common examples are mp4, mp3, flv, and so on. Simply put, the file suffixes of the audio and video files we usually encounter indicate their package formats. Different package formats follow different protocol standards; interested students can dig deeper on their own, and I won't go any further here.

(2) Encoding format

Take mp4 as an example, which usually contains both video and audio. The raw (uncompressed) video data is typically in the YUV420P pixel format, and the raw audio data is PCM. Take YUV420 as an example: we know that images are usually displayed as RGB (the red, green, and blue primary colors). When the video is compressed, the RGB data of each frame is first converted to YUV, and the frames are then encoded as key frames (I frames) and transition frames (P frames or B frames). The decoding process is the opposite: the decoder reads an I frame, computes and decodes the P frames and B frames from it, restores the RGB data of each frame according to the FPS preset in the video file, and finally pushes it to the graphics card. Therefore, the encoding process we usually refer to includes: image capture, transcoding, encoding, and repackaging.
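To make the space savings concrete, here is a small worked example of my own (not FFmpeg API code): in YUV420P the U and V planes are subsampled to half the width and half the height of the Y plane, so a frame needs 1.5 bytes per pixel instead of the 3 bytes per pixel of raw RGB24.

#include <cstdio>

int main() {
    int w = 1920, h = 1080;
    int rgb24   = w * h * 3;                     // 3 bytes per pixel (R, G, B)
    int yuv420p = w * h + 2 * (w / 2) * (h / 2); // full Y plane + quarter-size U and V planes
    printf("RGB24:   %d bytes\n", rgb24);        // 6220800
    printf("YUV420P: %d bytes\n", yuv420p);      // 3110400 -- half the data before any codec runs
    return 0;
}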

(3) What is the difference between video decoding and audio decoding

Students who play games are no strangers to FPS. If the FPS is too low, the picture flickers and does not feel smooth; the higher the FPS, the more demanding playback is on the graphics card. Some high-speed cameras can capture 11,000 frames per second, so do we also need to play such footage at 11,000 frames per second? Of course not; we usually play back at 25 or 60 frames per second. However, because of the difference between key frames and transition frames (a key frame stores a complete picture, while a transition frame only stores what changed relative to earlier frames and must be computed from a key frame), we have to decode every frame, that is, obtain the YUV data of every picture. At the same time, we only transcode the frames we actually need to display, that is, convert their YUV data into RGB data, including computing the width and height of the picture.

Audio is different: it must be played back at the same rate at which it was captured. Speeding up or slowing down audio playback changes the sound, which is essentially how a voice changer works. Therefore, to keep audio and video playback in sync, in actual development we usually pace the decoding and transcoding of video according to the playback speed of the audio.

3. Code Implementation

(1) Register FFmpeg components: register and initialize FFmpeg wrappers and network devices

av_register_all();        // register all muxers/demuxers (deprecated and no longer needed since FFmpeg 4.0)
avformat_network_init();  // initialize networking, for protocols such as rtsp/http
avdevice_register_all();  // register capture devices

(2) Open the file and create the input (demuxer) context

AVFormatContext *pFormatCtx = NULL;
char errbuf[1024] = { 0 }; // buffer for readable error messages
int errnum = avformat_open_input(&pFormatCtx, filename, NULL, NULL);
if (errnum < 0) {
    av_strerror(errnum, errbuf, sizeof(errbuf));
    cout << errbuf << endl;
}

AVFormatContext represents the encapsulation (demuxer) context; it holds the container- and codec-related state while a multimedia file is being read. The avformat_open_input function probes the file by its header and extension and creates the matching demuxer.
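Note that for some container formats the stream parameters (width, height, sample rate) are only filled in after FFmpeg probes a few packets. A minimal sketch of that step, using the pFormatCtx and errbuf from above:

// probe the streams so that the codec parameters read below are populated
errnum = avformat_find_stream_info(pFormatCtx, NULL);
if (errnum < 0) {
    av_strerror(errnum, errbuf, sizeof(errbuf));
    cout << errbuf << endl;
}
av_dump_format(pFormatCtx, 0, filename, 0); // print a readable summary of all streams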

(3) Traverse the streams and initialize the decoders

for (unsigned int i = 0; i < pFormatCtx->nb_streams; ++i) {
    AVCodecContext *pCodecCtx = pFormatCtx->streams[i]->codec; // decoder context (a deprecated field on newer FFmpeg, see below)
    if (pCodecCtx->codec_type == AVMEDIA_TYPE_VIDEO) { // video stream
        int videoIndex = i; // (in real code, keep videoIndex outside the loop)

        // the width and height of the video
        int srcWidth = pCodecCtx->width;
        int srcHeight = pCodecCtx->height;

        // find the video decoder and open it
        AVCodec *codec = avcodec_find_decoder(pCodecCtx->codec_id);
        if (!codec) {
            // no decoder available for this codec id
        }

        errnum = avcodec_open2(pCodecCtx, codec, NULL);
        if (errnum < 0) {
            av_strerror(errnum, errbuf, sizeof(errbuf));
            cout << errbuf << endl;
        }
        cout << "video decoder open success!" << endl;
    }
    if (pCodecCtx->codec_type == AVMEDIA_TYPE_AUDIO) { // audio stream
        int audioIndex = i; // (in real code, keep audioIndex outside the loop)

        // find the audio decoder and open it
        AVCodec *codec = avcodec_find_decoder(pCodecCtx->codec_id);
        if (!codec) {
            // no decoder available for this codec id
        }

        errnum = avcodec_open2(pCodecCtx, codec, NULL);
        if (errnum < 0) {
            av_strerror(errnum, errbuf, sizeof(errbuf));
            cout << errbuf << endl;
        }

        int sampleRate = pCodecCtx->sample_rate;    // audio sample rate
        int channels = pCodecCtx->channels;         // number of channels
        AVSampleFormat fmt = pCodecCtx->sample_fmt; // sample format

        cout << "audio decoder open success!" << endl;
    }
}

The demuxer context holds all of the file's streams. Usually the video stream is index 0 and the audio stream is index 1; a file may also contain subtitle streams and others.

Steps (2) and (3) are essentially the standard procedure for opening a multimedia file, and all of the parameters needed for decoding and transcoding can be obtained here. Next, we need to read, decode, and transcode in a loop until playback is complete.
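A side note: the streams[i]->codec field used above matches the FFmpeg version this guide was written against, but it is deprecated on FFmpeg 3.x and later. A minimal sketch of the newer flow, assuming the same pFormatCtx (only the video stream is shown):

// pick the best video stream instead of scanning manually
int videoIndex = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

// build a fresh codec context from the stream's codec parameters
AVCodecParameters *par = pFormatCtx->streams[videoIndex]->codecpar;
AVCodec *codec = avcodec_find_decoder(par->codec_id);
AVCodecContext *pCodecCtx = avcodec_alloc_context3(codec);
avcodec_parameters_to_context(pCodecCtx, par);
avcodec_open2(pCodecCtx, codec, NULL);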

(4) Read compressed data. The reason it is called compressed data is mainly to distinguish the two structures AVPacket and AVFrame: an AVPacket holds one encoded (compressed) key frame or transition frame, while an AVFrame holds the complete uncompressed picture (e.g. the YUV data) decoded from AVPackets.

AVPacket *pkt = av_packet_alloc(); // allocate and initialize an AVPacket
// read one frame of compressed data
errnum = av_read_frame(pFormatCtx, pkt);
if (errnum == AVERROR_EOF) {
    // the end of the file has been reached
    av_strerror(errnum, errbuf, sizeof(errbuf));
    cout << errbuf << endl;
}
if (errnum < 0) {
    av_strerror(errnum, errbuf, sizeof(errbuf));
    cout << errbuf << endl;
}
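In a real player these reads happen in a loop, and each packet must be released after use, or memory will leak. A minimal sketch of that loop, assuming the contexts from the previous steps:

while (av_read_frame(pFormatCtx, pkt) >= 0) {
    // ... send pkt to the matching decoder based on pkt->stream_index (step 5) ...
    av_packet_unref(pkt); // release the packet's data before reusing it
}
av_packet_free(&pkt); // free the packet itself once playback ends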

(5) Decoding

// pCodecCtx must be the decoder context that matches pkt->stream_index
errnum = avcodec_send_packet(pCodecCtx, pkt);
if (errnum < 0) {
    av_strerror(errnum, errbuf, sizeof(errbuf));
    cout << errbuf << endl;
}

AVFrame *yuv = av_frame_alloc();
AVFrame *pcm = av_frame_alloc();
if (pkt->stream_index == videoIndex) { // the current packet belongs to the video stream
    errnum = avcodec_receive_frame(pCodecCtx, yuv); // decode video
    if (errnum < 0) {
        av_strerror(errnum, errbuf, sizeof(errbuf));
        cout << errbuf << endl;
    }
}
if (pkt->stream_index == audioIndex) { // the current packet belongs to the audio stream
    errnum = avcodec_receive_frame(pCodecCtx, pcm); // decode audio
    if (errnum < 0) {
        av_strerror(errnum, errbuf, sizeof(errbuf));
        cout << errbuf << endl;
    }
}
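One caveat: avcodec_send_packet and avcodec_receive_frame are not strictly one-to-one. A packet may yield no frame (the decoder buffers it), and an audio packet may yield several. A minimal sketch of a receive loop that handles this, using the video context and frame from above:

errnum = avcodec_send_packet(pCodecCtx, pkt);
while (errnum >= 0) {
    errnum = avcodec_receive_frame(pCodecCtx, yuv);
    if (errnum == AVERROR(EAGAIN) || errnum == AVERROR_EOF)
        break; // the decoder needs more input, or the stream is fully drained
    // ... use the decoded frame here (steps 6 and 7) ...
}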

(6) Video transcoding

// 720x480 output
int outWidth = 720;
int outHeight = 480;
char *outData = new char[outWidth * outHeight * 4];

SwsContext *videoSwsCtx = NULL;
videoSwsCtx = sws_getCachedContext(videoSwsCtx,
    srcWidth, srcHeight, pCodecCtx->pix_fmt, // input: the decoder's pixel format
    outWidth, outHeight, AV_PIX_FMT_BGRA,    // output
    SWS_BICUBIC,                             // scaling algorithm
    0, 0, 0);

// set up the destination plane pointers and strides
uint8_t *dstData[AV_NUM_DATA_POINTERS] = { 0 };
dstData[0] = (uint8_t *)outData;        // BGRA is packed into a single plane
int dstStride[AV_NUM_DATA_POINTERS] = { 0 };
dstStride[0] = outWidth * 4;            // 4 bytes per BGRA pixel

int h = sws_scale(videoSwsCtx, yuv->data, yuv->linesize, 0, srcHeight, dstData, dstStride);
if (h != outHeight) {
    // transcoding failed
}

Here we explain the calculation outWidth * outHeight * 4: the output picture contains 720 × 480 pixels, each pixel holds 4 components (B, G, R, A), and each component takes 1 byte (8 bits). Therefore, the size of a complete picture is outWidth * outHeight * 4 bytes.
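To make the numbers concrete: 720 × 480 × 4 = 1,382,400 bytes, roughly 1.3 MB per frame; at 25 FPS that is about 33 MB of raw BGRA data per second, which is exactly why we only transcode the frames we actually need to display.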

(7) Audio transcoding

char *outData = new char[10000]; // output buffer, sized generously for one frame of samples

AVCodecContext *pCodecCtx = pFormatCtx->streams[audioIndex]->codec; // audio decoder context
SwrContext *audioSwrCtx = swr_alloc();
audioSwrCtx = swr_alloc_set_opts(audioSwrCtx,
    AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16, 44100, // output parameters: stereo, 16-bit, CD quality
    pCodecCtx->channel_layout, pCodecCtx->sample_fmt, pCodecCtx->sample_rate, // input parameters
    0, 0);
swr_init(audioSwrCtx);

uint8_t *out[AV_NUM_DATA_POINTERS] = { 0 };
out[0] = (uint8_t *)outData;
// compute the output sample count (rescale from the input rate to the 44100 Hz output rate)
int dst_nb_samples = av_rescale_rnd(pcm->nb_samples, 44100, pCodecCtx->sample_rate, AV_ROUND_UP);
int len = swr_convert(audioSwrCtx,
    out, dst_nb_samples,
    (const uint8_t **)pcm->data, pcm->nb_samples);

int channels = av_get_channel_layout_nb_channels(AV_CH_LAYOUT_STEREO); // AV_CH_LAYOUT_STEREO -> 2, channel count from the layout
// actual length in bytes of the converted audio data
int dst_bufsize = av_samples_get_buffer_size(NULL,
    channels,          // number of channels
    len,               // samples actually produced by swr_convert
    AV_SAMPLE_FMT_S16,
    0);
if (dst_bufsize < 0) {
    // audio transcoding error
}
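One thing the snippets above leave out is releasing everything when playback ends. A minimal cleanup sketch, assuming the names used throughout this post:

swr_free(&audioSwrCtx);            // free the audio resampler
sws_freeContext(videoSwsCtx);      // free the video scaler
av_frame_free(&yuv);
av_frame_free(&pcm);
av_packet_free(&pkt);
avformat_close_input(&pFormatCtx); // close the file and free the demuxer context
delete[] outData;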

So far, we have basically completed decoding a multimedia file, but there is still some work to do before actual playback, including encapsulating the code and designing the interface. We will cover that in the next post.

Complete project code: https://gitee.com/learnhow/ffmpeg_studio/tree/master/_64bit/src/av_player
