Introduction to video decoding in FFmpeg

1. Introduction

FFmpeg is a very popular open-source, cross-platform multimedia solution. It can be used to encode, decode, convert, and stream audio and video in a wide range of formats. This article introduces the knowledge related to video decoding in FFmpeg.

2. Video decoder

FFmpeg's video decoders support a variety of video coding formats, including H.264/AVC, MPEG-4, VP9, and more.

2.1 FFmpeg video decoder

The FFmpeg video decoder can be found using the avcodec_find_decoder() function, for example:

AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
if (!codec) {
    fprintf(stderr, "无法找到解码器\n");
    return -1;
}

The above code uses the AV_CODEC_ID_H264 parameter to look up the H.264 decoder. avcodec_find_decoder() returns NULL if the specified decoder cannot be found.
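Before any packets can be decoded, the decoder context must also be allocated, configured from the stream parameters, and opened. A minimal sketch, assuming pFormatCtx is an already-opened AVFormatContext and videoStreamIndex identifies the video stream (both appear in the example in the next section):

AVCodecContext *pCodecCtx = avcodec_alloc_context3(codec);
if (!pCodecCtx)
    return -1;
// Copy the stream parameters (width, height, extradata, ...) into the context
if (avcodec_parameters_to_context(pCodecCtx,
        pFormatCtx->streams[videoStreamIndex]->codecpar) < 0)
    return -1;
// Open the decoder
if (avcodec_open2(pCodecCtx, codec, NULL) < 0)
    return -1;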

2.2 Decoding video packets

In the video decoding process, the data stream is read and split into compressed packets, and each packet is then decoded into one or more raw video frames. Packets can be read from the stream using the av_read_frame() function, for example:

AVPacket packet;

while (av_read_frame(pFormatCtx, &packet) >= 0) {

    // Is this packet from the video stream?
    if (packet.stream_index == videoStreamIndex) {

        // Send the packet to the decoder
        if (avcodec_send_packet(pCodecCtx, &packet) != 0) {
            fprintf(stderr, "Failed to send packet to decoder\n");
            break;
        }

        // Receive decoded frames (one packet may yield zero or more frames)
        while (avcodec_receive_frame(pCodecCtx, pFrame) == 0) {
            // Process the decoded video frame
            // ...
        }
    }

    av_packet_unref(&packet);
}

The above code reads each packet from the stream with av_read_frame() and checks whether it belongs to the video stream. If so, avcodec_send_packet() hands the packet to the decoder, and avcodec_receive_frame() is called in a loop to collect all frames the decoder has produced.
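One step the loop above omits: codecs that reorder frames (for example H.264 with B-frames) keep frames buffered inside the decoder, so once av_read_frame() reports end of file the decoder should be drained. A minimal sketch, reusing pCodecCtx and pFrame from above:

// Enter draining mode: a NULL packet signals end of stream
avcodec_send_packet(pCodecCtx, NULL);
// Collect the remaining buffered frames until the decoder is empty
while (avcodec_receive_frame(pCodecCtx, pFrame) == 0) {
    // Process the remaining frames
}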

2.3 Video frame format conversion

During decoding, video frames can be converted into a different pixel format. Frames are converted to the desired output format using a SwsContext and the sws_scale() function. For example:

struct SwsContext *pSwsContext = sws_getContext(
    pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt,
    output_width, output_height, AV_PIX_FMT_RGB24,
    SWS_BILINEAR, NULL, NULL, NULL);

sws_scale(pSwsContext, (const uint8_t * const *)pFrame->data, pFrame->linesize,
          0, pCodecCtx->height, dst_data, dst_linesize);

The above code creates a conversion context and uses sws_scale() to convert the source frame to RGB24, writing the result into the destination planes described by dst_data and dst_linesize.
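For completeness, here is a minimal sketch of how the buffer and the dst_data/dst_linesize arrays used above might be set up (av_image_get_buffer_size() and av_image_fill_arrays() are declared in libavutil/imgutils.h):

int num_bytes = av_image_get_buffer_size(AV_PIX_FMT_RGB24, output_width, output_height, 1);
uint8_t *buffer = (uint8_t *)av_malloc(num_bytes);
uint8_t *dst_data[4];
int dst_linesize[4];
// Describe the single packed RGB24 plane stored in buffer
av_image_fill_arrays(dst_data, dst_linesize, buffer,
                     AV_PIX_FMT_RGB24, output_width, output_height, 1);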

2.4 Clean up resources

After decoding is finished, the resources that were used need to be released to avoid memory leaks. Functions such as av_free(), av_frame_free(), sws_freeContext(), and avcodec_free_context() (avcodec_close() in older code) release the corresponding resources. For example:

sws_freeContext(pSwsContext);
av_free(buffer);
av_frame_free(&pFrame);
avcodec_free_context(&pCodecCtx);
avformat_close_input(&pFormatCtx);

The above covers the basic concepts and code for FFmpeg video decoding; the exact implementation varies with the application scenario.

3. Decoding audio and video

FFmpeg can decode audio and video from the same input. Since FFmpeg 3.1, both are decoded through the unified avcodec_send_packet()/avcodec_receive_frame() API; the older avcodec_decode_audio4() and avcodec_decode_video2() functions are deprecated and have been removed in recent releases.

Decoding audio and video is one of FFmpeg's core functions.

3.1 Decoding audio

To decode audio, you need to open the audio file, get the audio stream and decode it.

Here is a code sample for decoding audio:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/audio_fifo.h>
#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>

int main(int argc, char** argv) {
    // Open the input audio file
    AVFormatContext* format_ctx = NULL;
    if (avformat_open_input(&format_ctx, "input.mp3", NULL, NULL) != 0) {
        fprintf(stderr, "Failed to open input file\n");
        return -1;
    }

    // Retrieve stream information
    if (avformat_find_stream_info(format_ctx, NULL) < 0) {
        fprintf(stderr, "Failed to find stream information\n");
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Find the first audio stream
    int audio_stream_idx = -1;
    for (int i = 0; i < format_ctx->nb_streams; i++) {
        if (format_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            audio_stream_idx = i;
            break;
        }
    }
    if (audio_stream_idx == -1) {
        fprintf(stderr, "No audio stream found\n");
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Find a decoder for the audio stream
    AVCodec* codec = avcodec_find_decoder(format_ctx->streams[audio_stream_idx]->codecpar->codec_id);
    if (codec == NULL) {
        fprintf(stderr, "Unsupported codec\n");
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Allocate and open the decoder context
    AVCodecContext* codec_ctx = avcodec_alloc_context3(codec);
    if (avcodec_parameters_to_context(codec_ctx, format_ctx->streams[audio_stream_idx]->codecpar) < 0) {
        fprintf(stderr, "Failed to copy codec parameters to decoder context\n");
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }
    if (avcodec_open2(codec_ctx, codec, NULL) < 0) {
        fprintf(stderr, "Failed to open codec\n");
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Initialize the audio buffer and FIFO
    // (some codecs leave frame_size unset, so fall back to a fixed size)
    int frame_size = codec_ctx->frame_size > 0 ? codec_ctx->frame_size : 1024;
    AVAudioFifo* fifo = av_audio_fifo_alloc(codec_ctx->sample_fmt, codec_ctx->channels, frame_size);
    if (fifo == NULL) {
        fprintf(stderr, "Failed to allocate audio FIFO\n");
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }
    uint8_t** buffer = NULL;
    if (av_samples_alloc_array_and_samples(&buffer, NULL, codec_ctx->channels, frame_size, codec_ctx->sample_fmt, 0) < 0) {
        fprintf(stderr, "Failed to allocate audio buffer\n");
        av_audio_fifo_free(fifo);
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Decode the audio stream
    AVPacket packet;
    av_init_packet(&packet);
    while (av_read_frame(format_ctx, &packet) == 0) {
        if (packet.stream_index == audio_stream_idx) {
            int ret = avcodec_send_packet(codec_ctx, &packet);
            if (ret < 0) {
                fprintf(stderr, "Error sending a packet for decoding\n");
                break;
            }
            while (ret >= 0) {
                AVFrame* frame = av_frame_alloc();
                ret = avcodec_receive_frame(codec_ctx, frame);
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
                    av_frame_free(&frame);
                    break;
                } else if (ret < 0) {
                    fprintf(stderr, "Error during decoding\n");
                    av_frame_free(&frame);
                    break;
                }
                // Write the decoded samples straight into the FIFO;
                // av_audio_fifo_write() grows the FIFO automatically if needed
                if (av_audio_fifo_write(fifo, (void**)frame->data, frame->nb_samples) < frame->nb_samples) {
                    fprintf(stderr, "Failed to write samples to FIFO\n");
                    av_frame_free(&frame);
                    break;
                }
                av_frame_free(&frame);
            }
        }
        av_packet_unref(&packet);
    }

    // Close the decoder and input file, free the buffer and FIFO
    avcodec_free_context(&codec_ctx);
    avformat_close_input(&format_ctx);
    av_freep(&buffer[0]);
    av_freep(&buffer);
    av_audio_fifo_free(fifo);

    return 0;
}
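The FIFO pays off when a later processing stage needs fixed-size chunks of audio, for example an encoder that expects exactly frame_size samples per call. A hypothetical sketch of draining it, reusing fifo, buffer, and frame_size from the example above:

// Read fixed-size chunks back out of the FIFO
while (av_audio_fifo_size(fifo) >= frame_size) {
    if (av_audio_fifo_read(fifo, (void**)buffer, frame_size) < frame_size)
        break;
    // buffer[0] (and buffer[1] for planar stereo) now holds frame_size samples
}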

3.2 Decoding video

To decode video, you need to open the video file, find the video stream, and decode its packets. Here is a code sample for decoding video:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>

int main(int argc, char** argv) {
    // Open the input video file
    AVFormatContext* format_ctx = NULL;
    if (avformat_open_input(&format_ctx, "input.mp4", NULL, NULL) != 0) {
        fprintf(stderr, "Failed to open input file\n");
        return -1;
    }

    // Retrieve stream information
    if (avformat_find_stream_info(format_ctx, NULL) < 0) {
        fprintf(stderr, "Failed to find stream information\n");
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Find the first video stream
    int video_stream_idx = -1;
    for (int i = 0; i < format_ctx->nb_streams; i++) {
        if (format_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            video_stream_idx = i;
            break;
        }
    }
    if (video_stream_idx == -1) {
        fprintf(stderr, "No video stream found\n");
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Find a decoder for the video stream
    AVCodec* codec = avcodec_find_decoder(format_ctx->streams[video_stream_idx]->codecpar->codec_id);
    if (codec == NULL) {
        fprintf(stderr, "Unsupported codec\n");
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Allocate and open the decoder context
    AVCodecContext* codec_ctx = avcodec_alloc_context3(codec);
    if (avcodec_parameters_to_context(codec_ctx, format_ctx->streams[video_stream_idx]->codecpar) < 0) {
        fprintf(stderr, "Failed to copy codec parameters to decoder context\n");
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }
    if (avcodec_open2(codec_ctx, codec, NULL) < 0) {
        fprintf(stderr, "Failed to open codec\n");
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Initialize the RGB buffer and the pixel format converter
    AVFrame* frame = av_frame_alloc();
    AVFrame* rgb_frame = av_frame_alloc();
    uint8_t* buffer = NULL;
    int num_bytes = av_image_get_buffer_size(AV_PIX_FMT_RGB24, codec_ctx->width, codec_ctx->height, 1);
    if (num_bytes <= 0) {
        fprintf(stderr, "Failed to get image buffer size\n");
        av_frame_free(&rgb_frame);
        av_frame_free(&frame);
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }
    buffer = (uint8_t*)av_malloc(num_bytes * sizeof(uint8_t));
    if (buffer == NULL) {
        fprintf(stderr, "Failed to allocate image buffer\n");
        av_frame_free(&rgb_frame);
        av_frame_free(&frame);
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }
    av_image_fill_arrays(rgb_frame->data, rgb_frame->linesize, buffer, AV_PIX_FMT_RGB24, codec_ctx->width, codec_ctx->height, 1);
    struct SwsContext* sws_ctx = sws_getContext(codec_ctx->width, codec_ctx->height, codec_ctx->pix_fmt, codec_ctx->width, codec_ctx->height, AV_PIX_FMT_RGB24, SWS_BILINEAR, NULL, NULL, NULL);
    if (sws_ctx == NULL) {
        fprintf(stderr, "Failed to initialize color conversion context\n");
        av_freep(&buffer);
        av_frame_free(&rgb_frame);
        av_frame_free(&frame);
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&format_ctx);
        return -1;
    }

    // Decode the video stream
    AVPacket packet;
    av_init_packet(&packet);
    while (av_read_frame(format_ctx, &packet) == 0) {
        if (packet.stream_index == video_stream_idx) {
            int ret = avcodec_send_packet(codec_ctx, &packet);
            if (ret < 0) {
                fprintf(stderr, "Error sending a packet for decoding\n");
                break;
            }
            while (ret >= 0) {
                ret = avcodec_receive_frame(codec_ctx, frame);
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
                    break;
                } else if (ret < 0) {
                    fprintf(stderr, "Error during decoding\n");
                    break;
                }
                sws_scale(sws_ctx, frame->data, frame->linesize, 0, codec_ctx->height, rgb_frame->data, rgb_frame->linesize);
                // The RGB image in rgb_frame can now be displayed or saved to a file
            }
        }
        av_packet_unref(&packet);
    }

    // Close the decoder and input file, free the buffer and converter
    sws_freeContext(sws_ctx);
    av_freep(&buffer);
    av_frame_free(&rgb_frame);
    av_frame_free(&frame);
    avcodec_free_context(&codec_ctx);
    avformat_close_input(&format_ctx);

    return 0;
}
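A common way to check the decoded output is to dump frames as images. A minimal sketch of a hypothetical save_frame_as_ppm() helper that could be called where the comment above marks the RGB frame as ready:

#include <stdio.h>
#include <libavutil/frame.h>

// Save one RGB24 frame as a binary PPM image
static int save_frame_as_ppm(const AVFrame* rgb_frame, int width, int height,
                             const char* filename) {
    FILE* f = fopen(filename, "wb");
    if (f == NULL)
        return -1;
    fprintf(f, "P6\n%d %d\n255\n", width, height);  // PPM header
    // Copy row by row: linesize may be larger than width * 3 due to padding
    for (int y = 0; y < height; y++)
        fwrite(rgb_frame->data[0] + y * rgb_frame->linesize[0], 1, width * 3, f);
    fclose(f);
    return 0;
}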

The above code samples show how to use FFmpeg to decode audio and video. In practical applications, the decoded data still needs to be processed, saved, or played back according to the specific requirements.

4. Frame rate control

Frame rate refers to the number of frames displayed per second, usually expressed in FPS (frames per second). Video quality and smoothness are closely tied to frame rate: generally, the higher the frame rate, the smoother the video, but the more storage space and computing resources it consumes.

Here are some common frame rate control methods:

4.1 The -r option

This is the simplest method of frame rate control: the -r option specifies the frame rate of the output video stream.

For example, the following command sets the output frame rate to 25fps and writes the result to output.mp4:

ffmpeg -i input.mp4 -r 25 output.mp4

4.2 The setpts filter

The setpts filter adjusts the presentation timestamps of video frames, and thus the time interval between them.

For example, the following command increases the frame rate of the video to 60fps and outputs it to output.mp4:

ffmpeg -i input.mp4 -filter:v "setpts=0.5*PTS" -r 60 output.mp4

Here the video frame rate is increased by halving the display time of each video frame.

4.3 The fps filter

The fps filter can be used to drop or duplicate frames in a video stream, producing a new stream at the specified rate.

For example, the command below converts the original video into a 10fps GIF image:

ffmpeg -i input.mp4 -vf fps=10 output.gif

4.4 Combining the select and setpts filters

Combining the select and setpts filters makes it possible to capture frames from a video at specified intervals while keeping consistent timestamps.

For example, assuming a 30fps input, the command below captures one image every 10 seconds from the input video and saves the images as output_%03d.jpg:

ffmpeg -i input.mp4 -vf "select=not(mod(n\,300)),setpts=N/FRAME_RATE/TB" output_%03d.jpg

Here not(mod(n\,300)) selects one frame out of every 300 (10 seconds at 30fps), and setpts=N/FRAME_RATE/TB recalculates each selected frame's display time from the frame rate.

In general, FFmpeg provides a wealth of frame rate control functions, which can be used flexibly according to different scenarios, so as to achieve more refined video processing effects.

4.5 Implementing frame rate control

In FFmpeg, frame rate control is implemented by adjusting the PTS (Presentation Time Stamp) and DTS (Decoding Time Stamp) of the video, for example:

4.5.1 Specify the frame rate of the output video

The fps filter lets you specify the frame rate of the output video, for example:

ffmpeg -i input.mp4 -c:v libx264 -filter:v fps=30 output.mp4

This command converts input.mp4 to H.264 encoding format and sets the output video frame rate to 30.

4.5.2 Modify the frame rate of the input video

If the frame rate of the original video is too high or too low, it can be modified to fit the current scenario. The setpts filter rescales the PTS of the input video, thereby changing the effective frame rate, for example:

# Slow a 25fps video down to 20fps (each frame is displayed 1.25x longer)
ffmpeg -i input.mp4 -filter:v "setpts=1.25*PTS" output.mp4

# Slow a 60fps video down to 30fps
ffmpeg -i input.mp4 -filter:v "setpts=2.0*PTS" output.mp4

These commands scale each frame's presentation timestamp in input.mp4: a factor greater than 1 lowers the effective frame rate (and lengthens the video), while a factor less than 1 raises it.

Note that manipulating timestamps this way also changes the playback speed and can introduce judder, so it must be used with caution.
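At the API level, the same idea boils down to rescaling each frame's presentation timestamp before it is re-encoded. A minimal sketch, where factor, in_stream, and enc_ctx are hypothetical names for the speed factor, input stream, and encoder context:

// Scale the PTS by factor (e.g. 2.0 halves the effective frame rate),
// then convert it from the input stream's time base to the encoder's
frame->pts = av_rescale_q((int64_t)(frame->pts * factor),
                          in_stream->time_base, enc_ctx->time_base);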

5. Decoder output format

In FFmpeg, a decoder is the component that decompresses audio and video data from a compressed format back into a raw format.

The output format of the FFmpeg decoder is related to the input format. Different input formats correspond to different decoders, and the output formats of different decoders are also slightly different. Generally speaking, the FFmpeg decoder can output the following commonly used video and audio formats:

5.1 Video format

5.1.1 YUV format

YUV is a color space and a common raw video format. A YUV image is made up of a luma component (Y) and two chroma components (U and V). During video encoding and processing, sub-sampled layouts such as YUV420P, YUV422P, and YUV444P are frequently used.

To convert video to YUV format, you can use the following command:

ffmpeg -i input.mp4 -pix_fmt yuv420p output.yuv

This command converts the input.mp4 file to YUV420P format and saves it in the output.yuv file.
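Programmatically, a decoded YUV420P frame can be written out plane by plane. A minimal sketch, assuming a decoded AVFrame in AV_PIX_FMT_YUV420P and an already-open FILE* (both hypothetical here):

#include <stdio.h>
#include <libavutil/frame.h>

// Dump one YUV420P frame: full-resolution Y plane, then quarter-size U and V
static void write_yuv420p(const AVFrame* frame, FILE* out) {
    for (int y = 0; y < frame->height; y++)          // Y plane
        fwrite(frame->data[0] + y * frame->linesize[0], 1, frame->width, out);
    for (int p = 1; p <= 2; p++)                     // U plane, then V plane
        for (int y = 0; y < frame->height / 2; y++)
            fwrite(frame->data[p] + y * frame->linesize[p], 1, frame->width / 2, out);
}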

5.1.2 RGB format

RGB is another common color space and raw video format. An RGB image is composed of the three primary colors red, green, and blue. In video processing, pixel formats such as RGB24 and RGBA are often used.

To convert video to RGB format, you can use the following command:

ffmpeg -i input.mp4 -pix_fmt rgb24 output.rgb

This command converts the input.mp4 file to RGB24 format and saves it in the output.rgb file.

5.1.3 H.264/H.265 format

H.264 and H.265 are common video compression formats that can greatly reduce video file size while preserving high definition. In FFmpeg, video can be encoded into H.264/H.265 format by specifying the encoder and its parameters.

To convert video to H.264 format, you can use the following command:

ffmpeg -i input.mp4 -c:v libx264 output.mp4

This command converts the input.mp4 file to H.264 encoding format and saves it to the output.mp4 file.

To convert video to H.265 format, you can use the following command:

ffmpeg -i input.mp4 -c:v libx265 output.mp4

This command converts the input.mp4 file to H.265 encoding format and saves it to the output.mp4 file.

Note that different video formats trade off output quality, file size, and decoding speed differently, so they should be selected and tuned according to actual needs.

5.2 Audio format

5.2.1 PCM format

PCM is an uncompressed (and therefore lossless) audio format with very good sound quality but a large file size. In FFmpeg, audio files can be converted to PCM format and from PCM format to other formats.

To convert audio to PCM format, you can use the following command:

ffmpeg -i input.mp3 -acodec pcm_s16le output.wav

This command converts the input.mp3 file to PCM format and saves it to the output.wav file.

To convert PCM format to other formats, you can use the following command:

ffmpeg -f s16le -ar 44100 -ac 2 -i input.pcm -acodec libmp3lame output.mp3

This command interprets input.pcm as raw 16-bit little-endian stereo samples at 44.1kHz (raw PCM carries no header, so -f s16le, -ar, and -ac must be given explicitly), encodes it to MP3, and saves it to the output.mp3 file.

5.2.2 MP3 format

MP3 is a lossy compression format, which can effectively reduce the audio file size, but will lose some sound quality. In FFmpeg, audio files can be converted to MP3 format, and from MP3 format to other formats.

To convert audio to MP3 format, you can use the following command:

ffmpeg -i input.wav -acodec libmp3lame output.mp3

This command converts the input.wav file to MP3 format and saves it to the output.mp3 file.

To convert MP3 format to other formats, you can use the following command:

ffmpeg -i input.mp3 -acodec pcm_s16le output.wav

This command converts the input.mp3 file to PCM format and saves it to the output.wav file.

5.2.3 AAC format

AAC is a common lossy compressed audio format that is more advanced than MP3 and achieves better sound quality at the same compression ratio. In FFmpeg, audio files can be converted to AAC format, and from AAC format to other formats.

To convert audio to AAC format, you can use the following command:

ffmpeg -i input.wav -c:a libfdk_aac -b:a 128k output.aac

This command converts the input.wav file to AAC format at 128 kbps and saves it to the output.aac file. Note that libfdk_aac is only available in FFmpeg builds configured with --enable-libfdk-aac; otherwise the built-in aac encoder (-c:a aac) can be used instead.

To convert AAC format to other formats, you can use the following command:

ffmpeg -i input.m4a -acodec pcm_s16le output.wav

This command converts the input.m4a file to PCM format and saves it to the output.wav file.

Note that different audio formats trade off output quality, file size, and decoding speed differently, so they should be selected and tuned according to actual needs.

6. Video encoding format

In FFmpeg, a video encoder is the component that compresses raw video data (such as YUV or RGB) into a particular video encoding format. Some commonly used video encoding formats are introduced below.

6.1 H.264/AVC

H.264 is a widely used video compression standard, also known as AVC (Advanced Video Coding). It is a lossy compression technique that can drastically reduce video file size while maintaining high quality. H.264 is widely used for Internet video transmission and storage and is supported on virtually every platform, including HTML5 video. H.264 also has a scalable extension, SVC (Scalable Video Coding), which allows video quality to adapt automatically to the available network bandwidth for the best viewing experience.

In FFmpeg, the libx264 encoder can be used to encode raw video data into H.264 format, for example:

ffmpeg -i input.mp4 -c:v libx264 -preset veryslow -crf 18 output.mp4

Here -c:v libx264 selects the libx264 encoder, -preset veryslow uses the slowest encoding speed in exchange for better compression, and -crf 18 specifies the constant-quality level (the smaller the value, the higher the quality).
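When encoding through the API rather than the command line, libx264's private options mirror these flags and can be set with av_opt_set() on the codec context's priv_data. A minimal sketch (the dimensions and time base are placeholder values):

#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>

const AVCodec* enc = avcodec_find_encoder_by_name("libx264");
AVCodecContext* enc_ctx = avcodec_alloc_context3(enc);
enc_ctx->width     = 1280;                     // placeholder dimensions
enc_ctx->height    = 720;
enc_ctx->pix_fmt   = AV_PIX_FMT_YUV420P;
enc_ctx->time_base = (AVRational){1, 25};
// Private libx264 options, equivalent to -preset / -crf on the command line
av_opt_set(enc_ctx->priv_data, "preset", "veryslow", 0);
av_opt_set(enc_ctx->priv_data, "crf", "18", 0);
if (avcodec_open2(enc_ctx, enc, NULL) < 0) {
    /* handle the error */
}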

6.2 H.265/HEVC

HEVC (High Efficiency Video Coding), also known as H.265, is a newer video compression standard. Compared with H.264, it offers higher compression efficiency at lower bit rates and can handle higher resolutions and more complex video content. Because H.265 encoding and decoding require more computing power, its hardware requirements are higher.

In FFmpeg, the libx265 encoder can be used to encode raw video data into HEVC/H.265 format, for example:

ffmpeg -i input.mp4 -c:v libx265 -preset veryslow -crf 18 output.mp4

This mirrors the H.264 example, but uses -c:v libx265 to select the libx265 encoder.

6.3 VP9

VP9 is a video compression standard developed by Google. It can compress a video to half its original size or less while maintaining high quality. VP9 is mainly used in WebM-format video and is supported on a wide range of platforms.

In FFmpeg, the libvpx-vp9 encoder can be used to encode raw video data into VP9 format, for example:

ffmpeg -i input.mp4 -c:v libvpx-vp9 -crf 32 -b:v 0 output.webm

Here the output is specified in VP9/WebM format, with -c:v libvpx-vp9 selecting the libvpx-vp9 encoder. -crf 32 sets the quality target (the larger the value, the lower the quality), and -b:v 0 removes the fixed bit rate so the encoder adapts to the target quality (constant-quality mode).

6.4 AV1

AV1 is a newer video compression standard developed by the Alliance for Open Media (AOMedia). Compared with H.265 and VP9, AV1 offers higher compression efficiency at lower bit rates, and it is completely open and royalty-free. AV1 adoption is growing steadily; large video sites such as YouTube and Netflix have begun to use it for video encoding.
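FFmpeg can produce AV1 with the libaom-av1 encoder when the build includes it (libsvtav1 is a common, faster alternative). A sketch mirroring the VP9 example above:

ffmpeg -i input.mp4 -c:v libaom-av1 -crf 30 -b:v 0 output.mkv

As with VP9, -crf sets the quality target and -b:v 0 puts the encoder into constant-quality mode.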

7. Summary

In conclusion, FFmpeg is a very powerful multimedia solution that can process a wide range of audio and video formats. To decode video correctly, it is necessary to understand the video decoder, frame rate control, decoder output formats, video encoding formats, and related topics.
