Notes on FFmpeg function calls

What does the output of av_dump_format mean?

av_dump_format prints the contents of an AVFormatContext. What does each part of the printed output mean?

Suppose av_dump_format prints the following information:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'https://demo.com/BigBuckBunny.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isomavc1mp42
    creation_time   : 2010-01-10T08:29:06.000000Z
  Duration: 00:09:56.47, start: 0.000000, bitrate: 2119 kb/s
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
    Metadata:
      creation_time   : 2010-01-10T08:29:06.000000Z
      handler_name    : (C) 2007 Google Inc. v08.13.2007.
    Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1991 kb/s, 24 fps, 24 tbr, 24k tbn, 48 tbc (default)
    Metadata:
      creation_time   : 2010-01-10T08:29:06.000000Z
      handler_name    : (C) 2007 Google Inc. v08.13.2007.

What is the container format of the file?

"mov,mp4,m4a,3gp,3g2,mj2" is the name of the demuxer (input format) that FFmpeg selected for the input file. It is a comma-separated list of the short names of the container formats that this one demuxer handles.

In multimedia files, container formats are used to organize and store multiple audio and video streams and other related data. Each container format is usually associated with one or more file extensions, which hint at the format and type of the file.

In the example output, the list does not mean that the file is in all of these formats at once. These formats share the same underlying box structure, so FFmpeg parses them with a single demuxer. The individual names mean the following:

  • "mov": QuickTime Movie file, a format developed by Apple and common on the macOS platform.

  • "mp4": MPEG-4 Part 14 file, which is a common multimedia container format widely used to store audio and video data.

  • "m4a": MPEG-4 Audio file for storing audio data.

  • "3gp": 3rd Generation Partnership Project (3GPP) file for audio and video playback on mobile devices.

  • "3g2": 3rd Generation Partnership Project 2 (3GPP2) file, a sibling of 3GP defined for CDMA2000 networks.

  • "mj2": Motion JPEG 2000 file for storing video based on JPEG 2000 compression.

Therefore, "mov,mp4,m4a,3gp,3g2,mj2" indicates that the input file belongs to this family of closely related container formats. Which member of the family it actually is can be inferred from the file extension or, more reliably, from the major_brand in the metadata.

The term "multimedia file format" is rather loose. To be precise, it should be divided into two parts: the encoding format (codec) and the container format.

This is like a plate: the same plate can serve vegetables, and it can also serve rice.

We can use different container formats to hold different encoding formats. (Of course, there are rules here: each container format clearly specifies which encoding formats it is allowed to contain.)

The container format is usually what the file suffix we see every day reflects.

The output here lists several names only because one demuxer handles all of them. The content downloaded from this URL is a single file in a single container format, which the .mp4 suffix and the mp42 major brand suggest is MP4.

What information does Metadata contain?

In the output of av_dump_format(), the "Metadata" section contains metadata that describes attributes of the media file. For an MP4 file, the first three fields come from the file type (ftyp) box at the start of the file. Below is an explanation of each field:

  • major_brand: Indicates the primary brand of the file, that is, the specification the file conforms to best and the one a parser should prefer when reading it. In the example, "mp42" means that the primary brand of the file is MP4 version 2. This is a four-character code that identifies a file format specification, not an encoder.

  • minor_version: Indicates the minor version number of the file. In the example, "0" means that the file has a minor version of 0.

  • compatible_brands: Lists the brands the file is compatible with, written as concatenated four-character codes. In the example, "isomavc1mp42" means that the file is compatible with the "isom", "avc1" and "mp42" brands.

  • creation_time: Indicates the creation time of the file. In the example, "2010-01-10T08:29:06.000000Z" means that the file was created on January 10, 2010 08:29:06 UTC.

This metadata information provides additional information about the media file, such as the file's branding, version number, compatibility, and creation time. This information can be used to determine file attributes, compatibility, and production information.

What information does the Duration line contain?

Duration: 00:09:56.47, start: 0.000000, bitrate: 2119 kb/s

This line is relatively easy to understand. The file lasts 9 minutes and 56.47 seconds, its first timestamp is at 0 seconds, and the overall bit rate of the file is 2119 kb/s.

The bit rate of a video is the number of bits transmitted or processed per unit of time, usually given in bits per second (bps). It describes how much data each second of the video takes up.

The bit rate directly affects the data volume and quality of the video. A higher bit rate means more data is allocated to each unit of time, which allows higher video quality and finer detail. A lower bit rate means that the video data is compressed more, which reduces file size and transmission bandwidth requirements but may cost quality and detail.

What information does the audio stream have?

Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
    Metadata:
      creation_time   : 2010-01-10T08:29:06.000000Z
      handler_name    : (C) 2007 Google Inc. v08.13.2007.

"Stream #0:0(und)" identifies stream 0 of input 0, which here is the first stream in the file; "und" means that the language of the stream is undetermined. Below is an explanation of each field:

  • Audio: Indicates that the stream is an audio stream.

  • aac (LC): Indicates that the audio codec is AAC (Advanced Audio Coding) with the Low Complexity (LC) profile.

  • mp4a / 0x6134706D: The codec tag (FourCC) stored in the container is "mp4a"; 0x6134706D is the same four characters read as a little-endian 32-bit integer.

  • 44100 Hz: Indicates that the audio sampling rate is 44,100 Hz, that is, the number of audio samples per second for each channel.

  • stereo: Indicates that the channel layout is stereo, that is, left and right channels.

  • fltp: Indicates that the sample format is floating point in planar layout, meaning each channel is stored in its own buffer.

  • 125 kb/s: Indicates that the audio bit rate is 125 kb/s, that is, each second of audio takes up 125 kilobits.

  • (default): Indicates that the stream is flagged as the default stream.

  • handler_name: The name of the track handler stored in the container. It is a free-form string, here apparently a notice written by the Google software that produced the file.

What information does the video stream have?

"Stream #0:1(und)" identifies stream 1 of input 0, that is, the second stream in the file, which is the video stream. Below is an explanation of each field:

  • Video: Indicates that the stream is a video stream.

  • h264 (High): Indicates that the video codec is H.264 with the High profile.

  • avc1 / 0x31637661: The codec tag (FourCC) stored in the container is "avc1"; 0x31637661 is the same four characters read as a little-endian 32-bit integer.

  • yuv420p: Indicates that the pixel format is YUV 4:2:0 planar, that is, the Y, U and V components are stored in separate planes and each chroma plane has half the width and half the height of the luma plane.

  • 1280x720: Indicates that the video resolution is 1280x720 pixels.

  • [SAR 1:1 DAR 16:9]: Indicates that the sample aspect ratio (the shape of a single pixel) is 1:1, that is, square pixels, and that the display aspect ratio is 16:9. The display aspect ratio equals the sample aspect ratio multiplied by width/height.

  • 1991 kb/s: Indicates that the bit rate of the video is 1991 kb/s, that is, each second of video takes up 1991 kilobits.

  • 24 fps: Indicates that the average frame rate of the video is 24 frames per second.

  • 24 tbr, 24k tbn, 48 tbc: Time base information of the video. tbr is the frame rate FFmpeg guessed for the stream (r_frame_rate), tbn is the reciprocal of the stream time base (24k means timestamps count in units of 1/24000 of a second), and tbc is the reciprocal of the codec time base.

  • (default): Indicates that the stream is flagged as the default stream.

  • handler_name: The name of the track handler stored in the container, the same free-form string as on the audio stream.

What are the members of the AVStream structure?

The AVStream structure is a data structure representing the media stream in FFmpeg, which contains various attributes and information of the media stream. The following are some commonly used member variables in the AVStream structure:

  • index: The index of the stream within the AVFormatContext.

  • id: A format-specific identifier of the stream.

  • codecpar: A pointer to the AVCodecParameters structure, which contains the codec parameters associated with the stream.

  • time_base: The time base of the stream, that is, the unit (in seconds) in which the timestamps of the stream are expressed. Multiplying a timestamp by the time base gives real time in seconds.

  • start_time: The presentation timestamp of the first frame of the stream, in time_base units.

  • duration: The duration of the stream, in time_base units.

  • nb_frames: The number of frames in the stream if known, otherwise 0.

  • disposition: A set of AV_DISPOSITION_* bit flags that describe the role of the stream, for example whether it is the default stream.

  • avg_frame_rate: The average frame rate of the stream.

  • r_frame_rate: The real base frame rate of the stream, that is, the lowest frame rate with which all timestamps can be represented accurately. This value is a guess.

  • metadata: A pointer to the AVDictionary structure, which contains the metadata of the stream.

To use the AVStream structure, first obtain the specific AVStream through the streams array in the AVFormatContext structure, then read the member variables you need. For example, to get the index number of a stream you can use avStream->index, to get the time base of a stream you can use avStream->time_base, and to get the metadata of a stream you can use avStream->metadata.

How does ffmpeg read a video stream?

ffmpeg first needs to locate the video stream in the mp4 file, then read packets from that stream, decode them into frames, and convert each frame to YUV420P.

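// Headers needed by this fragment (FFmpeg is a C library, hence extern "C")
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>
}

    // The code below is a function body. inputContext is an AVFormatContext*
    // already opened with avformat_open_input() and avformat_find_stream_info().

    // Decoder, filled in by av_find_best_stream()
    // (const AVCodec* matches FFmpeg 5.0 and later; older versions expect AVCodec*)
    const AVCodec* codec = nullptr;

    // Find the video stream
    int videoStreamIndex = av_find_best_stream(inputContext, AVMEDIA_TYPE_VIDEO, -1, -1, &codec, 0);
    if (videoStreamIndex < 0 || codec == nullptr) {
        // No video stream found, or no decoder available for it
        return -1;
    }

    // Get the video stream
    AVStream* stream = inputContext->streams[videoStreamIndex];

    // Allocate the decoder context only after the decoder is known
    AVCodecContext* codecContext = avcodec_alloc_context3(codec);
    if (codecContext == nullptr ||
        avcodec_parameters_to_context(codecContext, stream->codecpar) < 0) {
        // Failed to set up the decoder context
        return -1;
    }
    if (avcodec_open2(codecContext, codec, nullptr) < 0) {
        // Failed to open the decoder
        return -1;
    }

    // Allocate the decoded frame and the YUV420P frame
    AVFrame* frame = av_frame_alloc();
    AVFrame* frameYUV = av_frame_alloc();
    int frameBufferSize = av_image_get_buffer_size(AV_PIX_FMT_YUV420P, codecContext->width, codecContext->height, 1);
    uint8_t* frameBuffer = (uint8_t*)av_malloc(frameBufferSize);
    av_image_fill_arrays(frameYUV->data, frameYUV->linesize, frameBuffer, AV_PIX_FMT_YUV420P, codecContext->width, codecContext->height, 1);

    // Initialize the image conversion context
    struct SwsContext* swsContext = sws_getContext(codecContext->width, codecContext->height, codecContext->pix_fmt, codecContext->width, codecContext->height, AV_PIX_FMT_YUV420P, SWS_BILINEAR, nullptr, nullptr, nullptr);

    // Read packets, decode them, and convert each frame to YUV420P
    // (frames still buffered in the decoder at end of file are not flushed here)
    AVPacket* packet = av_packet_alloc();
    while (av_read_frame(inputContext, packet) >= 0) {
        if (packet->stream_index == videoStreamIndex &&
            avcodec_send_packet(codecContext, packet) >= 0) {
            // One packet can yield zero or more frames, so drain the decoder
            while (avcodec_receive_frame(codecContext, frame) >= 0) {
                // Convert to YUV420P
                sws_scale(swsContext, frame->data, frame->linesize, 0, codecContext->height, frameYUV->data, frameYUV->linesize);

                // The YUV data can be processed here

                // Release the reference held by the frame
                av_frame_unref(frame);
            }
        }

        av_packet_unref(packet);
    }

    // Release everything allocated above
    sws_freeContext(swsContext);
    av_free(frameBuffer);
    av_frame_free(&frameYUV);
    av_frame_free(&frame);
    av_packet_free(&packet);
    avcodec_free_context(&codecContext);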

How does ffmpeg seek to a given second of a stream to start playing?

Use the av_seek_frame function to seek to a specific point in time. It works for both audio and video streams. The following sample code snippet shows how to use av_seek_frame to seek to a specified point in time:

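// desired_time is the target position in seconds (a placeholder set by the caller)
int64_t timestamp = (int64_t)(desired_time * AV_TIME_BASE);  // convert seconds to AV_TIME_BASE units (microseconds)
int stream_index = 0;  // assume we want to seek in the first stream

AVStream* stream = formatContext->streams[stream_index];
// Rescale from AV_TIME_BASE units to the time base of the stream
int64_t seek_target = av_rescale_q(timestamp, AV_TIME_BASE_Q, stream->time_base);

// Seek to the nearest keyframe at or before the specified point in time
if (av_seek_frame(formatContext, stream_index, seek_target, AVSEEK_FLAG_BACKWARD) < 0) {
    // Seeking failed
}
// If a decoder is already open, call avcodec_flush_buffers() on its context before decoding further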

Origin blog.csdn.net/qq_42015552/article/details/131745978