FFmpeg encoding and decoding: the packet concept (how to correctly handle the relationship between the presentation timestamp PTS and the decoding timestamp DTS in a packet?)


FFmpeg is a fully open-source audio and video codec library. It contains a large number of audio and video encoding and decoding algorithms and also provides tools for audio and video processing. This article introduces the concept of the packet in FFmpeg and how it is used.

1. Introduction to Packet

In FFmpeg, a packet (AVPacket) is the basic unit for storing compressed, encoded data. A video packet typically contains exactly one compressed frame, while an audio packet may contain several compressed frames; conversely, a single encoded frame can also be split across multiple packets. Because of B- and P-frames, video packets are stored in decoding order, which may differ from display order.

typedef struct AVPacket {
    AVBufferRef *buf;              // reference to the ref-counted buffer holding the data
    int64_t pts;                   // presentation timestamp, in AVStream.time_base units
    int64_t dts;                   // decoding timestamp, in the same units
    uint8_t *data;                 // pointer to the compressed payload
    int size;                      // size of the payload in bytes
    int stream_index;              // index of the stream this packet belongs to
    int flags;                     // AV_PKT_FLAG_* bits, e.g. AV_PKT_FLAG_KEY
    AVPacketSideData *side_data;   // extra packet data (palette, new extradata, ...)
    int side_data_elems;           // number of side-data elements
} AVPacket;

AVPacket is the packet structure defined by FFmpeg. Its main fields are:

  • buf: A reference to the packet's ref-counted data buffer.
  • pts and dts: The presentation timestamp and decoding timestamp, in the stream's time base.
  • data and size: Pointer to, and size of, the compressed payload.
  • stream_index: The index of the stream this packet belongs to.
  • flags: Flag bits, e.g. AV_PKT_FLAG_KEY for key frames.
  • side_data and side_data_elems: Additional data and its element count.

2. Application of Packet in FFmpeg

Data packets play a vital role in the FFmpeg encoding and decoding process. The following are its main applications:

2.1 Reading packets from media files

When using FFmpeg to read data from a media file, we first open the file and then call av_read_frame() in a loop to read packets. Here is the relevant code:

AVFormatContext *pFormatCtx = avformat_alloc_context();
if (avformat_open_input(&pFormatCtx, filepath, NULL, NULL) != 0) {
    printf("Couldn't open input stream.\n");
    return -1;
}

AVPacket packet;
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    // do something with packet
    av_packet_unref(&packet);  // release the packet's buffer (see section 3.1)
}

2.2 Writing packets to media files

Writing data to a media file is also done through packets: create a packet, fill it with the encoded data, and finally call av_interleaved_write_frame() or av_write_frame() to write the packet to the file.

AVFormatContext *pFormatCtx = NULL;
avformat_alloc_output_context2(&pFormatCtx, NULL, NULL, outfile);
// ...

AVPacket pkt;
av_new_packet(&pkt, data_size);
memcpy(pkt.data, framebuf, data_size);
pkt.stream_index = video_st->index;
ret = av_interleaved_write_frame(pFormatCtx, &pkt);

3. Packet related problems and solutions

In actual use, you may encounter some problems with data packets. Here are some common problems and their solutions:

3.1 Packet memory management

FFmpeg allocates buffer memory for packets as they are read. To prevent memory leaks, call av_packet_unref() after processing each packet to release the buffer it references.

AVPacket pkt;
while (av_read_frame(pFormatCtx, &pkt) >= 0) {
    // do something with packet
    av_packet_unref(&pkt);
}

3.2 Timestamp processing

When dealing with issues such as audio and video synchronization, the pts and dts in a packet must be handled correctly. FFmpeg provides the av_packet_rescale_ts() function, which converts a packet's timestamps from one time base to another.

AVPacket pkt;
// ...
av_packet_rescale_ts(&pkt, in_time_base, out_time_base);
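Conceptually, av_packet_rescale_ts() just re-expresses a tick count in a different unit: new_ts = ts × in_base / out_base. The standalone sketch below (plain C, no FFmpeg calls; Rational is a stand-in for AVRational) shows the arithmetic. The real function additionally guards against overflow, applies rounding, and skips AV_NOPTS_VALUE.

```c
#include <stdint.h>

/* Simplified stand-in for FFmpeg's AVRational (num/den). */
typedef struct { int num, den; } Rational;

/* Conceptual equivalent of av_rescale_q(ts, in, out): a timestamp is a
 * count of `in` units; re-express that count in `out` units.
 * E.g. frame 5 at 25 fps (time base 1/25) occurs at t = 0.2 s, which is
 * 18000 ticks of a 90 kHz (1/90000) clock. */
static int64_t rescale_ts(int64_t ts, Rational in, Rational out) {
    return ts * in.num * out.den / ((int64_t)in.den * out.num);
}
```

For example, rescale_ts(5, (Rational){1, 25}, (Rational){1, 90000}) yields 18000, and rescaling 18000 back from 1/90000 to 1/25 recovers 5.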

4. How to correctly handle the relationship between PTS (Presentation Time Stamp) and DTS (Decoding Time Stamp) in a packet?

When using FFmpeg for audio and video encoding and decoding, we will encounter two important concepts: PTS (Presentation Time Stamp) and DTS (Decoding Time Stamp). Both are timestamps but serve different purposes. Understanding and handling them correctly is crucial to achieve smooth playback and accurate audio and video synchronization.

1. Introduction to PTS and DTS

1.1 PTS (Presentation Time Stamp)

PTS, the "presentation timestamp", indicates when a frame should be displayed. That is, when the media player reads a packet with a PTS, it waits until the time given by the PTS before displaying the frame.

1.2 DTS (Decoding Time Stamp)

DTS, the "decoding timestamp", indicates when this frame should be decoded. Because a B-frame may depend on frames that are displayed after it, those reference frames must be decoded first; as a result, the decoding order differs from the display order, and a frame's DTS can differ from its PTS (a B-frame cannot be decoded until its reference frames have been decoded).


2023-12-10: From my preliminary observation, ffprobe -show_packets xxx lists packets in decoding order, not display order, so the display order looks scrambled. To recover the display order, sort the packets by pts in ascending order.


2. The relationship between PTS and DTS

Without B-frames, the PTS and DTS of each frame are the same because the decoding order and display order are the same. However, if B-frames are present, the decoding order and display order may be different, and therefore the PTS and DTS may also be different.

For B-frames, the display order and decoding order diverge: a B-frame's PTS is usually greater than that of the previously displayed frame, while the forward reference frame (a later P- or I-frame in display order) carries a smaller DTS than the B-frames shown before it. This is because a B-frame depends on frames that follow it in display order, so those reference frames must be decoded first.
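To make the reordering concrete, here is a standalone sketch (plain C, illustrative timestamp values only, not real FFmpeg output) of one I-B-B-P group listed in decoding order, i.e. the order the packets are stored in the file:

```c
typedef struct { char type; int pts, dts; } FrameTs;

/* One I-B-B-P group in DECODE order (file/packet order).  DTS increases
 * monotonically, while PTS jumps around: the P frame is decoded second
 * because both B frames reference it, yet it is displayed last.  The whole
 * DTS axis is shifted back by one frame duration so that dts <= pts holds
 * for every frame, which is why the very first frame gets a negative DTS
 * (compare section 5). */
static const FrameTs decode_order[4] = {
    {'I', 0, -1},  /* decoded 1st, displayed 1st */
    {'P', 3,  0},  /* decoded 2nd, displayed 4th */
    {'B', 1,  1},  /* decoded 3rd, displayed 2nd */
    {'B', 2,  2},  /* decoded 4th, displayed 3rd */
};
```

Sorting these entries by pts recovers the display order I, B, B, P.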

As shown in the figure, the decoding timestamp (0) of the second B frame is earlier than the decoding timestamp (1001) of the first B frame:

[Figure: packet list showing the B-frame decoding timestamps]

3. How to deal with PTS and DTS

When reading packets from a file, we need to ensure that the PTS and DTS are handled correctly. Below is an example:

AVPacket packet;
while (av_read_frame(format_context, &packet) >= 0) {
    AVStream *in_stream  = format_context->streams[packet.stream_index];
    AVStream *out_stream = out_format_context->streams[packet.stream_index];

    // Convert the timestamps from the input stream's time base to the
    // output stream's time base.
    packet.pts = av_rescale_q(packet.pts, in_stream->time_base, out_stream->time_base);
    packet.dts = av_rescale_q(packet.dts, in_stream->time_base, out_stream->time_base);

    // Do something with the packet...
    av_packet_unref(&packet);
}

In this example, the av_rescale_q() function converts a timestamp from one time base to another (here, from the input stream's time base to the output stream's). This is necessary because different streams may use different time bases.

In addition, when writing packets to files, you also need to ensure that PTS and DTS are set correctly. Otherwise, the player may not play the resulting file correctly. Below is an example:

AVPacket packet;
// Fill the packet...

// Set the PTS and DTS.
packet.pts = next_pts++;
packet.dts = next_dts++;

// Write the packet.
if (av_interleaved_write_frame(format_context, &packet) < 0) {
    // Handle the error...
}

In this example, the next_pts and next_dts variables hold the next timestamps to assign; they are incremented by one for every packet written, which assumes one tick of the stream's time base equals one frame duration.
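Incrementing by 1 per packet is only correct under that assumption. More generally, for a constant-frame-rate stream, frame n gets pts = n × ticks_per_frame in the stream's time base. A standalone sketch (plain C, illustrative only):

```c
#include <stdint.h>

/* Timestamp of frame n for a constant-frame-rate stream whose time base
 * has tb_den ticks per second (e.g. 90000 for a 90 kHz clock).
 * Assumes tb_den is divisible by fps, which holds for common pairings
 * such as 25 fps in a 1/90000 time base (3600 ticks per frame). */
static int64_t frame_pts(int64_t n, int fps, int tb_den) {
    return n * (tb_den / fps);
}
```

With 25 fps in a 1/90000 time base, frames 0 through 3 get pts 0, 3600, 7200, and 10800.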

In general, handling PTS and DTS correctly is an essential step in audio and video encoding and decoding; it ensures that the resulting file plays back properly.

5. Related questions

Why are the PTS and DTS of the first I-frame of my video different, with a PTS of 0 but a DTS of -2002?


Explanation:

In some cases, the PTS (Presentation Time Stamp) and DTS (Decoding Time Stamp) of a video's first frame (an I-frame) may differ. This is mainly a consequence of the decoding delay the encoder introduces, expressed in the stream's time base.

Normally, we would expect both the PTS and DTS of the first I-frame to be 0, since it is the starting point of playback. However, in some streams the I-frame's DTS is smaller than its PTS, mainly to leave room for the B-frames that may follow. In the example above, the DTS is -2002, which means the decoder must start decoding that much earlier than the actual presentation time.

This does not affect normal playback, because the player displays frames according to their PTS. Of course, if you need to process the video further (editing, transcoding, etc.), you may need to adjust the timestamps so that they start from 0.
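If you do need timestamps that start at 0 (say, before cutting or re-muxing), one common approach is to subtract the first packet's DTS from every pts/dts. The standalone sketch below (plain C, using the illustrative values from the question above; not the FFmpeg API) shows the idea:

```c
#include <stdint.h>

typedef struct { int64_t pts, dts; } PktTs;

/* Shift all timestamps so the first packet's DTS becomes 0; returns the
 * applied offset.  With a first packet of pts=0, dts=-2002 (one frame of
 * B-frame delay), the offset is 2002 and that packet becomes
 * pts=2002, dts=0. */
static int64_t shift_to_zero(PktTs *pkts, int n) {
    if (n <= 0)
        return 0;
    int64_t offset = -pkts[0].dts;
    for (int i = 0; i < n; i++) {
        pkts[i].pts += offset;
        pkts[i].dts += offset;
    }
    return offset;
}
```

The relative spacing of the timestamps (and hence A/V sync) is unchanged; only the origin moves.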

Origin: blog.csdn.net/Dontla/article/details/134901321