Live 2 - FLV file structure analysis


FLV (Flash Video) is a widely used container format that many video sharing websites have adopted. It is defined in the "Adobe Flash Video File Format Specification". RTMP also carries its audio and video payloads in the same format as FLV tag data.


The specification already explains the file format clearly, so this chapter does not repeat that description. Instead, it shows how to analyze an FLV file using the following examples.


Figure 1. FLV file structure example 1


Figure 2. FLV file structure example 2


Many tools can analyze FLV files. I recommend FLV Parser, a small utility that makes the structure of a file easy to inspect.


1.1 File structure

Viewed as a whole, an FLV file consists of an FLV Header and an FLV File Body, as shown in the following figure:


Figure 3. The overall structure of the FLV file


1. FLV Header - 9 bytes long. For the standard definition of its structure, see E.2 The FLV header;

2. FLV File Body - Consists of a series of PreviousTagSize + Tag pairs. PreviousTagSize is a 4-byte value giving the size of the previous tag. For the standard definition, see E.3 The FLV File Body.


Taking Figure 1 (FLV file structure example 1) as an example, the overall structure is analyzed as follows:


1. Position 0x00000000 - 0x00000008, 9 bytes in total, is the FLV Header:

 •  0x00000000 - 0x00000002 : 0x46 0x4C 0x56 are the characters 'F', 'L', 'V', which identify the file as FLV. During format detection, a file whose first 3 bytes are "FLV" is treated as an FLV file;

 •  0x00000003 : 0x01, the FLV version number;

 •  0x00000004 : 0x05, which is 00000101 in binary. Bit 0 is 1, indicating that video is present, and bit 2 is 1, indicating that audio is present;

 •  0x00000005 - 0x00000008 : 0x00 0x00 0x00 0x09, which is 9 in decimal, the length of the FLV header. For FLV version 1 this value is usually 9.


2. Position 0x00000009 onward is the FLV File Body:

 •  0x00000009 - 0x0000000C : 0x00 0x00 0x00 0x00 is PreviousTagSize0, which is 0 in decimal; this value is always 0;

 •  0x0000000D - 0x00000209 : 0x12 ... 0x09, 509 bytes in total, is the content of Tag1;

 •  0x0000020A - 0x0000020D : 0x00 0x00 0x01 0xFD, which is 509 in decimal, the length of the preceding tag, Tag1;

 •  0x0000020E onward : the Tag + PreviousTagSize structure repeats in the same way, so no further examples are given here.
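
The walk-through above can be sketched in Python. This is a minimal sketch rather than a complete parser: the function names `parse_flv_header` and `iter_tags` are my own, and the sample bytes are a synthetic file laid out like Figure 1 with zero-filled tag payloads.

```python
import struct


def parse_flv_header(buf: bytes) -> dict:
    """Decode the 9-byte FLV header."""
    if len(buf) < 9 or buf[0:3] != b"FLV":
        raise ValueError("not an FLV file")
    flags = buf[4]
    return {
        "version": buf[3],
        "has_video": bool(flags & 0x01),  # bit 0
        "has_audio": bool(flags & 0x04),  # bit 2
        "header_size": struct.unpack(">I", buf[5:9])[0],
    }


def iter_tags(buf: bytes):
    """Yield (offset, tag_bytes) for each tag, checking every PreviousTagSize."""
    pos = parse_flv_header(buf)["header_size"]
    expected = 0  # PreviousTagSize0 is always 0
    while pos + 4 <= len(buf):
        (stored,) = struct.unpack(">I", buf[pos:pos + 4])
        if stored != expected:
            raise ValueError(f"PreviousTagSize mismatch at 0x{pos:08X}")
        pos += 4
        if pos + 11 > len(buf):
            break  # only the trailing PreviousTagSize was left
        data_size = int.from_bytes(buf[pos + 1:pos + 4], "big")
        expected = 11 + data_size  # 11-byte tag header + tag data
        yield pos, buf[pos:pos + expected]
        pos += expected


# Synthetic file with the same layout as Figure 1; payloads are zero-filled.
header = bytes([0x46, 0x4C, 0x56, 0x01, 0x05, 0x00, 0x00, 0x00, 0x09])
tag1 = bytes([0x12, 0x00, 0x01, 0xF2]) + bytes(7 + 498)  # 509 bytes
tag2 = bytes([0x08, 0x00, 0x00, 0x04]) + bytes(7 + 4)    # 15 bytes
sample = (header + bytes(4) + tag1 + (509).to_bytes(4, "big")
          + tag2 + (15).to_bytes(4, "big"))

for offset, tag in iter_tags(sample):
    print(f"0x{offset:08X}: type={tag[0] & 0x1F} size={len(tag)}")
```

With this sample the loop reports a tag at 0x0000000D (509 bytes) and a tag at 0x0000020E (15 bytes), matching the offsets in the analysis above.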



1.2 Tag definition

The FLV File Body is a series of PreviousTagSize + Tag pairs. PreviousTagSize is 4 bytes long and gives the length of the previous Tag. The data in a Tag may be video, audio, or script data. The Tag is defined in E.4.1 FLV Tag, and its structure is as follows:


Figure 4. FLV Tag structure


Taking Figure 1 (FLV file structure example 1) as an example, the Tag structure is analyzed as follows:


1. Position 0x0000020E : 0x08, which is 0000 1000 in binary. Bit 5 (the Filter flag) is 0, indicating that the tag is not encrypted; the lower 5 bits, 01000, are 8, indicating that this Tag contains Audio data;

2. Position 0x0000020F - 0x00000211 : 0x00 0x00 0x04, which is 4 in decimal, indicating that the tag data is 4 bytes long. This matches the PreviousTagSize that follows the tag (15) minus the 11-byte tag header;

3. Position 0x00000212 - 0x00000214 : 0x00 0x00 0x00, which is 0 in decimal, indicating that the timestamp of this Audio data is 0;

4. Position 0x00000215 : 0x00, the extended timestamp, is 0. When the extended timestamp is not 0, the timestamp of the Tag is: Timestamp | TimestampExtended<<24;

5. Position 0x00000216 - 0x00000218 : 0x00 0x00 0x00, the StreamID, which is always 0;

6. The data after StreamID differs for each tag type and is explained in detail below.
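
The six steps above can be sketched as a small decoder for the 11-byte tag header. The function name `parse_tag_header` and the dictionary keys are my own choices, and the second input (with a non-zero extended timestamp) is a made-up value used only to show how the two timestamp fields combine.

```python
def parse_tag_header(tag: bytes) -> dict:
    """Decode the 11-byte header at the start of an FLV tag."""
    if len(tag) < 11:
        raise ValueError("a tag header needs 11 bytes")
    first = tag[0]
    timestamp = int.from_bytes(tag[4:7], "big")  # lower 24 bits
    timestamp_ext = tag[7]                       # upper 8 bits
    return {
        "filter": (first >> 5) & 0x01,           # bit 5: 0 = not encrypted
        "tag_type": first & 0x1F,                # 8 audio, 9 video, 18 script
        "data_size": int.from_bytes(tag[1:4], "big"),
        "timestamp": (timestamp_ext << 24) | timestamp,
        "stream_id": int.from_bytes(tag[8:11], "big"),
    }


# The 11 bytes at 0x0000020E - 0x00000218 from the analysis above.
audio_header = bytes([0x08, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
                      0x00, 0x00, 0x00, 0x00])
print(parse_tag_header(audio_header))

# Hypothetical header: Timestamp = 0x000001, TimestampExtended = 0x01.
extended = bytes([0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
                  0x01, 0x00, 0x00, 0x00])
print(parse_tag_header(extended)["timestamp"])  # (1 << 24) | 1
```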


1.3 Audio Tags

If the TagType of a Tag is 8, the Tag contains Audio data. The data after StreamID begins with the AudioTagHeader, which is defined in E.4.2.1 AUDIODATA. The structure is as follows:


Figure 5. FLV Audio Tag structure


Note that the AudioTagHeader is usually followed directly by the AUDIODATA, with one special case: if the audio encoding format is AAC, the AudioTagHeader contains one extra byte, AACPacketType, which indicates the type of the AACAUDIODATA:

•  0 = AAC sequence header

•  1 = AAC raw


Taking Figure 1 (FLV file structure example 1) as an example, the AudioTag structure is analyzed as follows:

1. Position 0x00000219 : 0xAF, which is 1010 1111 in binary:

    The upper 4 bits are 1010, which is 10 in decimal, indicating that the audio encoding format is AAC;

    Bits 3 and 2 are 11, which is 3 in decimal, indicating that the audio sampling rate is 44 kHz;

    Bit 1 is 1, indicating that each audio sample is 16 bits wide;

    Bit 0 is 1, indicating that the audio is stereo.

2. Position 0x0000021A : 0x00, which is 0 in decimal. Because the audio encoding format is AAC, this byte is the AACPacketType, indicating that AACAUDIODATA holds the AAC sequence header;

3. Position 0x0000021B - 0x0000021C : the AUDIODATA, i.e. the AAC sequence header.
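
The bit fields above can be extracted with a few shifts and masks. This is a minimal sketch: the function name `parse_audio_tag_header` is my own, and the last two bytes of the sample (0x12 0x10) are placeholder payload bytes, since the text above does not list the actual values at 0x0000021B - 0x0000021C.

```python
def parse_audio_tag_header(data: bytes) -> dict:
    """Decode the AudioTagHeader at the start of an audio tag's data."""
    first = data[0]
    info = {
        "sound_format": first >> 4,                       # 10 = AAC
        "sound_rate": (first >> 2) & 0x03,                # 3 = 44 kHz
        "sound_size_bits": 16 if (first >> 1) & 0x01 else 8,
        "stereo": bool(first & 0x01),
    }
    header_len = 1
    if info["sound_format"] == 10:
        # AAC adds one byte: 0 = AAC sequence header, 1 = AAC raw
        info["aac_packet_type"] = data[1]
        header_len = 2
    info["payload"] = data[header_len:]
    return info


# 0xAF 0x00 as analyzed above, followed by two placeholder payload bytes.
audio_data = bytes([0xAF, 0x00, 0x12, 0x10])
print(parse_audio_tag_header(audio_data))
```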


1.3.1 AudioSpecificConfig

The AAC sequence header stores the AudioSpecificConfig, which contains more detailed audio information and is defined in Chapter 1.6.2.1 of "ISO-14496-3 Audio".

Normally the AAC sequence header tag appears only once in an FLV file, as the first Audio Tag, and stores the detailed information needed to decode the AAC audio.

For code that parses the AudioSpecificConfig structure, refer to the avpriv_mpeg4audio_get_config method in ffmpeg/libavcodec/mpeg4audio.c.

Why do we also need to pass AudioSpecificConfig when audio-related parameters are already defined in the AudioTagHeader?


Because when the SoundFormat is AAC, the SoundType must be set to 1 (stereo) and the SoundRate must be set to 3 (44 kHz), but this does not mean that AAC audio in an FLV file is always 44 kHz stereo. When playing AAC audio, the player should ignore the parameters in the AudioTagHeader and configure the decoder according to the AudioSpecificConfig.
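
To show where the real parameters come from, here is a minimal sketch that reads the first three fields of an AudioSpecificConfig: audioObjectType (5 bits), samplingFrequencyIndex (4 bits), and channelConfiguration (4 bits). It assumes the common 2-byte form and does not handle the escape values or any extension fields. The function name and the sample bytes are hypothetical and are not taken from the example file.

```python
# Sampling rates selected by samplingFrequencyIndex 0..12.
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_audio_specific_config(asc: bytes) -> dict:
    """Read the leading fields of a short AudioSpecificConfig."""
    if len(asc) < 2:
        raise ValueError("need at least 2 bytes")
    bits = int.from_bytes(asc[:2], "big")
    object_type = bits >> 11             # top 5 bits
    freq_index = (bits >> 7) & 0x0F      # next 4 bits
    channel_config = (bits >> 3) & 0x0F  # next 4 bits
    if object_type == 31 or freq_index >= len(SAMPLE_RATES):
        raise ValueError("escape or reserved value, not handled in this sketch")
    return {
        "audio_object_type": object_type,  # 2 = AAC LC
        "sample_rate": SAMPLE_RATES[freq_index],
        "channels": channel_config,
    }


# Hypothetical configs: AAC LC stereo at 44.1 kHz and at 48 kHz.
print(parse_audio_specific_config(bytes([0x12, 0x10])))
print(parse_audio_specific_config(bytes([0x11, 0x90])))
```

The second config describes 48 kHz audio, even though the AudioTagHeader in front of it would still carry SoundRate 3 (44 kHz).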


1.4 Video Tags

If the TagType of a Tag is 9, the Tag contains Video data. The data after StreamID begins with the VideoTagHeader, which is defined in E.4.3.1 VIDEODATA. The structure is as follows:


Figure 6. FLV Video Tag structure


The VideoTagHeader is followed by the VIDEODATA, but as with AAC audio there is a special case: when the video encoding format is H.264 (AVC), the VideoTagHeader contains 4 extra bytes, AVCPacketType (1 byte) and CompositionTime (3 bytes).


• AVCPacketType indicates what the VIDEODATA contains;

• CompositionTime is a relative timestamp. It is meaningful only when AVCPacketType=0x01; in all other cases it is 0.


Taking Figure 2 (FLV file structure example 2) as an example, the VideoTagHeader structure is analyzed as follows:


1. Position 0x0000022C : 0x17, which is 0001 0111 in binary:

•   The upper 4 bits are 0001, which is 1 in decimal, indicating that the current frame is a key frame;

•   The lower 4 bits are 0111, which is 7 in decimal, indicating that the video encoding format is AVC.


2. Position 0x0000022D : 0x00, which is 0 in decimal. Because the video encoding format is AVC, this byte is the AVCPacketType, indicating that the VideoTagBody holds the AVC sequence header;


3. Position 0x0000022E - 0x00000230 : the CompositionTime, which is 0 in decimal, indicating that the relative timestamp is 0;


4. Position 0x00000231 onward, to the end of the tag data : the VIDEODATA, i.e. the AVC sequence header.
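
The VideoTagHeader fields can be decoded the same way as the audio ones. This is a minimal sketch with a function name of my own, `parse_video_tag_header`. CompositionTime is read as a signed 24-bit integer, and the second sample is a made-up inter frame with a relative timestamp of 40.

```python
def parse_video_tag_header(data: bytes) -> dict:
    """Decode the VideoTagHeader at the start of a video tag's data."""
    first = data[0]
    info = {
        "frame_type": first >> 4,   # 1 = key frame, 2 = inter frame
        "codec_id": first & 0x0F,   # 7 = AVC
    }
    header_len = 1
    if info["codec_id"] == 7:
        # AVC adds AVCPacketType (1 byte) and CompositionTime (3 bytes).
        info["avc_packet_type"] = data[1]  # 0 = sequence header, 1 = NALU
        info["composition_time"] = int.from_bytes(data[2:5], "big", signed=True)
        header_len = 5
    info["payload"] = data[header_len:]
    return info


# 0x17 0x00 0x00 0x00 0x00 as analyzed above (payload omitted).
print(parse_video_tag_header(bytes([0x17, 0x00, 0x00, 0x00, 0x00])))

# Hypothetical inter frame carrying a NALU with CompositionTime = 40.
print(parse_video_tag_header(bytes([0x27, 0x01, 0x00, 0x00, 0x28])))
```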


1.4.1   AVCDecoderConfigurationRecord

The AVC sequence header stores the AVCDecoderConfigurationRecord, which is defined in "ISO-14496-15 AVC file format". It holds the AVC encoding parameters, which must be passed to the decoder before decoding can work correctly.


Normally the AVC sequence header tag appears only once in an FLV file, as the first Video Tag.


For code analysis of the AVCDecoderConfigurationRecord structure, you can refer to the ff_isom_write_avcc method in .
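
As a rough illustration of what the record contains, the following minimal sketch pulls out the profile, level, NAL unit length size, and the SPS/PPS byte strings. It covers only the basic layout and ignores the optional fields that some profiles append. The function name `parse_avcc` is my own, and the sample record is hand-built: its SPS and PPS bytes are short placeholders, not decodable parameter sets.

```python
def parse_avcc(rec: bytes) -> dict:
    """Read the basic fields of an AVCDecoderConfigurationRecord."""
    info = {
        "configuration_version": rec[0],
        "profile": rec[1],
        "profile_compatibility": rec[2],
        "level": rec[3],
        "nal_length_size": (rec[4] & 0x03) + 1,  # lengthSizeMinusOne + 1
    }
    pos = 5
    num_sps = rec[pos] & 0x1F  # lower 5 bits
    pos += 1
    info["sps"] = []
    for _ in range(num_sps):
        size = int.from_bytes(rec[pos:pos + 2], "big")
        pos += 2
        info["sps"].append(rec[pos:pos + size])
        pos += size
    num_pps = rec[pos]
    pos += 1
    info["pps"] = []
    for _ in range(num_pps):
        size = int.from_bytes(rec[pos:pos + 2], "big")
        pos += 2
        info["pps"].append(rec[pos:pos + size])
        pos += size
    return info


# Hand-built record: profile 100, level 31, 4-byte NAL lengths,
# one placeholder SPS (4 bytes) and one placeholder PPS (2 bytes).
record = bytes([0x01, 0x64, 0x00, 0x1F, 0xFF,
                0xE1, 0x00, 0x04, 0x67, 0x64, 0x00, 0x1F,
                0x01, 0x00, 0x02, 0x68, 0xEE])
print(parse_avcc(record))
```

The NAL length size matters later: each NALU in an AVCPacketType=1 tag is prefixed with a length field of that many bytes rather than a start code.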


1.4.2 CompositionTime (relative timestamp)

The relative timestamp is best understood together with PTS and DTS:

• DTS : Decode Time Stamp, the decoding timestamp, which tells the decoder when to decode the video frame;

• PTS : Presentation Time Stamp, the presentation timestamp, which tells the player when to display the video frame;

• CTS : Composition Time Stamp, the relative timestamp, which is the difference between PTS and DTS.


If every frame is encoded in input order, the decoding order and the display order are the same, and one timestamp is enough. When the video contains B frames, however, the encoding order differs from the input order, so both PTS and DTS are needed. A frame must be decoded before it is displayed, so its PTS must be greater than or equal to its DTS, which gives CTS=PTS-DTS.


The Timestamp in an FLV Video Tag is the DTS, not the PTS. The PTS of a video frame is calculated as DTS + CTS.


Why doesn't the Audio Tag need CompositionTime?

Because audio is encoded in the same order as it is input, PTS=DTS, so the concept of CompositionTime does not apply to it.
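
A small worked example may help. The frame list below is hypothetical (four frames at 40 ms intervals, stored in decode order with two B frames); the point is only that adding each tag's CompositionTime to its Timestamp gives the PTS, and that sorting by PTS recovers the display order.

```python
# (label, dts_ms, cts_ms) in the order the tags appear in the file.
# All values are made up for illustration.
frames = [
    ("I", 0, 40),
    ("P", 40, 120),
    ("B1", 80, 0),
    ("B2", 120, 0),
]


def presentation_order(frames):
    """Return (label, dts, pts) tuples sorted by PTS = DTS + CTS."""
    with_pts = [(label, dts, dts + cts) for label, dts, cts in frames]
    return sorted(with_pts, key=lambda f: f[2])


for label, dts, pts in presentation_order(frames):
    print(f"{label}: dts={dts} pts={pts}")
```

The P frame is decoded second but displayed last, which is why its CTS is the largest.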


1.5 Script Data Tags

If the TagType of a Tag is 18, the Tag contains SCRIPT data.


The structure of SCRIPTDATA is quite complex: it defines many value types, each with its own structure. For details, see E.4.4 Data Tags.


onMetaData is an important piece of information in SCRIPTDATA; its structure is defined in E.5 onMetaData. It is usually the first Tag in an FLV file and describes basic properties of the file, such as the video and audio codec IDs, the video width and height, the file size, the duration, and the creation date.
