FLV format analysis



Table of Contents

  1. FLV overview
  2. Frame analysis diagram
  3. FLV Header
  4. FLV Body

Recommended reading (that post alone should be enough; this one is a supplement): flv format detailed explanation + example analysis


1. FLV overview

  1. FLV (Flash Video) is a streaming media format introduced by Adobe. Files packaged as FLV are small and the container is simple, so the format is well suited to delivery over the Internet, and most mainstream video sites used to support it. FLV files use the .flv suffix.
  2. An FLV file consists of a file header and a file body.
  3. The FLV body is a sequence of (Previous Tag Size field + tag) pairs. The 4-byte Previous Tag Size field sits in front of each tag and records the size of the previous tag, which allows the file to be read backwards. The first Previous Tag Size after the FLV header is 0.
  4. Tags fall into 3 types: script data, audio data, and video data. FLV stores multi-byte values in big-endian order, which must be kept in mind when parsing.
  5. The structure of a standard FLV file is as follows:
    [Figure: structure of a standard FLV file]
  6. The detailed content structure of the FLV file is as follows:
    [Figure: detailed content structure of an FLV file]
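The layout above (a header, then PreviousTagSize0 = 0, then alternating tags and PreviousTagSize fields, all big-endian) can be walked forwards in a few lines. This is a minimal sketch, not a complete parser. The function names are hypothetical, and the code assumes the whole, unencrypted file is already in memory. The tag type values 8, 9 and 18 come from the FLV specification.

```python
import struct

# Tag type values from the FLV specification (lower 5 bits of the tag's first byte).
TAG_TYPES = {8: "audio", 9: "video", 18: "script"}

def read_ui24(buf: bytes, pos: int) -> int:
    """Read a big-endian 24-bit unsigned integer (UI24)."""
    return int.from_bytes(buf[pos:pos + 3], "big")

def list_tags(data: bytes) -> list:
    """Walk Header | PreviousTagSize0 | Tag1 | PreviousTagSize1 | ... forwards.

    Hypothetical helper; assumes `data` holds a whole, unencrypted FLV file.
    Returns a (tag type name, DataSize) pair for each complete tag.
    """
    data_offset = struct.unpack(">I", data[5:9])[0]  # header length, 9 for FLV v1
    pos = data_offset + 4  # skip PreviousTagSize0, which is always 0
    tags = []
    while pos + 11 <= len(data):
        tag_type = data[pos] & 0x1F
        data_size = read_ui24(data, pos + 1)
        if pos + 11 + data_size > len(data):
            break  # truncated last tag
        tags.append((TAG_TYPES.get(tag_type, f"unknown({tag_type})"), data_size))
        pos += 11 + data_size + 4  # tag header + tag data + next PreviousTagSize
    return tags
```

Each step advances by 11 + DataSize + 4 bytes: the tag header, the tag data, and the PreviousTagSize that follows the tag.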

2. Frame analysis diagram

[Figure: frame analysis diagram]


3. FLV Header

  1. Note: in the data types below, UI means unsigned integer and the number after it is the width in bits. For example, UI8 is an unsigned integer one byte long, UI24 is three bytes, and UI[8*n] is n bytes. UB means a bit field: UB5 is 5 bits of a byte, comparable to a bit field in C.
  2. The FLV header is 9 bytes long. It identifies the file as FLV and indicates which streams (audio, video) follow. Each tag type belongs to one stream, so an FLV file holds at most one audio stream and one video stream; a single file never contains multiple independent audio or video streams.

1. The structure of the FLV header is as follows:

[Figure: FLV header structure]
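A sketch of decoding the 9-byte header. The field layout follows the FLV specification: Signature "FLV" (3 x UI8), Version (UI8), a flags byte (TypeFlagsReserved UB5, TypeFlagsAudio UB1, TypeFlagsReserved UB1, TypeFlagsVideo UB1), and DataOffset (UI32). The class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass

@dataclass
class FlvHeader:  # hypothetical container for the decoded header fields
    version: int
    has_audio: bool
    has_video: bool
    data_offset: int  # header length in bytes; 9 for FLV version 1

def parse_flv_header(buf: bytes) -> FlvHeader:
    """Parse the 9-byte FLV header (multi-byte fields are big-endian)."""
    if len(buf) < 9:
        raise ValueError("FLV header needs 9 bytes")
    # ">3sBBI": 3-byte signature, UI8 version, UI8 flags, UI32 DataOffset
    signature, version, flags, data_offset = struct.unpack(">3sBBI", buf[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file (signature must be 'FLV')")
    return FlvHeader(
        version=version,
        has_audio=bool(flags & 0x04),  # TypeFlagsAudio is bit 2
        has_video=bool(flags & 0x01),  # TypeFlagsVideo is bit 0
        data_offset=data_offset,
    )
```

A typical file with both streams starts with 46 4C 56 01 05 00 00 00 09. Here the flags byte 0x05 sets both the audio bit and the video bit.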


4. FLV Body

  1. The FLV File Body follows the FLV Header. It consists of a series of back-pointers and tags. A back-pointer is the Previous Tag Size field: it holds the length in bytes of the previous tag and occupies 4 bytes.
    [Figure: FLV File Body layout]
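Because every back-pointer stores the size of the tag before it, a reader can start at the end of the file and hop backwards. The following is a minimal sketch under two assumptions: the file is complete, and PreviousTagSize equals 11 + DataSize of the previous tag, which is how the FLV specification defines it. The function name is hypothetical.

```python
def tag_offsets_backwards(data: bytes, data_offset: int = 9) -> list:
    """Return each tag's start offset, last tag first, by following the
    PreviousTagSize back-pointers from the end of the file.

    Hypothetical helper; assumes a complete FLV file whose last 4 bytes are the
    PreviousTagSize of the final tag, where PreviousTagSize = 11 + DataSize.
    """
    first_tag = data_offset + 4  # the first tag follows PreviousTagSize0
    end = len(data)  # position just past the PreviousTagSize being read
    offsets = []
    while end - 4 >= first_tag:
        prev_size = int.from_bytes(data[end - 4:end], "big")  # UI32, big-endian
        start = end - 4 - prev_size
        if prev_size == 0 or start < first_tag:
            raise ValueError(f"invalid PreviousTagSize at offset {end - 4}")
        offsets.append(start)
        end = start  # that tag's own PreviousTagSize sits just before it
    if end != first_tag:
        raise ValueError("back-pointers do not lead back to the first tag")
    return offsets
```

Reading backwards is useful when only the tail of a file matters, for example to find the timestamp of the last tag without scanning the whole body.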

1. FLV Tag

  1. Each tag in turn consists of two parts: the tag header and the tag data. The tag header stores the tag type, the length of the tag data, and similar information.
1. Tag Header
  1. The tag header is 11 bytes long. The FLV tag structure is as follows:
    [Figure: FLV tag header structure]
  2. Notes:
  3. In an FLV file, Timestamp (the lower 24 bits) and TimestampExtended (the upper 8 bits) together form the DTS, i.e. the decoding time, in milliseconds. (Without B-frames, DTS equals PTS.)
  4. For AVC video tags, CompositionTime is the offset of the PTS relative to the DTS and occupies bytes 14 to 16 of each video tag. Display time (PTS) = decoding time (bytes 5 to 8 of the tag) + CompositionTime.
  5. CompositionTime is also in milliseconds.
  6. Script data describes the video or audio, for example width, height, and duration; a file usually contains only one metadata tag. Audio and video tags carry the audio and video information: sampling, channels, frequency, coding, and so on.
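A sketch of decoding the 11-byte tag header and assembling the DTS from Timestamp and TimestampExtended. The byte positions follow the FLV specification: TagType is in the lower 5 bits of byte 1, DataSize is UI24, Timestamp is UI24, TimestampExtended is UI8, and StreamID is UI24. The names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TagHeader:  # hypothetical container for the decoded tag header
    tag_type: int   # 8 = audio, 9 = video, 18 = script data
    data_size: int  # length of the tag data (excludes this 11-byte header)
    dts_ms: int     # TimestampExtended << 24 | Timestamp, in milliseconds
    stream_id: int  # always 0 per the spec

def parse_tag_header(buf: bytes) -> TagHeader:
    """Decode an 11-byte FLV tag header (big-endian fields)."""
    if len(buf) < 11:
        raise ValueError("tag header needs 11 bytes")
    timestamp = int.from_bytes(buf[4:7], "big")  # bytes 5-7: lower 24 bits
    extended = buf[7]                            # byte 8: upper 8 bits
    return TagHeader(
        tag_type=buf[0] & 0x1F,
        data_size=int.from_bytes(buf[1:4], "big"),
        dts_ms=(extended << 24) | timestamp,
        stream_id=int.from_bytes(buf[8:11], "big"),
    )
```

TimestampExtended supplies the most significant byte, so it is shifted left by 24 bits before the lower 24 bits are OR-ed in.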
2. Script Tag Data structure (script data type)
  1. This type of tag is called the MetaData tag. It stores meta-information about the FLV video and audio, such as duration, width, and height. It is usually the first tag of the FLV file, directly following the file header, and there is only one. Its Tag Data structure is as follows:
    [Figure: Script Tag Data structure]
  2. The first AMF packet: the first byte is the AMF type, usually 0x02, meaning a string. Bytes 2-3 are a UI16 giving the string length, usually 0x000A (the length of "onMetaData"). The remaining bytes are the string itself, usually "onMetaData" (6F 6E 4D 65 74 61 44 61 74 61).
  3. The second AMF packet: the first byte is the AMF type, usually 0x08, meaning an (ECMA) array. Bytes 2-5 are a UI32 giving the number of array elements. Each array element follows as a name/value pair. The common array elements are listed in the following table.
    [Figure: table of common onMetaData array elements]
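The two AMF packets can be decoded with a very small AMF0 reader. This is a sketch that handles only the AMF0 markers commonly found in onMetaData: 0x00 Number (8-byte big-endian double), 0x01 Boolean, 0x02 String, 0x03 Object, and 0x08 ECMA array. It relies on the AMF0 rule that property names inside an object or ECMA array are bare UI16-length strings with no type marker, and that the list ends with 00 00 09. All function names are hypothetical.

```python
import struct

def read_amf_string(buf: bytes, pos: int):
    """AMF0 string body: UI16 length, then UTF-8 bytes. Returns (str, new_pos)."""
    length = int.from_bytes(buf[pos:pos + 2], "big")
    return buf[pos + 2:pos + 2 + length].decode("utf-8"), pos + 2 + length

def read_amf_value(buf: bytes, pos: int):
    """Decode one AMF0 value at buf[pos]; return (value, new_pos).

    Hypothetical minimal reader: supports markers 0x00, 0x01, 0x02, 0x03, 0x08.
    """
    marker = buf[pos]
    pos += 1
    if marker == 0x00:  # Number: 8-byte big-endian IEEE 754 double
        return struct.unpack(">d", buf[pos:pos + 8])[0], pos + 8
    if marker == 0x01:  # Boolean: one byte, non-zero means true
        return buf[pos] != 0, pos + 1
    if marker == 0x02:  # String
        return read_amf_string(buf, pos)
    if marker in (0x03, 0x08):  # Object / ECMA array: name-value pairs
        if marker == 0x08:
            pos += 4  # skip the UI32 element count
        result = {}
        while buf[pos:pos + 3] != b"\x00\x00\x09":  # object-end marker
            name, pos = read_amf_string(buf, pos)  # names carry no type marker
            result[name], pos = read_amf_value(buf, pos)
        return result, pos + 3
    raise ValueError(f"unsupported AMF0 marker 0x{marker:02X}")

def parse_script_tag_data(data: bytes):
    """First AMF packet: the name ("onMetaData"); second: the metadata array."""
    name, pos = read_amf_value(data, 0)
    metadata, _ = read_amf_value(data, pos)
    return name, metadata
```

Real files may also contain other AMF0 types (for example strict arrays inside `keyframes` objects), which this sketch rejects with a ValueError instead of guessing their layout.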
3. Audio Tag Data structure (audio type)
  1. At the start of the Audio Tag Data area:
    1. The first byte contains the parameters of the audio data.
    2. The audio stream data starts at the second byte.
  2. (Both of these belong to the data part of the tag, not the header.)
  3. The first byte holds the audio information (reading the spec closely, the most useful field for AAC files is SoundFormat). Its format is as follows:
    [Figure: format of the first byte of Audio Tag Data]
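  4. The audio data starts at the second byte. The parser must determine whether it is actual audio data or audio configuration information. For AAC (SoundFormat = 10), the second byte is AACPacketType: 0 means an AAC sequence header (AudioSpecificConfig) and 1 means AAC raw frame data.
    [Figure: audio data structure (two images in the original)]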
  5. If the data is AAC and the packet is AAC raw, the actual AAC frame data starts at the third byte of the tag data (tag data[2] with 0-based indexing).
    [Figure: AAC raw tag data layout]
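A sketch of splitting audio tag data into the first-byte fields and the payload. The bit layout (SoundFormat UB4, SoundRate UB2, SoundSize UB1, SoundType UB1) and the AACPacketType values follow the FLV specification. The function name and the dictionary keys are hypothetical.

```python
def parse_audio_tag_data(data: bytes) -> dict:
    """Split FLV audio tag data into its info-byte fields and payload.

    Hypothetical helper; dictionary keys are illustrative names.
    """
    info = data[0]
    result = {
        "sound_format": info >> 4,         # UB4: 2 = MP3, 10 = AAC, ...
        "sound_rate": (info >> 2) & 0x03,  # UB2: 0 = 5.5, 1 = 11, 2 = 22, 3 = 44 kHz
        "sound_size": (info >> 1) & 0x01,  # UB1: 0 = 8-bit, 1 = 16-bit samples
        "sound_type": info & 0x01,         # UB1: 0 = mono, 1 = stereo
    }
    if result["sound_format"] == 10:  # AAC: the second byte is AACPacketType
        result["aac_packet_type"] = data[1]  # 0 = sequence header, 1 = raw
        result["payload"] = data[2:]  # AudioSpecificConfig or raw AAC frame
    else:
        result["payload"] = data[1:]
    return result
```

For AAC in FLV the first byte is commonly 0xAF: SoundFormat 10, rate 3 (44 kHz), 16-bit, stereo.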
4. Video Tag Data structure (video type)
  1. At the start of the video Tag Data:
    1. The first byte contains the parameters of the video data.
    2. The video stream data starts at the second byte.
  2. The first byte contains video information in the following format:
    [Figure: format of the first byte of Video Tag Data]
  3. Video data starts at the second byte:
    [Figure: video data layout starting at the second byte]
  4. CompositionTime (in milliseconds)
    1. For AVC video, CompositionTime occupies bytes 14 to 16 of each video tag, counting the whole tag from 1. Within the tag data (0-based indexing) it is bytes [2] to [4], after [0] (FrameType/CodecID) and [1] (AVCPacketType). It is the offset of the PTS relative to the DTS.
    2. CompositionTime is in milliseconds: display time (PTS) = decoding time (DTS, bytes 5 to 8 of the tag, 0-based indices [4] to [7]) + CompositionTime.
      [Figure: CompositionTime in a video tag]
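A sketch of decoding the video tag data and computing PTS = DTS + CompositionTime. Per the FLV specification, the first byte is FrameType (UB4) and CodecID (UB4). For AVC (CodecID 7), AVCPacketType (UI8) comes next, followed by CompositionTime as a signed 24-bit value (SI24). The function name and keys are hypothetical, and `dts_ms` is assumed to come from the tag header.

```python
def parse_video_tag_data(data: bytes, dts_ms: int) -> dict:
    """Parse the video info byte and, for AVC, AVCPacketType and CompositionTime.

    Hypothetical helper; dts_ms is the decode time taken from the tag header.
    """
    result = {
        "frame_type": data[0] >> 4,  # UB4: 1 = keyframe, 2 = inter frame, ...
        "codec_id": data[0] & 0x0F,  # UB4: 7 = AVC (H.264)
    }
    if result["codec_id"] == 7:
        # 0 = sequence header, 1 = NALU, 2 = end of sequence
        result["avc_packet_type"] = data[1]
        # CompositionTime is SI24: signed, big-endian, 24 bits (tag data [2]..[4])
        cts = int.from_bytes(data[2:5], "big", signed=True)
        result["composition_time_ms"] = cts
        result["pts_ms"] = dts_ms + cts  # display time = decoding time + offset
        result["payload"] = data[5:]
    else:
        result["payload"] = data[1:]
    return result
```

CompositionTime is read as signed because the specification declares it SI24. A keyframe in AVC therefore typically starts with 0x17 and an inter frame with 0x27.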


Origin blog.csdn.net/weixin_41910694/article/details/109564752