The FLV format in detail + example analysis

Introduction

FLV (Flash Video) is a very popular streaming media format. Because its files are small and its packaging and playback are simple, it is well suited to use on the Internet. At present, mainstream video websites widely use the FLV format. In addition, because current browsers are closely integrated with Flash Player, it is easy to play FLV videos on web pages, which is another reason FLV is popular.

FLV is a streaming media container format, and its data can be treated as a binary byte stream. An FLV file has two parts: the File Header and the File Body. The file body is a series of Tag and Tag Size pairs.

 

 

flv.jpg

FLV format analysis

Let's start with a screenshot. The sample file is the music video of Jay Chou's "Dong Feng Po". I am viewing it with the Binary Viewer tool.

 

1.png

 

header

The header consists of the following fields:
Signature (3 bytes) + Version (1 byte) + Flags (1 byte) + DataOffset (4 bytes)

  • Signature occupies 3 bytes: the fixed characters FLV, used as a marker. A file whose first three bytes are FLV is generally treated as an FLV file.
  • Version occupies 1 byte and gives the FLV version number. Here it is 1.
  • Flags occupies 1 byte and describes the content. Bit 0 indicates whether video is present and bit 2 indicates whether audio is present (1 means present, 0 means absent). The screenshot shows 0x05, which is 00000101, so the file has both video and audio.
  • DataOffset occupies 4 bytes and gives the length of the FLV header. Here it is the fixed value 9.
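The header fields above can be decoded with a few lines of Python. This is only a minimal sketch: the function name `parse_flv_header` and the dictionary keys are made up for illustration, and the sample bytes are simply the header values read off the screenshot (FLV, version 1, flags 0x05, offset 9).

```python
import struct

def parse_flv_header(data: bytes) -> dict:
    """Decode the 9-byte FLV file header (hypothetical helper)."""
    if len(data) < 9:
        raise ValueError("an FLV header needs at least 9 bytes")
    # Big-endian: 3-byte signature, 1-byte version, 1-byte flags, 4-byte offset
    signature, version, flags, data_offset = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("signature is not FLV")
    return {
        "version": version,
        "has_video": bool(flags & 0x01),  # bit 0
        "has_audio": bool(flags & 0x04),  # bit 2
        "data_offset": data_offset,
    }

# Header bytes as described in the text: "FLV", 0x01, 0x05, 0x00000009
header = parse_flv_header(bytes.fromhex("46 4c 56 01 05 00 00 00 09"))
print(header)
```

For a real file, the same function could be given the first 9 bytes read from disk.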

body

The body of an FLV file is a series of back-pointers, each followed by a tag.

  • Each back-pointer is a fixed 4 bytes and gives the size of the previous tag.
  • There are three types of tag: video, audio, and script.

tag composition

tag type + tag data size + Timestamp + TimestampExtended + stream id + tag data

  • tag type: 1 byte. 8 is audio, 9 is video, 18 is script.
  • tag data size: 3 bytes. The length of the tag data, counted from the first byte after the stream id.
  • Timestamp: 3 bytes. The timestamp of the tag, in milliseconds.
  • TimestampExtended: 1 byte. The timestamp extension field, which supplies the upper 8 bits of a 32-bit timestamp.
  • stream id: 3 bytes. Always 0.
  • tag data: the data part.
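As a rough sketch of the tag layout just listed, the 11 bytes in front of the tag data can be decoded like this. The helper name `parse_tag_header` is hypothetical, and masking the type byte with 0x1F is an assumption taken from the FLV specification, where the upper bits of that byte are reserved or used as a filter flag.

```python
def parse_tag_header(buf: bytes) -> dict:
    """Decode the 11 bytes that precede the tag data (hypothetical helper)."""
    if len(buf) < 11:
        raise ValueError("a tag header needs 11 bytes")
    return {
        # Assumption: only the low 5 bits carry the tag type
        "type": buf[0] & 0x1F,
        "data_size": int.from_bytes(buf[1:4], "big"),
        # TimestampExtended is the most significant byte of the timestamp
        "timestamp": (buf[7] << 24) | int.from_bytes(buf[4:7], "big"),
        "stream_id": int.from_bytes(buf[8:11], "big"),
    }

# type 0x12, size 0x000125, timestamp 0, extension 0, stream id 0
first = parse_tag_header(bytes.fromhex("12 000125 000000 00 000000"))
print(first)
```

The sample bytes are a script tag header with a data size of 0x000125; any other 11-byte tag header can be passed in the same way.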

Let's analyze an example. Look at the first tag:
type = 0x12 = 18, so this is a script tag.
size = 0x000125 = 293, so the data is 293 bytes long.
timestamp = 0x000000. This is a script tag, so it is 0.
TimestampExtended = 0x00.
stream id = 0x000000.
Now let's look at the data part of the tag:

 

3.png

 

Tag division

The two red parts in the picture are the back-pointers I marked, each 4 bytes long. Between them is the first tag. How do we work out where it ends? Let's use it as an example.

  • The first back-pointer is 0x00000000, because what follows it is the first tag and there is no previous tag.
  • From the format described earlier, the size field is 0x000125. In other words, the first tag ends 293 bytes after the stream id. The bytes up to and including the stream id total 24 (9 for the file header, 4 for the first back-pointer, 11 for the tag header). So the first tag ends after 293 + 24 = 317 = 0x13D bytes.
  • Address 0x13D is easy to find in the tool: it is just before the red underline. The red part is 0x00000130 = 304, which is the size of the previous tag.
  • Finally, the data part of that tag is 293 bytes, and the type, stream id, and other header fields take 11 bytes, so 293 + 11 = 304. It matches exactly.
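The arithmetic above generalizes into a loop that walks from one back-pointer to the next. The sketch below is an illustration, not a full demuxer: `iter_tags` is a made-up name, the file is assumed to be fully in memory, and the sample is synthetic (a header, a zero back-pointer, a script tag with 293 zero bytes of data, and a closing back-pointer of 304).

```python
def iter_tags(data: bytes):
    """Yield (offset, tag_type, payload) for each tag, checking back-pointers."""
    pos = int.from_bytes(data[5:9], "big")  # DataOffset: start of the body
    expected_prev = 0                       # the first back-pointer is 0
    while pos + 4 <= len(data):
        prev_size = int.from_bytes(data[pos:pos + 4], "big")
        if prev_size != expected_prev:
            raise ValueError(f"bad back-pointer at offset {pos:#x}")
        pos += 4
        if pos + 11 > len(data):
            break                           # only the final back-pointer was left
        tag_type = data[pos]
        data_size = int.from_bytes(data[pos + 1:pos + 4], "big")
        yield pos, tag_type, data[pos + 11:pos + 11 + data_size]
        expected_prev = 11 + data_size      # header fields + data part
        pos += expected_prev

sample = bytes.fromhex("464c56010500000009") + bytes(4)
sample += bytes.fromhex("1200012500000000000000") + bytes(293)
sample += (304).to_bytes(4, "big")

tags = list(iter_tags(sample))
offset, tag_type, payload = tags[0]
print(offset, tag_type, len(payload), hex(offset + 11 + len(payload)))
```

The last printed value is the offset of the second back-pointer, which lands on the 0x13D worked out by hand.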

We now know how to split the file into tags. Next we look at what each tag contains.

The content of the tag

As mentioned earlier, there are three types of tag. Let's look at them one by one.

script

There is generally only one script tag, and it is the first tag in the file. It stores information about the FLV, such as duration, audiodatarate, creator, and width.
First, the data types used in a script tag. All data takes the form data type + (data length) + data. The data type occupies 1 byte, whether a data length field is present depends on the data type, and the data follows.
Generally, the tag data contains two AMF packets. AMF (Action Message Format) is a general data encapsulation format designed by Adobe and used in many Adobe products. Simply put, AMF describes different types of data in one uniform format. The first AMF packet holds a string, the "onMetaData" marker, which relates to some of Adobe's API calls and is not covered in detail here. The second AMF packet holds an array containing the names and values of the audio and video information items. The type values are listed below; refer to the data in the pictures for context.

value  Type             Description
0      Number           8-byte double
1      Boolean          1-byte bool
2      String           The next 2 bytes are the length
3      Object
4      MovieClip
5      Null
6      Undefined
7      Reference
8      ECMA array       Array, similar to a Map
10     Strict array
11     Date
12     Long string      The next 4 bytes are the length

 

4.png

 

The picture above shows the first AMF packet.

  • type = 0x02, which corresponds to String
  • size = 0x0A = 10
  • value = onMetaData, which is exactly 10 bytes.

     

     

5.png

The picture above shows the second AMF packet.

  • type = 0x08, which corresponds to the ECMA array type.

This represents an array, similar to a Map. The next 4 bytes are the number of elements in the array. Then come the key-value pairs: each key starts with a 2-byte length, followed by the key's characters. The next byte gives the type of the value, and the length of the value is determined by that type.
In the figure above, we can see that there are 13 key-value pairs in total.
The first key has a length of 8 and is duration. The bytes that follow start with 0x004073; the first byte is 00, so the value is a double, 8 bytes.
The second key has a length of 5 and is width. Its value is also a double, 8 bytes.
The remaining pairs are parsed the same way.

At this point we know how to parse the data of a script tag in FLV.
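To make the script tag walkthrough concrete, here is a small sketch of a reader for the AMF types that appear in it (Number, Boolean, String, ECMA array). The function name `read_amf0` is made up, the sample bytes and their values (304.0 and 640.0) are invented for illustration, and the handling of a trailing 00 00 09 end marker after the array is an assumption based on the AMF0 specification rather than something shown in the screenshots.

```python
import struct

def read_amf0(buf: bytes, pos: int = 0):
    """Decode one AMF value starting at buf[pos]; return (value, next_pos)."""
    marker = buf[pos]
    pos += 1
    if marker == 0:   # Number: 8-byte big-endian double
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if marker == 1:   # Boolean: 1 byte
        return buf[pos] != 0, pos + 1
    if marker == 2:   # String: 2-byte length, then the characters
        size = int.from_bytes(buf[pos:pos + 2], "big")
        return buf[pos + 2:pos + 2 + size].decode("utf-8"), pos + 2 + size
    if marker == 8:   # ECMA array: 4-byte count, then key-value pairs
        count = int.from_bytes(buf[pos:pos + 4], "big")
        pos += 4
        items = {}
        for _ in range(count):
            key_size = int.from_bytes(buf[pos:pos + 2], "big")
            key = buf[pos + 2:pos + 2 + key_size].decode("utf-8")
            items[key], pos = read_amf0(buf, pos + 2 + key_size)
        if buf[pos:pos + 3] == b"\x00\x00\x09":  # assumed end marker
            pos += 3
        return items, pos
    raise NotImplementedError(f"AMF type {marker} is not handled in this sketch")

# Synthetic script data: "onMetaData" followed by a two-entry array
script_data = (
    b"\x02\x00\x0aonMetaData"
    b"\x08\x00\x00\x00\x02"
    b"\x00\x08duration\x00" + struct.pack(">d", 304.0) +
    b"\x00\x05width\x00" + struct.pack(">d", 640.0) +
    b"\x00\x00\x09"
)
name, pos = read_amf0(script_data)
meta, pos = read_amf0(script_data, pos)
print(name, meta)
```

Returning the next position from every call keeps the reader recursive, which is what lets the array branch reuse the same function for each value.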

video

 

 

6.png


type = 0x09 = 9, so this is a video tag.
size = 0x000030 = 48, so the data is 48 bytes long.
timestamp = 0x000000.
TimestampExtended = 0x00.
stream id = 0x000000.
Now look at the data part, which is:
video information + data

 

Video information, 1 byte.

The first 4 bits are the frame type (FrameType):

value  Type
1      keyframe (for AVC, a seekable frame)
2      inter frame (for AVC, a non-seekable frame)
3      disposable inter frame (H.263 only)
4      generated keyframe (reserved for server use only)
5      video info/command frame

The last 4 bits are the codec ID (CodecID):

value  Type
1      JPEG (currently unused)
2      Sorenson H.263
3      Screen video
4      On2 VP6
5      On2 VP6 with alpha channel
6      Screen video version 2
7      AVC
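Splitting the video information byte into its two 4-bit fields is a shift and a mask. The sketch below uses hypothetical names (`FRAME_TYPES`, `CODEC_IDS`, `parse_video_info`) and takes codec 7 to be AVC, which is the H.264 case the article goes on to discuss.

```python
FRAME_TYPES = {
    1: "keyframe",
    2: "inter frame",
    3: "disposable inter frame",
    4: "generated keyframe",
    5: "video info/command frame",
}
CODEC_IDS = {
    1: "JPEG",
    2: "Sorenson H.263",
    3: "Screen video",
    4: "On2 VP6",
    5: "On2 VP6 with alpha channel",
    6: "Screen video version 2",
    7: "AVC",  # assumption: codec 7 is AVC (H.264)
}

def parse_video_info(byte: int) -> tuple:
    """Return (frame type, codec) names for the first byte of video tag data."""
    frame_type = byte >> 4    # upper 4 bits
    codec_id = byte & 0x0F    # lower 4 bits
    return (FRAME_TYPES.get(frame_type, "unknown"),
            CODEC_IDS.get(codec_id, "unknown"))

print(parse_video_info(0x17))
print(parse_video_info(0x27))
```

The value 0x17 is 00010111 in binary, so the upper nibble is 1 and the lower nibble is 7.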

Special case

If the video format (CodecID) is AVC (H.264), the VideoTagHeader carries 4 more bytes of information: AVCPacketType and CompositionTime.

  • AVCPacketType occupies 1 byte

value  Type
0      AVCDecoderConfigurationRecord (AVC sequence header)
1      AVC NALU
2      AVC end of sequence (lower level NALU sequence ender is not required or supported)

The AVCDecoderConfigurationRecord contains the SPS and PPS, which are essential for H.264 decoding. The SPS and PPS must be sent to the AVC decoder before the data stream, otherwise the decoder cannot decode properly. They must also be sent again whenever the decoder restarts after being stopped, for example after a seek or a switch between fast-forward and fast-reverse. The AVCDecoderConfigurationRecord usually appears once in an FLV file, in the first video tag.

  • CompositionTime occupies 3 bytes

condition            value
AVCPacketType == 1   Composition time offset
AVCPacketType != 1   0
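A short sketch of reading these 4 extra bytes follows. The name `parse_avc_video_header` is hypothetical, and treating CompositionTime as a signed 24-bit integer is an assumption taken from the FLV specification, which defines it as SI24. The trailing `b"NALU"` bytes are a placeholder, not real frame data.

```python
def parse_avc_video_header(tag_data: bytes) -> tuple:
    """Return (avc_packet_type, composition_time, rest) for AVC video tag data."""
    if tag_data[0] & 0x0F != 7:
        raise ValueError("CodecID is not AVC")
    avc_packet_type = tag_data[1]
    # Assumption: CompositionTime is a signed 24-bit big-endian value
    composition_time = int.from_bytes(tag_data[2:5], "big", signed=True)
    return avc_packet_type, composition_time, tag_data[5:]

# Video info 0x17, AVCPacketType 1, CompositionTime 0x000043, then frame data
packet_type, cts, rest = parse_avc_video_header(bytes.fromhex("17 01 000043") + b"NALU")
print(packet_type, cts, rest)
```

When the packet type is 0, `rest` would hold the AVCDecoderConfigurationRecord instead of frame data.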

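Let's look at the first video tag, the one in the earlier picture. AVCPacketType = 0, and the three bytes after it are also 0. So this tag holds the AVCDecoderConfigurationRecord, which contains the SPS and PPS data.
Now look at the second video tag: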

 

 

8.png


Here AVCPacketType = 1, and the three bytes after it are 000043. This is video frame data.

The parsed data matches the theory above exactly.

sps pps

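Earlier we mentioned that the first video tag generally holds the SPS and PPS. Here we parse their contents in detail. First, the storage format (figure 6):
0x01+sps[1]+sps[2]+sps[3]+0xFF+0xE1+sps size+sps+01+pps size+pps
Looking at figure 7:
sps[1]=0x64
sps[2]=0x00
sps[3]=0x0D
sps size=0x001B=27
After skipping 27 bytes comes 0x01.
pps size=0x0005=5
Skipping 5 more bytes brings us to the back-pointer.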
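The storage format above can be walked with the sketch below. `parse_avc_config` is a made-up name; reading sps[1], sps[2], and sps[3] as profile, compatibility, and level, and taking the low bits of 0xFF and 0xE1 as the NALU length size and the SPS count, are assumptions based on the AVCDecoderConfigurationRecord layout in ISO/IEC 14496-15. The SPS and PPS bodies in the sample are zero-filled placeholders, with only the lengths matching the example.

```python
def parse_avc_config(rec: bytes) -> dict:
    """Split an AVCDecoderConfigurationRecord into its SPS and PPS units."""
    if rec[0] != 0x01:
        raise ValueError("unexpected configuration version")
    info = {
        "profile": rec[1],                        # sps[1]
        "compatibility": rec[2],                  # sps[2]
        "level": rec[3],                          # sps[3]
        "nalu_length_size": (rec[4] & 0x03) + 1,  # 0xFF -> 4
        "sps": [],
        "pps": [],
    }
    pos = 5
    sps_count = rec[pos] & 0x1F                   # 0xE1 -> 1
    pos += 1
    for _ in range(sps_count):
        size = int.from_bytes(rec[pos:pos + 2], "big")
        info["sps"].append(rec[pos + 2:pos + 2 + size])
        pos += 2 + size
    pps_count = rec[pos]                          # 0x01 -> 1
    pos += 1
    for _ in range(pps_count):
        size = int.from_bytes(rec[pos:pos + 2], "big")
        info["pps"].append(rec[pos + 2:pos + 2 + size])
        pos += 2 + size
    info["end"] = pos                             # bytes consumed
    return info

record = (bytes.fromhex("0164000dffe1001b") + bytes(27)
          + bytes.fromhex("010005") + bytes(5))
config = parse_avc_config(record)
print(config["profile"], config["level"], len(config["sps"][0]), len(config["pps"][0]))
```

The record is 43 bytes long. Together with the 5 bytes in front of it (video information, AVCPacketType, CompositionTime), that gives 48, the data size of the first video tag.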

Video frame data

Once the SPS and PPS tag has been parsed, the video tags that follow hold the actual video data.

 

9.png


This is the second video tag. It is the same as figure 8; I have just circled the key information. First, the format:
frametype=0x17=00010111
AVCPacketType =1
Composition Time=0x000043
What follows is the NALU data.

 

Audio

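The layout is similar to video.
The first 4 bits are the audio format:

value  Type
0      Linear PCM, platform endian
1      ADPCM
2      MP3
3      Linear PCM, little endian
4      Nellymoser 16-kHz mono
5      Nellymoser 8-kHz mono
6      Nellymoser
7      G.711 A-law logarithmic PCM
8      G.711 mu-law logarithmic PCM
9      reserved
10     AAC
11     Speex
14     MP3 8-kHz
15     Device-specific sound

The next 2 bits are the sampling rate:

value  Type
0      5.5-kHz
1      11-kHz
2      22-kHz
3      44-kHz

For AAC this is always 3.

The next 1 bit is the sample size:

value  Type
0      snd8Bit
1      snd16Bit

Compressed audio is always 16-bit.

The next 1 bit is the audio type:

value  Type
0      sndMono
1      sndStereo

For AAC this is always 1.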
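The four fields packed into the first audio byte can be pulled apart with shifts and masks. This is a sketch with hypothetical names (`parse_audio_info` and the dictionary keys). The sample value 0xAF is not taken from the screenshot; it is just a value that satisfies the AAC rules given above (format 10, rate 3, 16-bit, stereo).

```python
SAMPLE_RATES = {0: "5.5-kHz", 1: "11-kHz", 2: "22-kHz", 3: "44-kHz"}

def parse_audio_info(byte: int) -> dict:
    """Split the first byte of audio tag data into its four fields."""
    return {
        "format": byte >> 4,                          # first 4 bits
        "rate": SAMPLE_RATES[(byte >> 2) & 0x03],     # next 2 bits
        "bits": 16 if (byte >> 1) & 0x01 else 8,      # next 1 bit
        "channels": "stereo" if byte & 0x01 else "mono",  # last 1 bit
    }

# 0xAF = 1010 11 1 1
info = parse_audio_info(0xAF)
print(info)
```

A format value of 10 corresponds to AAC in the table.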

Let's look at the third tag:

 

7.png

Origin blog.csdn.net/wdglhack/article/details/109811722