Brief introduction
FLV (Flash Video) is a very popular streaming media format. Its files are compact and its container structure is simple, which makes it well suited to use on the network; mainstream video sites currently use the FLV format without exception. In addition, browsers are tightly integrated with Flash Player, so playing FLV video on a web page is easy, which is another reason for FLV's popularity.
FLV is a streaming media container format, and we can treat it as a stream of binary bytes. Overall, an FLV file consists of two parts, the file header (File Header) and the file body (File Body), and the file body is composed of a series of Tag and Tag Size pairs.
FLV format parsing
First, a screenshot. This is the music video (MV) of Jay's "East Wind Breaks", opened in a binary viewer (Binary Viewer).
1.png
header
The header consists of several parts:
Signature (3 bytes) + Version (1 byte) + Flags (1 byte) + DataOffset (4 bytes)
- Signature, 3 bytes: fixed to the three characters FLV. A file whose first three bytes are these characters is generally treated as an FLV file.
- Version, 1 byte: the FLV version. Here it is 1.
- Flags, 1 byte: content flags. Bit 0 and bit 2 indicate the presence of video and audio respectively (1 means present, 0 means absent). The screenshot shows 0x05, that is 00000101, so the file has both video and audio.
- DataOffset, 4 bytes: the length of the FLV header. Here it is the fixed value 9.
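The header layout above can be sketched in Python. This is a minimal illustration rather than a complete parser: the function name `parse_flv_header` and the dictionary keys are made up for this example, and the sample bytes simply reproduce the values read from the screenshot (version 1, flags 0x05, DataOffset 9).

```python
import struct

def parse_flv_header(data: bytes) -> dict:
    """Parse the 9-byte FLV file header (hypothetical helper)."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes")
    # 3s = signature, B = version, B = flags, I = DataOffset (big-endian)
    signature, version, flags, data_offset = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file")
    return {
        "version": version,
        "has_video": bool(flags & 0x01),  # bit 0
        "has_audio": bool(flags & 0x04),  # bit 2
        "data_offset": data_offset,
    }

# Sample bytes: "FLV", version 1, flags 0x05, DataOffset 9
header = parse_flv_header(bytes.fromhex("464c56010500000009"))
print(header)
```

With a real file, the same function could be given the first 9 bytes read with `open(path, "rb").read(9)`.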
body
The FLV body consists of a series of back-pointer + tag pairs.
- Each back-pointer is a fixed 4 bytes and holds the size of the preceding tag (the FLV specification calls this field PreviousTagSize).
- There are three types of tag: video, audio, and script.
A tag consists of:
tag type + tag data size + Timestamp + TimestampExtended + stream id + tag data
- tag type, 1 byte: 8 for audio, 9 for video, 18 for script.
- tag data size, 3 bytes: the length of the tag data, counted from the byte after the stream id.
- Timestamp, 3 bytes: the timestamp.
- TimestampExtended, 1 byte: the timestamp extension field.
- stream id, 3 bytes: always 0.
- tag data: the data portion of the tag.
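As a rough sketch, the 11-byte tag header described above could be decoded like this in Python. The helper name `parse_tag_header` is hypothetical. Combining TimestampExtended as the upper 8 bits of a 32-bit timestamp follows the FLV specification and is not spelled out in this article, so treat that line as an assumption. The sample bytes are the first tag's header fields as read in the example that follows.

```python
TAG_NAMES = {8: "audio", 9: "video", 18: "script"}

def parse_tag_header(buf: bytes) -> dict:
    """Decode the 11-byte header at the start of an FLV tag (hypothetical helper)."""
    if len(buf) < 11:
        raise ValueError("need at least 11 bytes")
    timestamp = int.from_bytes(buf[4:7], "big")   # 3 bytes
    timestamp_ext = buf[7]                        # 1 byte
    return {
        "type": TAG_NAMES.get(buf[0], "unknown"),
        "data_size": int.from_bytes(buf[1:4], "big"),  # 3 bytes
        # Assumption: the extension byte supplies the upper 8 bits.
        "timestamp": (timestamp_ext << 24) | timestamp,
        "stream_id": int.from_bytes(buf[8:11], "big"),  # 3 bytes, always 0
    }

# type 0x12, size 0x000125, timestamp 0, extension 0, stream id 0
first = parse_tag_header(bytes.fromhex("1200012500000000000000"))
print(first)
```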
Let's analyze an example. Look at the first tag:
type = 0x12 = 18, so this is a script tag.
size = 0x000125 = 293, so the data is 293 bytes long.
timestamp = 0x000000. This is a script tag, so it is 0.
TimestampExtended = 0x00.
stream id = 0x000000.
Now let's look at the data part of the tag:
3.png
tag division
The two areas marked in red are back-pointers, each four bytes long. Between them is the first tag. How are these boundaries calculated? Let's work through this example.
- First comes the first back-pointer, 0x00000000. It is 0 because no tag precedes the first tag.
- Next, following the format described earlier, we read the size 0x000125. That means the first tag ends 293 bytes after the stream id. The bytes up to and including the stream id total 24 (11 for the tag header + 9 for the file header + 4 for the first back-pointer), so the first tag ends, and the next back-pointer starts, at 293 + 24 = 317 = 0x13D.
- Next we locate address 0x13D, which is easy to find in the tool: it sits right at the red underline. The red part, 0x00000130 = 304, is the size of the previous tag.
- Finally we check: the previous tag's data section is 293 bytes, and the type, stream id and other header fields take 11 bytes, so 293 + 11 = 304. An exact match.
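The offset arithmetic in the steps above can be checked with a short Python sketch. It builds a synthetic byte string with the same sizes as the example (a 293-byte script tag followed by a 48-byte video tag, with zero-filled payloads) and walks it tag by tag. The helpers `iter_tags` and `make_tag` are hypothetical names, and the sample is not the real file.

```python
import struct

def iter_tags(data: bytes):
    """Yield (offset, tag_type, data_size) for each tag in FLV bytes."""
    header_len = struct.unpack(">I", data[5:9])[0]  # DataOffset, here 9
    pos = header_len + 4          # skip the first back-pointer (always 0)
    while pos + 11 <= len(data):
        tag_type = data[pos]
        data_size = int.from_bytes(data[pos + 1:pos + 4], "big")
        yield pos, tag_type, data_size
        tag_end = pos + 11 + data_size
        if tag_end + 4 > len(data):
            break                 # truncated file: no trailing back-pointer
        prev_size = struct.unpack(">I", data[tag_end:tag_end + 4])[0]
        if prev_size != 11 + data_size:
            raise ValueError(f"back-pointer mismatch at {tag_end:#x}")
        pos = tag_end + 4

def make_tag(tag_type: int, payload: bytes) -> bytes:
    """Build a tag with zero timestamp/stream id plus its back-pointer."""
    head = bytes([tag_type]) + len(payload).to_bytes(3, "big") + bytes(7)
    return head + payload + struct.pack(">I", 11 + len(payload))

sample = (b"FLV\x01\x05" + struct.pack(">I", 9) + bytes(4)
          + make_tag(18, bytes(293)) + make_tag(9, bytes(48)))
tags = list(iter_tags(sample))
print(tags)                        # first tag starts at offset 13
print(hex(tags[0][0] + 11 + 293))  # end of the first tag: 0x13d
```

Checking each back-pointer against 11 + data size is a cheap sanity test that the walk has not drifted out of alignment.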
We now know how to split the file into individual tags. Next we look at the specific content of each tag.
tag content
As mentioned, there are three types of tag. Let's look at them one by one.
script
There is usually only one script tag, and it is the first tag in the FLV file. It stores information about the file, such as duration, audiodatarate, creator, width and so on.
First, the data types used in script data. Every value is laid out as data type + (data length) + data. The data type takes 1 byte; whether a data length field is present depends on the data type; the data itself comes last.
The script tag data generally consists of two AMF packets. AMF (Action Message Format) is a general-purpose data encapsulation format designed by Adobe and used in many Adobe products; put simply, AMF describes data of different types in a uniform format. The first AMF packet encapsulates a string that carries the marker "onMetaData", which relates to some Adobe API calls that we won't go into here. The second AMF packet encapsulates an array containing the names and values of the audio and video information items. The types are specified as follows; compare them with the data in the screenshots.
Value | Type | Description |
---|---|---|
0 | Number type | 8-byte double |
1 | Boolean type | 1-byte bool |
2 | String type | Followed by a 2-byte length |
3 | Object type | |
4 | MovieClip type | |
5 | Null type | |
6 | Undefined type | |
7 | Reference type | |
8 | ECMA array type | Array, similar to a Map |
10 | Strict array type | |
11 | Date type | |
12 | Long string type | Followed by a 4-byte length |
4.png
The first picture shows the first AMF packet:
- type = 0x02, which corresponds to String
- size = 0x000A = 10
- value = onMetaData, exactly 10 bytes.
5.png
The second picture shows the second AMF packet:
- type = 0x08, which corresponds to the ECMA array type.
It is an array, similar to a Map. The 4 bytes after the type give the number of elements. Then come the key-value pairs: each starts with the key, a 2-byte length followed by the key text, then a 1-byte value type, and then the value, whose length depends on its type.
From the figure we can determine that there are 13 pairs in total.
The first key is 8 bytes long: duration. The bytes after it begin 0x004073; the first byte is 00, so the value is a double, 8 bytes.
The second key is 5 bytes long: width. Its value is also a double, 8 bytes.
The remaining pairs are parsed the same way.
At this point we know how to parse the data of an FLV script tag.
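The two AMF packets can be decoded with a small recursive reader. The Python sketch below handles only the types needed here (Number, Boolean, String, ECMA array) and relies on the array count; real files usually end the array with an object-end marker, which this sketch does not consume. The function name `read_amf_value` and the sample values (a duration of 304.0, whose encoding happens to begin 0x40 0x73, and a width of 640.0) are illustrative assumptions, not data taken from the actual file.

```python
import struct

def read_amf_value(buf: bytes, pos: int):
    """Decode one AMF value at buf[pos]; return (value, next_pos)."""
    marker = buf[pos]
    pos += 1
    if marker == 0:   # Number: 8-byte big-endian double
        return struct.unpack(">d", buf[pos:pos + 8])[0], pos + 8
    if marker == 1:   # Boolean: 1 byte
        return buf[pos] != 0, pos + 1
    if marker == 2:   # String: 2-byte length, then the text
        n = int.from_bytes(buf[pos:pos + 2], "big")
        return buf[pos + 2:pos + 2 + n].decode("utf-8"), pos + 2 + n
    if marker == 8:   # ECMA array: 4-byte count, then key/value pairs
        count = int.from_bytes(buf[pos:pos + 4], "big")
        pos += 4
        result = {}
        for _ in range(count):
            klen = int.from_bytes(buf[pos:pos + 2], "big")
            key = buf[pos + 2:pos + 2 + klen].decode("utf-8")
            value, pos = read_amf_value(buf, pos + 2 + klen)
            result[key] = value
        return result, pos
    raise ValueError(f"AMF type {marker} is not handled in this sketch")

# Hypothetical script data: "onMetaData" plus a 2-entry array
sample = (b"\x02\x00\x0aonMetaData"
          + b"\x08\x00\x00\x00\x02"
          + b"\x00\x08duration" + b"\x00" + struct.pack(">d", 304.0)
          + b"\x00\x05width" + b"\x00" + struct.pack(">d", 640.0))
name, pos = read_amf_value(sample, 0)
meta, pos = read_amf_value(sample, pos)
print(name, meta)
```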
video
6.png
type = 0x09 = 9, so this is a video tag.
size = 0x000030 = 48, so the data is 48 bytes long.
timestamp = 0x000000.
TimestampExtended = 0x00.
stream id = 0x000000.
Now let's look at the data section:
video info + data
The video info is 1 byte.
The high 4 bits are the frame type (Frame Type):
Value | Frame type |
---|---|
1 | keyframe (for AVC, a seekable frame) |
2 | inter frame (for AVC, a non-seekable frame) |
3 | disposable inter frame (H.263 only) |
4 | generated keyframe (reserved for server use only) |
5 | video info/command frame |
The low 4 bits are the codec ID (CodecID):
Value | Codec |
---|---|
1 | JPEG (currently unused) |
2 | Sorenson H.263 |
3 | Screen video |
4 | On2 VP6 |
5 | On2 VP6 with alpha channel |
6 | Screen video version 2 |
7 | AVC |
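Splitting the video info byte into its two 4-bit fields takes one shift and one mask. A minimal Python sketch, where `split_video_byte` is a hypothetical helper name and 0x17 is the value that appears in this file's video tags:

```python
FRAME_TYPES = {1: "keyframe", 2: "inter frame", 3: "disposable inter frame",
               4: "generated keyframe", 5: "video info/command frame"}
CODEC_IDS = {1: "JPEG", 2: "Sorenson H.263", 3: "Screen video", 4: "On2 VP6",
             5: "On2 VP6 with alpha channel", 6: "Screen video version 2",
             7: "AVC"}

def split_video_byte(b: int):
    """Return (frame_type, codec_id) from the first byte of video tag data."""
    frame_type = b >> 4     # high 4 bits
    codec_id = b & 0x0F     # low 4 bits
    return frame_type, codec_id

frame_type, codec_id = split_video_byte(0x17)   # 0001 0111
print(FRAME_TYPES[frame_type], CODEC_IDS[codec_id])  # keyframe AVC
```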
Special case
When the video format (CodecID) is AVC (H.264), the VideoTagHeader carries four additional bytes: AVCPacketType and CompositionTime.
- AVCPacketType, 1 byte
Value | Type |
---|---|
0 | AVCDecoderConfigurationRecord(AVC sequence header) |
1 | AVC NALU |
2 | AVC end of sequence (lower level NALU sequence ender is not required or supported) |
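AVCDecoderConfigurationRecord contains the sps and pps information that is critical for H.264 decoding. The sps and pps must be sent to the AVC decoder before any stream data; otherwise the decoder cannot decode properly. They must also be sent again whenever the decoder is started after a stop, for example on a seek or when switching between fast-forward and rewind. In an FLV file the AVCDecoderConfigurationRecord generally appears only once, in the first video tag.
- CompositionTime, 3 bytes
Condition | Value |
---|---|
AVCPacketType == 1 | Composition time offset |
AVCPacketType != 1 | 0 |
Look at the first video tag, the one in the previous figure. AVCPacketType = 0 and the three bytes after it are also 0, which tells us this tag holds the AVCDecoderConfigurationRecord, containing the sps and pps data.
Now look at the second video tag:
8.png
Here AVCPacketType = 1 and the following three bytes are 000043. This is video frame data.
The parsed data matches the theory above exactly.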
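The four extra AVC bytes can be read right after the video info byte. The Python sketch below uses a hypothetical helper, `parse_avc_video_header`. It reads CompositionTime as a signed 24-bit integer, which is how the FLV specification defines the field; the article itself only says it is 3 bytes, so that detail is an assumption here. The first sample also assumes the first video tag starts with 0x17, which the article does not state explicitly.

```python
AVC_PACKET_TYPES = {0: "AVC sequence header", 1: "AVC NALU",
                    2: "AVC end of sequence"}

def parse_avc_video_header(data: bytes) -> dict:
    """Decode the 5 leading bytes of an AVC video tag's data."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes")
    if data[0] & 0x0F != 7:
        raise ValueError("CodecID is not AVC")
    return {
        "frame_type": data[0] >> 4,
        "packet_type": AVC_PACKET_TYPES.get(data[1], "unknown"),
        # Assumption: signed 24-bit big-endian, per the FLV specification.
        "composition_time": int.from_bytes(data[2:5], "big", signed=True),
    }

config_tag = parse_avc_video_header(bytes.fromhex("1700000000"))
frame_tag = parse_avc_video_header(bytes.fromhex("1701000043"))
print(config_tag)
print(frame_tag)
```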
sps pps
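As mentioned, the first video tag generally holds the sps and pps. Here we parse their contents in detail. First, the storage format (figure 6):
0x01 + sps[1] + sps[2] + sps[3] + 0xFF + 0xE1 + sps size + sps + 01 + pps size + pps
In the figure we see:
sps[1] = 0x64
sps[2] = 00
sps[3] = 0D
sps size = 0x001B = 27
After skipping 27 bytes comes 0x01
pps size = 0x0005 = 5
After skipping 5 more bytes we reach the back-pointer.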
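The layout above can be walked with a few slices. In this Python sketch the helper name `parse_avc_config` is hypothetical, and the sps and pps payloads in the sample are zero-filled placeholders of the right lengths (27 and 5 bytes), not the real parameter sets. Reading the sps count from the low 5 bits of the 0xE1 byte, and treating the 01 before the pps size as a pps count, follows the AVCDecoderConfigurationRecord definition in ISO/IEC 14496-15 and goes slightly beyond the simplified layout given here.

```python
def parse_avc_config(rec: bytes) -> dict:
    """Split an AVCDecoderConfigurationRecord into sps and pps units."""
    if rec[0] != 0x01:
        raise ValueError("unexpected configuration version")
    info = {"sps[1]": rec[1], "sps[2]": rec[2], "sps[3]": rec[3]}
    # rec[4] is the 0xFF byte; it is skipped in this sketch.
    # Assumption (ISO/IEC 14496-15): low 5 bits of 0xE1 = number of sps.
    sps_count = rec[5] & 0x1F
    pos = 6
    info["sps"] = []
    for _ in range(sps_count):
        size = int.from_bytes(rec[pos:pos + 2], "big")   # sps size
        info["sps"].append(rec[pos + 2:pos + 2 + size])
        pos += 2 + size
    pps_count = rec[pos]                                 # the 01 byte
    pos += 1
    info["pps"] = []
    for _ in range(pps_count):
        size = int.from_bytes(rec[pos:pos + 2], "big")   # pps size
        info["pps"].append(rec[pos + 2:pos + 2 + size])
        pos += 2 + size
    info["end"] = pos   # offset just past the record
    return info

# Placeholder record with the sizes read from the figure
rec = (bytes([0x01, 0x64, 0x00, 0x0D, 0xFF, 0xE1])
       + (27).to_bytes(2, "big") + bytes(27)
       + bytes([0x01]) + (5).to_bytes(2, "big") + bytes(5))
cfg = parse_avc_config(rec)
print(len(cfg["sps"][0]), len(cfg["pps"][0]), cfg["end"])
```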
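Video frame data
After the tag holding the sps and pps, the video tags that follow carry the actual video data.
9.png
This is the second video tag, the same as figure 8, only with the key information circled. First, the layout:
frametype = 0x17 = 00010111
AVCPacketType = 1
Composition Time = 0x000043
What follows is the NALU data.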
Audio
The layout is similar to the video format.
The high 4 bits are the sound format:
Value | Format |
---|---|
0 | Linear PCM, platform endian |
1 | ADPCM |
2 | MP3 |
3 | Linear PCM, little endian |
4 | Nellymoser 16-kHz mono |
5 | Nellymoser 8-kHz mono |
6 | Nellymoser |
7 | G.711 A-law logarithmic PCM |
8 | G.711 mu-law logarithmic PCM |
9 | reserved |
10 | AAC |
11 | Speex |
14 | MP3 8-kHz |
15 | Device-specific sound |
The next 2 bits are the sampling rate:
Value | Rate |
---|---|
0 | 5.5-kHz |
1 | 11-kHz |
2 | 22-kHz |
3 | 44-kHz |
For AAC this is always 3.
The next 1 bit is the sample size:
Value | Size |
---|---|
0 | snd8Bit |
1 | snd16Bit |
Compressed audio is always 16-bit.
The last 1 bit is the sound type:
Value | Type |
---|---|
0 | sndMono |
1 | sndStereo |
For AAC this is always 1.
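The four audio fields share a single byte, so they can be separated with shifts and masks. A minimal Python sketch: `split_audio_byte` is a hypothetical helper, the format table is abbreviated, and 0xAF is an illustrative value (AAC, rate 3, 16-bit, stereo) rather than a byte read from this file.

```python
SOUND_FORMATS = {0: "Linear PCM, platform endian", 1: "ADPCM", 2: "MP3",
                 3: "Linear PCM, little endian", 10: "AAC", 11: "Speex"}
SOUND_RATES = {0: "5.5-kHz", 1: "11-kHz", 2: "22-kHz", 3: "44-kHz"}

def split_audio_byte(b: int) -> dict:
    """Decode the first byte of audio tag data into its four fields."""
    return {
        "format": SOUND_FORMATS.get(b >> 4, "other"),  # high 4 bits
        "rate": SOUND_RATES[(b >> 2) & 0x03],          # next 2 bits
        "bits": 16 if (b >> 1) & 0x01 else 8,          # next 1 bit
        "channels": "stereo" if b & 0x01 else "mono",  # last 1 bit
    }

audio = split_audio_byte(0xAF)   # 1010 11 1 1
print(audio)
```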
Now look at the third tag:
7.png
I'll leave this one for you to parse yourself.