[Processing] The FLV video and audio file format in detail

Brief introduction

FLV (Flash Video) is a very popular streaming media format. Its files are compact and its container structure is simple, which makes it well suited to use on the web; at the time of writing, virtually all mainstream video sites use the FLV format. In addition, Flash Player is tightly integrated with current browsers, so playing FLV video in a web page is easy, which is another reason for FLV's popularity.

FLV is a streaming media container format, and a file can be viewed as a stream of binary bytes. At the top level, an FLV file consists of two parts, a file header (File Header) and a file body (File Body); the file body is made up of a series of Tag and Tag Size pairs.

 

FLV format parsing

Let's start with a screenshot. This is the music video of "East Wind Breaks" by Jay Chou, opened in a binary viewer (I use Binary Viewer).

1.png

 

header

The header consists of the following fields:
Signature (3 bytes) + Version (1 byte) + Flags (1 byte) + DataOffset (4 bytes)

  • Signature 3 bytes
    Always the three characters FLV. In general, a file whose first three bytes are FLV is treated as an FLV file.
  • Version 1 byte
    The FLV version. Here we see 1.
  • Flags 1 byte
    Content flags. Bit 0 indicates the presence of video and bit 2 the presence of audio (1 means present, 0 means absent). The screenshot shows 0x05, that is 00000101, so the file has both video and audio.
  • DataOffset 4 bytes
    The length of the FLV header. Here it is the fixed value 9.
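The header layout above can be sketched in a few lines of Python. This is a minimal illustration, not a complete parser; the function name `parse_flv_header` and the dictionary keys are made up for this example, and the sample bytes are the header values read from the screenshot.

```python
import struct

def parse_flv_header(data: bytes) -> dict:
    """Parse the 9-byte FLV file header (hypothetical helper for illustration)."""
    if len(data) < 9:
        raise ValueError("an FLV header needs at least 9 bytes")
    # 3s = Signature, B = Version, B = Flags, I = DataOffset (all big-endian)
    signature, version, flags, data_offset = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("signature is not 'FLV'")
    return {
        "version": version,
        "has_video": bool(flags & 0x01),  # bit 0
        "has_audio": bool(flags & 0x04),  # bit 2
        "data_offset": data_offset,
    }

# "FLV", version 1, flags 0x05, DataOffset 9
sample_header = bytes([0x46, 0x4C, 0x56, 0x01, 0x05, 0x00, 0x00, 0x00, 0x09])
header = parse_flv_header(sample_header)
print(header)
```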

body

The FLV body is a sequence of back-pointer + tag pairs.

  • Each back-pointer is a fixed 4 bytes and holds the size of the preceding tag.
  • There are three types of tag: video, audio, and script.

tag composition

tag type + tag data size + Timestamp + TimestampExtended + stream id + tag data

  • tag type 1 byte. 8 for audio, 9 for video, 18 for script.
  • tag data size 3 bytes. The length of the tag data, counted from the byte after the stream id.
  • Timestamp 3 bytes. The timestamp.
  • TimestampExtended 1 byte. Extension field for the timestamp.
  • stream id 3 bytes. Always 0.
  • tag data The data portion of the tag.
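As a rough sketch, the 11-byte tag header can be decoded as follows. The name `parse_tag_header` is hypothetical, and the sample bytes are invented to show how the timestamp fields combine. Per the FLV specification (not stated above), the timestamp is in milliseconds and TimestampExtended supplies the upper 8 bits of a 32-bit value.

```python
def parse_tag_header(data: bytes) -> dict:
    """Decode the 11 bytes that precede the tag data (illustrative helper)."""
    if len(data) < 11:
        raise ValueError("a tag header needs 11 bytes")
    timestamp_low = int.from_bytes(data[4:7], "big")  # lower 24 bits
    timestamp_ext = data[7]                           # upper 8 bits
    return {
        "tag_type": data[0],                           # 8 audio, 9 video, 18 script
        "data_size": int.from_bytes(data[1:4], "big"),
        "timestamp_ms": (timestamp_ext << 24) | timestamp_low,
        "stream_id": int.from_bytes(data[8:11], "big"),
    }

# Made-up audio tag header: size 4, Timestamp 0x000020, TimestampExtended 0x01
sample_tag_header = bytes([0x08, 0x00, 0x00, 0x04,
                           0x00, 0x00, 0x20, 0x01,
                           0x00, 0x00, 0x00])
tag_header = parse_tag_header(sample_tag_header)
print(tag_header)
```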

Let's work through an example. Looking at the first tag:
type = 0x12 = 18, so this is a script tag.
size = 0x000125 = 293, so the data is 293 bytes long.
timestamp = 0x000000. This is a script tag, so it is 0.
TimestampExtended = 0x00.
stream id = 0x000000.
Now let's look at the data portion of the tag:

3.png

 

tag division

The two areas I marked in red are back-pointers, each 4 bytes long. Between them is the first tag. How are the boundaries calculated? Let's use this tag as an example.

  • First comes the first back-pointer, 0x00000000. It is 0 because the tag that follows is the first tag, so there is no preceding tag.
  • Next, using the format described earlier, we read the size 0x000125. This means the first tag ends 293 bytes after the stream id. The bytes up to and including the stream id total 24 (11 + 9 + 4: the tag header, the file header, and the first back-pointer). So the first tag ends, and the next back-pointer starts, at offset 293 + 24 = 317 = 0x13D.
  • Then we go to address 0x13D, which is easy to find in the tool: it is right at the red underline. The red part is 0x00000130 = 304, which is the size of the preceding tag.
  • Finally, we check the result: the data section of the preceding tag is 293 bytes, and the fields in front of it (type, stream id, and so on) take 11 bytes. 293 + 11 = 304, an exact match.
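The same arithmetic can be turned into a loop that walks every tag and checks each back-pointer. This is a minimal sketch under stated assumptions: `iter_tags` and `make_tag` are names invented here, and the test file is synthetic (zero-filled payloads with the sizes 293 and 48 taken from this article), not the real video.

```python
import struct

TAG_HEADER_SIZE = 11  # type (1) + data size (3) + timestamp (3 + 1) + stream id (3)

def iter_tags(flv: bytes):
    """Yield (tag_offset, tag_type, data_size) for each tag in an FLV byte string."""
    (pos,) = struct.unpack_from(">I", flv, 5)  # DataOffset: where the body starts
    expected_back_pointer = 0                  # the first back-pointer is 0
    while pos + 4 <= len(flv):
        (back_pointer,) = struct.unpack_from(">I", flv, pos)
        if back_pointer != expected_back_pointer:
            raise ValueError(f"unexpected back-pointer at offset {pos:#x}")
        pos += 4
        if pos + TAG_HEADER_SIZE > len(flv):
            break  # that was the final back-pointer
        tag_type = flv[pos]
        data_size = int.from_bytes(flv[pos + 1:pos + 4], "big")
        yield pos, tag_type, data_size
        expected_back_pointer = TAG_HEADER_SIZE + data_size
        pos += expected_back_pointer

def make_tag(tag_type: int, data_size: int) -> bytes:
    """Build a tag with zeroed timestamp, stream id and payload (placeholder data)."""
    header = bytes([tag_type]) + data_size.to_bytes(3, "big") + bytes(7)
    return header + bytes(data_size)

script_tag = make_tag(0x12, 293)
video_tag = make_tag(0x09, 48)
flv = (
    b"FLV\x01\x05" + struct.pack(">I", 9)        # 9-byte file header
    + struct.pack(">I", 0) + script_tag          # first back-pointer + first tag
    + struct.pack(">I", len(script_tag)) + video_tag
    + struct.pack(">I", len(video_tag))          # trailing back-pointer
)
tags = list(iter_tags(flv))
for offset, tag_type, data_size in tags:
    end = offset + TAG_HEADER_SIZE + data_size
    print(f"tag type {tag_type} at {offset:#x}, next back-pointer at {end:#x}")
```

For the first tag, the computed position of the next back-pointer is 0x13d, the same address found by hand above.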

We now know how to split the file into tags. Next, let's look at the contents of each tag.

tag content

As mentioned above, there are three types of tag. Let's look at them one by one.

script

There is usually only one script tag, and it is the first tag in the FLV file. It stores information about the file, such as duration, audiodatarate, creator, width, and so on.
First, a word on the data types used in a script tag. Every value is laid out as data type + (data length) + data. The data type takes 1 byte, whether a data length is present depends on the data type, and the data itself comes last.
The tag data generally consists of two AMF packets. AMF (Action Message Format) is a general-purpose data encapsulation format designed by Adobe and used in many Adobe products; put simply, AMF describes data of different types in one uniform format. The first AMF packet wraps a string and carries the marker "onMetaData"; this marker is tied to some Adobe API calls, which we won't go into here. The second AMF packet wraps an array that holds the names and values of the audio and video information items. The types are listed below and can be understood by comparing them with the data in the pictures.

Value  Type              Description
0      Number type       8-byte double
1      Boolean type      1-byte bool
2      String type       Followed by a 2-byte length
3      Object type
4      MovieClip type
5      Null type
6      Undefined type
7      Reference type
8      ECMA array type   Array, similar to a Map
10     Strict array type
11     Date type
12     Long string type  Followed by a 4-byte length

4.png

 

The first picture shows the first AMF packet:

  • type = 0x02, which corresponds to String
  • size = 0x000A = 10
  • value = onMetaData, exactly 10 bytes.

     

    5.png

     

    The second picture shows the second AMF packet:

  • type = 0x08, which corresponds to the ECMA array type.

An ECMA array is similar to a Map. The 4 bytes after the type give the number of elements. Then come the key-value pairs: each pair starts with the key, which is a 2-byte length followed by the key's characters. After the key comes a 1-byte value type, and the length of the value is determined by that type.
From the figure we can tell that there are 13 pairs in total.
The first key is 8 bytes long: duration. The bytes after it begin with 0x004073; the first byte, 0x00, is the value type, so the value is a double, 8 bytes.
The second key is 5 bytes long: width. Its value is also of type double, 8 bytes.
The remaining pairs are parsed in the same way ...

At this point we know how to parse the script tag data of an FLV file.
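The two AMF packets can be decoded with a small recursive reader. The sketch below handles only the Number, Boolean, String and ECMA array types, and the helper names are invented for this example. The sample payload is hand-built with two made-up entries (304.0 is chosen only because its encoding begins with 0x40 0x73, like the bytes in the screenshot; the real duration is not known). It also trusts the 4-byte element count; according to the AMF0 and FLV specifications, that count is only approximate and real files end the array with the bytes 0x00 0x00 0x09, which a robust parser should look for.

```python
import struct

def read_amf_string(buf: bytes, pos: int):
    """Read a 2-byte length followed by that many bytes of UTF-8 text."""
    (length,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + length].decode("utf-8"), start + length

def read_amf_value(buf: bytes, pos: int):
    """Decode one AMF value starting at pos; return (value, next_pos)."""
    marker = buf[pos]
    pos += 1
    if marker == 0:   # Number: 8-byte big-endian double
        (value,) = struct.unpack_from(">d", buf, pos)
        return value, pos + 8
    if marker == 1:   # Boolean: 1 byte
        return buf[pos] != 0, pos + 1
    if marker == 2:   # String: 2-byte length + characters
        return read_amf_string(buf, pos)
    if marker == 8:   # ECMA array: 4-byte count + key-value pairs
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        result = {}
        for _ in range(count):  # assumption: the count is exact
            key, pos = read_amf_string(buf, pos)
            result[key], pos = read_amf_value(buf, pos)
        return result, pos
    raise ValueError(f"AMF type {marker} is not handled in this sketch")

# Hand-built script data: "onMetaData" + ECMA array with two made-up entries
script_data = (
    b"\x02" + struct.pack(">H", 10) + b"onMetaData"
    + b"\x08" + struct.pack(">I", 2)
    + struct.pack(">H", 8) + b"duration" + b"\x00" + struct.pack(">d", 304.0)
    + struct.pack(">H", 5) + b"width" + b"\x00" + struct.pack(">d", 640.0)
)
name, pos = read_amf_value(script_data, 0)
metadata, pos = read_amf_value(script_data, pos)
print(name, metadata)
```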

video

 

6.png


type = 0x09 = 9, so this is a video tag.
size = 0x000030 = 48, so the data is 48 bytes long.
timestamp = 0x000000.
TimestampExtended = 0x00.
stream id = 0x000000.
Now let's look at the data section, which is laid out as:
video information + data

 

The video information takes 1 byte.

The high 4 bits are the frame type (Frame Type):

Value  Type
1      keyframe (for AVC, a seekable frame)
2      inter frame (for AVC, a non-seekable frame)
3      disposable inter frame (H.263 only)
4      generated keyframe (reserved for server use only)
5      video info/command frame

The low 4 bits are the codec ID (CodecID):

Value  Type
1      JPEG (currently unused)
2      Sorenson H.263
3      Screen video
4      On2 VP6
5      On2 VP6 with alpha channel
6      Screen video version 2
7      AVC

Special case

When the video format (CodecID) is AVC (H.264), the VideoTagHeader carries four additional bytes: AVCPacketType and CompositionTime.

  • AVCPacketType 1 byte
Value  Type
0      AVCDecoderConfigurationRecord (AVC sequence header)
1      AVC NALU
2      AVC end of sequence (lower level NALU sequence ender is not required or supported)

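The AVCDecoderConfigurationRecord contains the SPS and PPS information, which is essential for H.264 decoding. The SPS and PPS must be sent to the AVC decoder before any stream data; otherwise the decoder cannot decode correctly. They also have to be sent again whenever the decoder is started after a stop, for example on a seek or when switching between fast-forward and rewind. In an FLV file the AVCDecoderConfigurationRecord generally appears once, in the first video tag.

  • CompositionTime 3 bytes
Condition            Value
AVCPacketType == 1   Composition time offset
AVCPacketType != 1   0

Let's look at the first video tag, the one in the earlier picture. AVCPacketType = 0, and the three bytes after it are also 0. This means the tag carries the AVCDecoderConfigurationRecord, which contains the SPS and PPS data.
Now look at the second video tag: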

 

8.png


Here AVCPacketType = 1, and the three bytes after it are 0x000043. This is a video frame.

The parsed data matches the description above exactly.
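Splitting the first byte of the video data into frame type and CodecID, and reading the extra AVC fields, might look like this. The function name is hypothetical. The first sample uses the values of the second video tag (0x17, AVCPacketType 1, CompositionTime 0x000043); the second sample assumes the sequence-header tag also starts with 0x17, which the text does not state. CompositionTime is read as a signed 24-bit integer, as defined in the FLV specification.

```python
def parse_video_tag_header(data: bytes) -> dict:
    """Decode the start of a video tag's data section (illustrative sketch)."""
    info = {
        "frame_type": data[0] >> 4,   # high 4 bits
        "codec_id": data[0] & 0x0F,   # low 4 bits
    }
    if info["codec_id"] == 7:         # AVC: four more header bytes
        info["avc_packet_type"] = data[1]
        info["composition_time"] = int.from_bytes(data[2:5], "big", signed=True)
    return info

# Second video tag: keyframe, AVC, NALU packet, CompositionTime 0x000043
nalu_header = parse_video_tag_header(bytes([0x17, 0x01, 0x00, 0x00, 0x43]))
# Sequence header tag (first byte 0x17 is an assumption)
config_header = parse_video_tag_header(bytes([0x17, 0x00, 0x00, 0x00, 0x00]))
print(nalu_header)
print(config_header)
```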

sps pps

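As mentioned earlier, the first video tag generally holds the SPS and PPS. Let's parse their contents in detail. First, the storage format (Figure 6):
0x01 + sps[1] + sps[2] + sps[3] + 0xFF + 0xE1 + sps size + sps + 0x01 + pps size + pps
In the figure we see:
sps[1] = 0x64
sps[2] = 0x00
sps[3] = 0x0D
sps size = 0x001B = 27
After skipping 27 bytes we reach 0x01.
pps size = 0x0005 = 5
After skipping 5 more bytes we arrive at the back-pointer.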
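The format above can be read field by field as in the following sketch. The function name is invented, and the SPS/PPS payloads are zero-filled placeholders of the right lengths (27 and 5 bytes), not the real parameter sets. The interpretation of 0xFF and 0xE1 comes from ISO/IEC 14496-15 rather than from this article: the low 2 bits of 0xFF give the NALU length field size minus one, and the low 5 bits of 0xE1 give the number of SPS entries.

```python
import struct

def parse_avc_config(record: bytes) -> dict:
    """Split an AVCDecoderConfigurationRecord into its fields (sketch)."""
    version, profile, compat, level, length_byte, sps_byte = struct.unpack_from(">6B", record, 0)

    def read_units(count: int, pos: int):
        """Read `count` entries, each a 2-byte size followed by that many bytes."""
        units = []
        for _ in range(count):
            (size,) = struct.unpack_from(">H", record, pos)
            pos += 2
            units.append(record[pos:pos + size])
            pos += size
        return units, pos

    sps_list, pos = read_units(sps_byte & 0x1F, 6)  # low 5 bits: number of SPS
    pps_count = record[pos]                         # the 0x01 before "pps size"
    pps_list, pos = read_units(pps_count, pos + 1)
    return {
        "version": version,
        "profile": profile,        # same value as sps[1]
        "compatibility": compat,   # same value as sps[2]
        "level": level,            # same value as sps[3]
        "nalu_length_size": (length_byte & 0x03) + 1,
        "sps": sps_list,
        "pps": pps_list,
        "bytes_consumed": pos,
    }

# Placeholder parameter sets with the lengths seen in the figure
sps = bytes([0x67, 0x64, 0x00, 0x0D]) + bytes(23)   # 27 bytes
pps = bytes([0x68]) + bytes(4)                      # 5 bytes
record = (bytes([0x01, 0x64, 0x00, 0x0D, 0xFF, 0xE1])
          + struct.pack(">H", len(sps)) + sps
          + b"\x01" + struct.pack(">H", len(pps)) + pps)
config = parse_avc_config(record)
print(config["profile"], config["level"], config["nalu_length_size"], config["bytes_consumed"])
```

The record is 43 bytes long; together with the 5 bytes of video information, AVCPacketType and CompositionTime in front of it, that gives the 48-byte tag data size read earlier.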

Video frame data

Once the SPS/PPS tag has been parsed, the video tags that follow contain the actual video data.

9.png


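This is the second video tag, the same one as in Figure 8, except that I have circled the key information. Let's look at the format first:
frame type byte = 0x17 = 00010111 (frame type 1, keyframe; CodecID 7, AVC)
AVCPacketType = 1
CompositionTime = 0x000043
What follows is the NALU data.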

 

Audio

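The layout is similar to the video format.
The high 4 bits are the audio format:

Value  Type
0      Linear PCM, platform endian
1      ADPCM
2      MP3
3      Linear PCM, little endian
4      Nellymoser 16-kHz mono
5      Nellymoser 8-kHz mono
6      Nellymoser
7      G.711 A-law logarithmic PCM
8      G.711 mu-law logarithmic PCM
9      reserved
10     AAC
11     Speex
14     MP3 8-kHz
15     Device-specific sound

The next 2 bits are the sampling rate:

Value  Type
0      5.5-kHz
1      11-kHz
2      22-kHz
3      44-kHz

For AAC this is always 3.

The next 1 bit is the sample size:

Value  Type
0      snd8Bit
1      snd16Bit

Compressed audio is always 16-bit.

The last 1 bit is the audio type:

Value  Type
0      sndMono
1      sndStereo

For AAC this is always 1.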
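The four bit fields can be unpacked from the first byte of the audio data with shifts and masks. This is a small sketch with made-up names; the sample byte 0xAF is a typical value for AAC chosen for illustration (format 10, rate 3, size 1, type 1), not a value read from the screenshots.

```python
def parse_audio_tag_header(first_byte: int) -> dict:
    """Split the first byte of an audio tag's data into its four fields."""
    return {
        "sound_format": first_byte >> 4,         # high 4 bits
        "sound_rate": (first_byte >> 2) & 0x03,  # next 2 bits
        "sound_size": (first_byte >> 1) & 0x01,  # next 1 bit
        "sound_type": first_byte & 0x01,         # last 1 bit
    }

audio_info = parse_audio_tag_header(0xAF)  # 1010 1111
print(audio_info)
```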

Let's look at the third tag:

7.png

 

This one is left for you to parse yourself.


 


Origin blog.csdn.net/rong11417/article/details/104675234