Encapsulating an H.264 video stream into an FLV file (Part 1: the FLV format)

There is plenty of information on the Internet about the FLV file format, but not much about how to actually encapsulate a stream into FLV. After reading a lot of material, I found one article that I think is quite reliable: http://www.cnblogs.com/chef/archive/2012/07/18/2597279.html

FLV is actually a very simple video format. Let's go through the format first.

 

FLV is a binary file format. In short, a file consists of a file header (FLV header) followed by many tags (FLV body). Tags fall into three categories: audio, video, and script, representing the audio stream, the video stream, and the script stream respectively. Each tag is composed of a tag header and tag data.

The file header consists of 9 bytes.

 

The first 3 bytes are the file signature, which is always "FLV" (0x46 0x4C 0x56). The 4th byte is the version number, which is currently 0x01. The 5th byte describes the streams in the file: the lowest bit set to 1 means there is video (0x01), and the third-lowest bit set to 1 means there is audio (0x04). If there is both video and audio, the byte is 0x01 | 0x04 (0x05); all other bits should be 0. The last 4 bytes give the length of the FLV header: 3 + 1 + 1 + 4 = 9.
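As a minimal sketch of the 9-byte layout just described, the header can be packed in Python like this. The function name build_flv_header and its parameters are made up for illustration and do not come from any library.

```python
import struct


def build_flv_header(has_video: bool = True, has_audio: bool = False) -> bytes:
    """Build the 9-byte FLV file header (hypothetical helper)."""
    flags = (0x01 if has_video else 0) | (0x04 if has_audio else 0)
    # 3-byte signature, 1-byte version, 1-byte stream flags,
    # 4-byte big-endian header length (always 9).
    return struct.pack(">3sBBI", b"FLV", 0x01, flags, 9)


header = build_flv_header(has_video=True, has_audio=True)
print(header.hex())  # prints 464c56010500000009
```

For a video-only file, which is the case when only an H.264 stream is being wrapped, the flags byte would be 0x01 instead of 0x05.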

 

After the FLV header comes the FLV body, which consists of a sequence of tags. Each tag starts with a tag header, which is 11 bytes long. In addition, every tag header is preceded by 4 bytes that record the length of the previous tag; this is discussed later. The first byte of the tag header records the tag type: audio (0x08), video (0x09), or script (0x12). Bytes 2 to 4 hold the length of the data area, that is, the length of the tag data. The next 3 bytes are a timestamp in milliseconds; for a script tag (type 0x12) the timestamp is 0. The timestamp controls the playback speed of the file and can be set according to the audio and video frame rate. The byte after the timestamp is an extended timestamp, used when the 3-byte timestamp is not long enough. The last 3 bytes are the stream ID, which is always 0. After the tag header comes the data area (tag data), which for a video tag carries the H.264 stream data. The length of the tag header is 1 + 3 + 3 + 1 + 3 = 11.

In a hex dump of an FLV file, the 00 00 00 00 in front of the first 0x12 is the 4 bytes that record the length of the previous tag; it is 0 because there is no tag before it.
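The 11-byte tag header can be sketched as follows. The helper name build_tag_header is hypothetical. The code also assumes, following the Adobe FLV specification rather than the text above, that the extended timestamp byte holds the upper 8 bits of a 32-bit millisecond timestamp.

```python
TAG_AUDIO, TAG_VIDEO, TAG_SCRIPT = 0x08, 0x09, 0x12


def build_tag_header(tag_type: int, data_size: int, timestamp_ms: int) -> bytes:
    """Build the 11-byte FLV tag header (hypothetical helper)."""
    if not 0 <= data_size <= 0xFFFFFF:
        raise ValueError("data size must fit in 3 bytes")
    ts = timestamp_ms & 0xFFFFFFFF
    return (
        bytes([tag_type])                     # 1 byte: tag type
        + data_size.to_bytes(3, "big")        # 3 bytes: length of tag data
        + (ts & 0xFFFFFF).to_bytes(3, "big")  # 3 bytes: lower 24 bits of timestamp
        + bytes([(ts >> 24) & 0xFF])          # 1 byte: extended timestamp (upper 8 bits)
        + bytes(3)                            # 3 bytes: stream ID, always 0
    )


# A video tag with 5 bytes of data at 40 ms (e.g. the second frame at 25 fps).
print(build_tag_header(TAG_VIDEO, 5, 40).hex())  # prints 0900000500002800000000
```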

 

If the tag data is audio data, the first byte records the audio information:

The first 4 bits give the audio format (see the official documentation for all formats):

·0 -- uncompressed

·1 -- ADPCM

·2 -- MP3

·4 -- Nellymoser 16-kHz mono

·5 -- Nellymoser 8-kHz mono

·10 -- AAC

The next 2 bits give the sample rate:

·0 -- 5.5 kHz

·1 -- 11 kHz

·2 -- 22 kHz

·3 -- 44 kHz

The next 1 bit gives the sample size:

·0 -- snd8Bit

·1 -- snd16Bit

The last 1 bit gives the channel type:

·0 -- sndMono

·1 -- sndStereo

After that comes the data.
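The four fields share a single byte, so packing and unpacking them is a matter of shifts and masks. The sketch below uses made-up function names (pack_audio_info, unpack_audio_info) and takes the field values from the lists above.

```python
def pack_audio_info(sound_format: int, rate_index: int,
                    is_16bit: bool, is_stereo: bool) -> int:
    """Pack the four audio fields into the first byte of audio tag data."""
    return (sound_format << 4) | (rate_index << 2) | (int(is_16bit) << 1) | int(is_stereo)


def unpack_audio_info(byte: int) -> tuple:
    """Split the byte back into (format, rate index, is_16bit, is_stereo)."""
    return byte >> 4, (byte >> 2) & 0x03, bool(byte & 0x02), bool(byte & 0x01)


# AAC (10), 44 kHz (3), 16-bit samples, stereo
print(hex(pack_audio_info(10, 3, True, True)))  # prints 0xaf
```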

If the tag data is video data, the first byte records the video information:

The first 4 bits give the frame type:

·1 -- keyframe

·2 -- inter frame

·3 -- disposable inter frame (H.263 only)

·4 -- generated keyframe

The last 4 bits give the codec ID:

·2 -- Sorenson H.263

·3 -- Screen video

·4 -- On2 VP6

·5 -- On2 VP6 with alpha channel

·6 -- Screen video version 2

·7 -- AVC (H.264)

After that comes the data.
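The first byte of video tag data combines the frame type in the high 4 bits and the codec ID in the low 4 bits. A minimal sketch, with hypothetical helper and constant names:

```python
FRAME_KEY, FRAME_INTER = 1, 2
CODEC_AVC = 7


def pack_video_info(frame_type: int, codec_id: int) -> int:
    """Pack frame type (high 4 bits) and codec ID (low 4 bits) into one byte."""
    return (frame_type << 4) | codec_id


def unpack_video_info(byte: int) -> tuple:
    """Split the byte back into (frame type, codec ID)."""
    return byte >> 4, byte & 0x0F


print(hex(pack_video_info(FRAME_KEY, CODEC_AVC)))    # prints 0x17
print(hex(pack_video_info(FRAME_INTER, CODEC_AVC)))  # prints 0x27
```

For an H.264 stream this byte is therefore 0x17 on keyframes and 0x27 on inter frames.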

 

 

 

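If the audio and video are AAC and AVC, then before any stream data is written, the audio and video configuration information must be written into the first two tags; this will be covered later. As mentioned earlier, each tag is preceded by 4 bytes that record the length of the previous tag (previous tag size), so the whole FLV file is really: FLV header + previous tag size0 + tag1 + previous tag size1 + tag2 + previous tag size2 + ... + tagN + previous tag sizeN. The first previous tag size is 0 because there is no tag before it; every other one records the length of the tag before it (tag data size + tag header size).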
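One way to check this layout is to walk a file tag by tag and confirm that each previous tag size equals 11 plus the data size of the tag before it. The sketch below is illustrative only: iter_tags is a made-up name, the demo bytes are hand-built, and combining the extended timestamp as the upper 8 bits follows the Adobe FLV specification rather than the text above.

```python
import struct


def iter_tags(flv: bytes):
    """Yield (tag_type, timestamp_ms, data) and verify each previous tag size."""
    if flv[:3] != b"FLV":
        raise ValueError("not an FLV file")
    pos = struct.unpack(">I", flv[5:9])[0]  # header length, normally 9
    expected_prev = 0                       # previous tag size0 is always 0
    while pos + 4 <= len(flv):
        prev_size = struct.unpack(">I", flv[pos:pos + 4])[0]
        if prev_size != expected_prev:
            raise ValueError(f"bad previous tag size at offset {pos}")
        pos += 4
        if pos + 11 > len(flv):
            break                           # that was the final previous tag size
        tag_type = flv[pos]
        data_size = int.from_bytes(flv[pos + 1:pos + 4], "big")
        timestamp = int.from_bytes(flv[pos + 4:pos + 7], "big") | (flv[pos + 7] << 24)
        yield tag_type, timestamp, flv[pos + 11:pos + 11 + data_size]
        expected_prev = 11 + data_size      # tag header size + tag data size
        pos += 11 + data_size


# Hand-built demo: header, size0 = 0, one video tag with 2 data bytes, size1 = 13.
demo = (
    b"FLV\x01\x01" + struct.pack(">I", 9)
    + struct.pack(">I", 0)
    + b"\x09" + (2).to_bytes(3, "big") + bytes(3) + b"\x00" + bytes(3) + b"\x17\x00"
    + struct.pack(">I", 11 + 2)
)
print(list(iter_tags(demo)))
```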

 

 

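If the tag data is script data (Script Tag Data), the tag is usually called the Metadata tag. It holds parameter information about the FLV video and audio, such as duration, width, and height. This tag usually appears as the first tag, right after the file header, and there is only one of it. In general, its tag data contains two AMF packets. AMF (Action Message Format) is a general-purpose data encapsulation format designed by Adobe and used in many Adobe products; put simply, AMF describes different types of data in one uniform format. The first AMF packet wraps a string and carries the "onMetaData" marker. This marker is related to some Adobe API calls, which are not covered here. The second AMF packet wraps an array that contains the names and values of the audio and video information items. The details are as follows, and can be checked against the bytes in a hex dump of an FLV file.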

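First AMF packet:

 Byte 1 is the AMF packet type. It is usually 0x02, meaning a string; see the documentation for the meaning of other values.

 Bytes 2-3 are a UI16 value giving the length of the string, usually 0x000A (the length of "onMetaData").

 The following bytes are the string data, usually "onMetaData".

Second AMF packet:

 Byte 1 is the AMF packet type. It is usually 0x08, meaning an array.

 Bytes 2-5 are a UI32 value giving the number of array elements.

 The array elements follow. Each element is a pair made up of an element name and a value, encoded as follows:

   Bytes 1-2 give the length of the element name; call it L.

   They are followed by a string of length L.

   Byte L+3 gives the type of the element value.

   It is followed by the value itself, whose size in bytes depends on the value type.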
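The two AMF packets can be sketched in Python as below. This is only an illustration under several assumptions: the helper names are made up, the metadata values (640, 480, 0.0) are placeholders, every value is encoded as an AMF0 Number (type 0x00 followed by an 8-byte big-endian double), and the array is closed with the 0x00 0x00 0x09 end marker. The Number encoding and the end marker come from the AMF0 specification, not from the description above.

```python
import struct


def amf_string(text: str) -> bytes:
    """First packet: type 0x02, UI16 length, then the string bytes."""
    raw = text.encode("utf-8")
    return b"\x02" + struct.pack(">H", len(raw)) + raw


def amf_ecma_array(items: dict) -> bytes:
    """Second packet: type 0x08, UI32 element count, then name/value pairs."""
    out = b"\x08" + struct.pack(">I", len(items))
    for name, value in items.items():
        raw = name.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw            # name length L, then name
        out += b"\x00" + struct.pack(">d", float(value))    # value type 0x00, then double
    return out + b"\x00\x00\x09"                            # AMF0 end marker (assumption)


# Placeholder metadata values, for illustration only.
script_data = amf_string("onMetaData") + amf_ecma_array(
    {"duration": 0.0, "width": 640, "height": 480}
)
print(len(script_data), script_data[:13])
```

The resulting bytes would become the tag data of a script tag (type 0x12) with timestamp 0.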

 

 

That about covers the FLV format. If anything here is wrong, please point it out.

 

Here is an FLV viewer tool written by another user: http://download.csdn.net/detail/yeyumin89/4534822

 

http://blog.csdn.net/yeyumin89/article/details/7932368
