Part 1 FLV tag composition

An FLV file consists of an FLV header followed by an FLV body. (Note that multi-byte values in an FLV file are stored in big-endian byte order.)

The FLV header, written as a C struct (with one-byte alignment):

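#pragma pack(push, 1)   // one-byte alignment, so sizeof(FLV_HEADER) == 9
typedef struct _FLV_HEADER
{
	char FLV[3];     // signature {0x46, 0x4c, 0x56}, i.e. "FLV"
	char Ver;        // version number (0x01)
	char StreamInfo; // flags: 0x04 = has audio, 0x01 = has video; both = 0x05
	int HeaderLen;   // header length (DataOffset) = 9, stored big-endian
} FLV_HEADER;
#pragma pack(pop)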
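Writing the header byte by byte avoids depending on struct packing and on the host's byte order. The following is a minimal sketch; the function name `flv_write_header` is hypothetical. It also appends PreviousTagSize0, the 4-byte zero that the FLV specification places between the header and the first tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the 9-byte FLV header followed by the
 * 4-byte PreviousTagSize0 (always 0) into out, which must hold 13 bytes.
 * Returns the number of bytes written. */
size_t flv_write_header(uint8_t *out, int has_audio, int has_video)
{
    assert(out != NULL);
    out[0] = 'F';                 /* signature "FLV" = 0x46 0x4C 0x56 */
    out[1] = 'L';
    out[2] = 'V';
    out[3] = 0x01;                /* version 1 */
    out[4] = (uint8_t)((has_audio ? 0x04 : 0x00) |   /* TypeFlagsAudio, bit 2 */
                       (has_video ? 0x01 : 0x00));   /* TypeFlagsVideo, bit 0 */
    out[5] = 0x00;                /* DataOffset = 9, big-endian UI32 */
    out[6] = 0x00;
    out[7] = 0x00;
    out[8] = 0x09;
    out[9] = out[10] = out[11] = out[12] = 0x00;     /* PreviousTagSize0 = 0 */
    return 13;
}
```

For a file with both audio and video, the flags byte comes out as 0x05, matching the StreamInfo comment in the struct.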

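The FLV body is a sequence of tags. Each tag is preceded by a 4-byte PreviousTagSize field holding the size of the previous tag (11-byte header + data; 0 before the first tag).

Tag = tag header (11 bytes) + tag data. The struct below folds the preceding 4-byte PreviousTagSize into the header, so it is 4 + 11 = 15 bytes: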

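#pragma pack(push, 1)   // one-byte alignment, so sizeof(FLV_TAG_HEADER) == 15
typedef struct _FLV_TAG_HEADER
{
       int  PreTagLen;        // size of the previous tag (PreviousTagSize)   4 bytes
       char TagType;          // audio (0x08), video (0x09), script (0x12)      1 byte
       char TagDataLen[3];    // length of the tag data area                   3 bytes
       char Timestamp[3];     // timestamp in ms, lower 24 bits                 3 bytes
       char ExpandTimeStamp;  // extended timestamp, upper 8 bits              1 byte
       char streamID[3];      // stream ID, always 0                           3 bytes
}FLV_TAG_HEADER;
#pragma pack(pop)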
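As with the file header, the 11-byte tag header can be serialized with shifts instead of a packed struct. A minimal sketch under that approach; `flv_write_tag_header` is a hypothetical name, and unlike the struct above it writes only the 11 header bytes, not the preceding PreviousTagSize. It shows how a 32-bit millisecond timestamp is split into the 24-bit Timestamp and the 8-bit extended byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serializes the 11-byte FLV tag header.
 * tag_type: 0x08 audio, 0x09 video, 0x12 script data.
 * data_size: size of the tag data that follows (must fit in 24 bits).
 * timestamp_ms: low 24 bits go to Timestamp, high 8 bits to the
 * extended timestamp byte. */
size_t flv_write_tag_header(uint8_t *out, uint8_t tag_type,
                            uint32_t data_size, uint32_t timestamp_ms)
{
    assert(out != NULL && data_size <= 0xFFFFFFu);
    out[0] = tag_type;
    out[1] = (uint8_t)(data_size >> 16);     /* DataSize, big-endian UI24 */
    out[2] = (uint8_t)(data_size >> 8);
    out[3] = (uint8_t)data_size;
    out[4] = (uint8_t)(timestamp_ms >> 16);  /* Timestamp, lower 24 bits */
    out[5] = (uint8_t)(timestamp_ms >> 8);
    out[6] = (uint8_t)timestamp_ms;
    out[7] = (uint8_t)(timestamp_ms >> 24);  /* extended timestamp, upper 8 bits */
    out[8] = out[9] = out[10] = 0x00;        /* StreamID, always 0 */
    return 11;
}
```

After the tag data, the writer emits PreviousTagSize = 11 + data_size as a big-endian 32-bit value.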

Part 2 H.264

An H.264 (Annex B) byte stream is a sequence of NAL units (NALUs). Each NALU is preceded by a start code, 00 00 01 or 00 00 00 01, and the bytes between two consecutive start codes form one NALU. What we actually encapsulate into the FLV file is one NALU at a time.
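Splitting the stream therefore comes down to finding start codes. A minimal sketch of a scanner; `find_start_code` is a hypothetical helper, and it returns the start code's offset and length so the caller can compute where each NALU payload begins and ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: searches buf[from..len) for the next Annex B
 * start code (00 00 01 or 00 00 00 01). Returns the offset of the start
 * code and stores its length (3 or 4) in *sc_len; if none is found,
 * returns len and sets *sc_len to 0. */
size_t find_start_code(const uint8_t *buf, size_t len, size_t from, size_t *sc_len)
{
    assert(buf != NULL && sc_len != NULL);
    for (size_t i = from; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0) {
            if (buf[i + 2] == 1) {                   /* 00 00 01 */
                *sc_len = 3;
                return i;
            }
            if (i + 4 <= len && buf[i + 2] == 0 && buf[i + 3] == 1) {
                *sc_len = 4;                         /* 00 00 00 01 */
                return i;
            }
        }
    }
    *sc_len = 0;
    return len;
}
```

A NALU starts at `pos + sc_len` and ends where the next call (started from that offset) finds the following start code, or at the end of the buffer.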

The lower 5 bits of the first byte of each NALU give the unit's type, nal_unit_type:

#define NALU_TYPE_SLICE 1
#define NALU_TYPE_DPA 2
#define NALU_TYPE_DPB 3
#define NALU_TYPE_DPC 4
#define NALU_TYPE_IDR 5    /* key frame (IDR slice) */
#define NALU_TYPE_SEI 6    /* supplemental enhancement information */
#define NALU_TYPE_SPS 7
#define NALU_TYPE_PPS 8
#define NALU_TYPE_AUD 9
#define NALU_TYPE_EOSEQ 10
#define NALU_TYPE_EOSTREAM 11
#define NALU_TYPE_FILL 12

ANDing the first byte of a NALU with 0x1f gives its type. For example, in the figure above the first NALU begins with 0x67: 0x67 & 0x1f = 7, so this unit is an SPS; the third begins with 0x68: 0x68 & 0x1f = 8, so it is a PPS.
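The same masking can be wrapped in small helpers. A minimal sketch; the helper names are hypothetical. It also extracts nal_ref_idc, the two bits above the type, which is non-zero for SPS, PPS and IDR units.

```c
#include <assert.h>
#include <stdint.h>

/* The first byte of every NALU is the NAL header:
 * forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits). */
static inline int nal_unit_type(uint8_t header) { return header & 0x1F; }
static inline int nal_ref_idc(uint8_t header)   { return (header >> 5) & 0x03; }

/* Hypothetical helper: names the NALU types this article cares about. */
const char *nal_type_name(uint8_t header)
{
    switch (nal_unit_type(header)) {
    case 1:  return "SLICE";
    case 5:  return "IDR";
    case 6:  return "SEI";
    case 7:  return "SPS";
    case 8:  return "PPS";
    case 9:  return "AUD";
    default: return "OTHER";
    }
}
```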

Part 3 Packaging H.264 into FLV

Now we encapsulate H.264 and AAC into an FLV file.

First, define a helper that copies bytes in reverse order (used to turn the host's integers into FLV's big-endian fields):

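// Copies n bytes from src into dest in reverse order (at most destLen bytes).
// On a little-endian host this turns the low n bytes of an integer into
// the big-endian byte order that FLV requires.
void* ReverseMemcpy(void* dest, size_t destLen, const void* src, size_t n)
{
		char*        d = (char*)dest;
		const char*  s = (const char*)src;
		s = s + n - 1;               // start at the last source byte
		while (n-- && destLen--)
		{
			*d++ = *s--;
		}
		return dest;
}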
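ReverseMemcpy produces big-endian output only on a little-endian host (x86 and most ARM configurations). A shift-based writer gives the same bytes on any host. A minimal sketch; `put_be` and `host_is_little_endian` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable alternative to ReverseMemcpy: writes the low
 * nbytes bytes of v in big-endian order using shifts, so the result does
 * not depend on the host's byte order. */
void put_be(uint8_t *dst, uint32_t v, size_t nbytes)
{
    assert(dst != NULL && nbytes >= 1 && nbytes <= 4);
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/* ReverseMemcpy yields the same bytes as put_be only when this returns 1. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

For example, `put_be(buf, dataSize, 3)` fills a 24-bit DataSize field, and `put_be(buf, prevTagSize, 4)` fills PreviousTagSize.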

1. Write the FLV header.

2. Write the FLV script data tag.

3. Because H.264 and AAC are carried in separate tags, first write one video configuration tag (the AVC sequence header) and one audio configuration tag (the AAC sequence header).

4. Write the video tags. Because the video is H.264, the tag data must be wrapped in the AVC format. The first byte of the video tag data holds FrameType (high 4 bits) and CodecID (low 4 bits, 7 = AVC): 0x17 for a key frame and 0x27 for an inter frame.

After that byte comes the AVC packet header: AVCPacketType (1 byte) + CompositionTime (3 bytes).

If AVCPacketType=0x00 (sequence header), the layout is AVCPacketType (1 byte) + CompositionTime (3 bytes) + AVCDecoderConfigurationRecord.

If AVCPacketType=0x01 (NALU), the layout is AVCPacketType (1 byte) + CompositionTime (3 bytes) + a 4-byte NALU length + N bytes of NALU data (without the start code).
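Putting these pieces together for one NALU gives the tag data below. A minimal sketch, assuming one NALU per tag and a 4-byte length field (lengthSizeMinusOne = 3 in the configuration record); `flv_avc_nalu_body` is a hypothetical name, and `cts_ms` (PTS minus DTS) is 0 for streams without B-frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds the data part of a video tag carrying one
 * H.264 NALU (AVCPacketType = 1). nalu points to the NALU without its
 * start code. out must hold nalu_len + 9 bytes. Returns bytes written. */
size_t flv_avc_nalu_body(uint8_t *out, const uint8_t *nalu, uint32_t nalu_len,
                         int keyframe, int32_t cts_ms)
{
    assert(out != NULL && nalu != NULL);
    size_t n = 0;
    out[n++] = keyframe ? 0x17 : 0x27;            /* FrameType | CodecID 7 (AVC) */
    out[n++] = 0x01;                              /* AVCPacketType: 1 = NALU */
    out[n++] = (uint8_t)((uint32_t)cts_ms >> 16); /* CompositionTime, signed 24-bit */
    out[n++] = (uint8_t)((uint32_t)cts_ms >> 8);
    out[n++] = (uint8_t)cts_ms;
    out[n++] = (uint8_t)(nalu_len >> 24);         /* 4-byte NALU length, big-endian */
    out[n++] = (uint8_t)(nalu_len >> 16);
    out[n++] = (uint8_t)(nalu_len >> 8);
    out[n++] = (uint8_t)nalu_len;
    memcpy(out + n, nalu, nalu_len);              /* NALU bytes, start code removed */
    return n + nalu_len;
}
```

The returned size is the value that goes into the tag header's DataSize field.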

The AVCDecoderConfigurationRecord structure:

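// In-memory description only: the long and pointer fields cannot be
// copied as-is, so the record must be serialized field by field.
typedef struct _AVC_DEC_CON_REC
{
	char cfgVersion;             // configurationVersion, 0x01
	char avcProfile;             // AVCProfileIndication = sps[1]
	char profileCompatibility;   // profile_compatibility = sps[2]
	char avcLevel;               // AVCLevelIndication = sps[3]
	// lengthSizeMinusOne: length in bytes of the NALUnitLength field in an AVC video sample, minus one
	char reserved6_lengthSizeMinusOne2;  // 6 reserved bits (all 1) + 2 bits; 0xFF for 4-byte lengths
	char reserved3_numOfSPS5;            // 3 reserved bits (all 1) + 5-bit SPS count; 0xE1 for one SPS
	long spsLength;              // sequenceParameterSetLength (16 bits in the bitstream)
	void *sps;
	char numOfPPS;               // number of PPS
	long ppsLength;              // pictureParameterSetLength (16 bits in the bitstream)
	void *pps;
}AVC_DEC_CON_REC;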
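Serialized, the record is 11 bytes of fixed fields plus the SPS and PPS bytes. A minimal sketch for the common case of exactly one SPS and one PPS, both passed without start codes (so `sps[0]` is the NAL header 0x67 and `sps[1..3]` are profile, compatibility and level); `avc_write_config_record` is a hypothetical name. The sequence header tag data is then 0x17 0x00 0x00 0x00 0x00 followed by this record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: serializes an AVCDecoderConfigurationRecord with one
 * SPS and one PPS (both without start codes). out must hold
 * 11 + sps_len + pps_len bytes. Returns the number of bytes written. */
size_t avc_write_config_record(uint8_t *out,
                               const uint8_t *sps, uint16_t sps_len,
                               const uint8_t *pps, uint16_t pps_len)
{
    assert(out != NULL && sps != NULL && pps != NULL && sps_len >= 4);
    size_t n = 0;
    out[n++] = 0x01;                    /* configurationVersion */
    out[n++] = sps[1];                  /* AVCProfileIndication */
    out[n++] = sps[2];                  /* profile_compatibility */
    out[n++] = sps[3];                  /* AVCLevelIndication */
    out[n++] = 0xFC | 3;                /* reserved 111111 + lengthSizeMinusOne = 3 */
    out[n++] = 0xE0 | 1;                /* reserved 111 + numOfSequenceParameterSets = 1 */
    out[n++] = (uint8_t)(sps_len >> 8); /* sequenceParameterSetLength, 16 bits */
    out[n++] = (uint8_t)sps_len;
    memcpy(out + n, sps, sps_len);
    n += sps_len;
    out[n++] = 0x01;                    /* numOfPictureParameterSets */
    out[n++] = (uint8_t)(pps_len >> 8); /* pictureParameterSetLength, 16 bits */
    out[n++] = (uint8_t)pps_len;
    memcpy(out + n, pps, pps_len);
    return n + pps_len;
}
```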

char *pH264Data=....;   // H.264 data
int H264DataLen=....;   // H.264 data length
FLV_TAG_HEADER tagHeader;
char AVCPacket[4]={0x00,0x00,0x00,0x00};   // AVCPacketType + CompositionTime
memset(&tagHeader,0,sizeof(FLV_TAG_HEADER));
int Index=0;            // start code length
if(pH264Data[0]==0x00&&pH264Data[1]==0x00&&pH264Data[2]==0x01)
{
     Index=3;
}else if(pH264Data[0]==0x00&&pH264Data[1]==0x00&&pH264Data[2]==0x00&&pH264Data[3]==0x01)
{
     Index=4;
}else{
    // Err: not Annex B H.264 data, handle the error here
}
if((pH264Data[Index]&0x1f)==0x07)   // SPS; this buffer also holds a second NALU, the PPS
{
     int PreTagLen=.....   // length of the previous tag
     ReverseMemcpy(&tagHeader.PreTagLen,4,&PreTagLen,4);   // big-endian byte order
     tagHeader.TagType=0x09;   // video
     // AVCPacket must be all 0x00 (sequence header, CompositionTime 0).
     
}

Part 4 Packaging AAC audio into FLV

AAC audio comes in two formats, ADIF and ADTS.

ADIF (Audio Data Interchange Format): the beginning of the audio data can be located deterministically, and decoding cannot start in the middle of the stream; it must begin at a well-defined starting point. ADIF has a single header for the whole stream, so the decoder needs all the data from the beginning. This format is commonly used for files on disk.

ADTS (Audio Data Transport Stream): a bit stream with sync words in which decoding can start at any frame, similar to an MP3 stream. This works because every ADTS frame carries its own header. The two formats also use different header layouts.

Today, audio streams are generally encoded and extracted as ADTS. A voice system has strict real-time requirements: the pipeline is essentially capture, local encoding, upload, server processing, delivery, and local decoding. Because ADTS is a sequence of self-contained frames, it has the properties of a stream and is better suited to this kind of processing.

Therefore, we usually choose ADTS as the encoder output. This is our usual faac configuration:

    m_hAacEncoder= faacEncOpen(capability.nSamplesPerSec,capability.nChannels,
 &m_nAacInputSamples, &m_nAacMaxOutputBytes);
    m_nAacnMaxInputBytes=m_nAacInputSamples*capability.wBitsPerSample/8;
    m_pAacConfig = faacEncGetCurrentConfiguration(m_hAacEncoder);// get a pointer to the current configuration
    m_pAacConfig->inputFormat = FAAC_INPUT_16BIT;// 16-bit PCM input
	m_pAacConfig->outputFormat=1; // 1 = ADTS output (0 = raw AAC)
	m_pAacConfig->useTns=true;
	m_pAacConfig->useLfe=false;
	m_pAacConfig->aacObjectType=LOW;
	m_pAacConfig->shortctl=SHORTCTL_NORMAL;
	m_pAacConfig->quantqual=100;
	m_pAacConfig->bandWidth=0;
	m_pAacConfig->bitRate=capability.nAvgBytesPerSec;
	faacEncSetConfiguration(m_hAacEncoder, m_pAacConfig); // apply the modified configuration

In FLV, both the AAC audio and the H.264 video must carry their configuration information in the first tag of their kind (the sequence header). For AAC, that first audio tag is built as follows:

 flv_packet flvpacket=GetErrPacket();		
 int TagDataLen=1000;
 char *pTagBuffer=(char *)::malloc(TagDataLen);
 memset(pTagBuffer,0,TagDataLen);
 KKMEDIA::FLV_TAG_HEADER Tag_Head;
 memset(&Tag_Head,0,sizeof(Tag_Head));
 FlvMemcpy(&Tag_Head.PreTagLen,4,&m_nPreTagLen,4);
 memset(&Tag_Head.Timestamp,0,3);
 Tag_Head.TagType=0x08;///音频
 int datalen=0;
 memcpy(pTagBuffer,&Tag_Head,sizeof(KKMEDIA::FLV_TAG_HEADER));
 datalen+=sizeof(KKMEDIA::FLV_TAG_HEADER);
 // The first 4 bits are the audio format (see the official specification for the full list):
 //1 -- ADPCM
 //2 -- MP3
 //4 -- Nellymoser 16-kHz mono
 //5 -- Nellymoser 8-kHz mono
 //10 -- AAC
 // The next two bits are the sample rate:
 //·0 -- 5.5kHz
 //·1 -- 11kHz
 //·2 -- 22kHz
 //·3 -- 44kHz   (binary 11 in bits 3-2 = 0x0C)
 // The next bit is the sample size:
 //·0 -- snd8Bit
 //·1 -- snd16Bit
 // The last bit is the channel type:
 //·0 -- sndMono
 //·1 -- sndStereo
 unsigned char TagAudio=0xAF; // 1010,11,1,1 = AAC, 44kHz, 16-bit, stereo
 //TagAudio |=0x0C;//rate 3
 //TagAudio |=0x02;//snd16Bit
 //TagAudio |=0x01;//sndStereo
 memcpy(pTagBuffer+datalen,&TagAudio,1);
 datalen++;
 char AACPacketType=0x00;// 0 = AAC sequence header (AudioSpecificConfig), 1 = raw AAC frame data
 memcpy(pTagBuffer+datalen,&AACPacketType,1);
 datalen++;
 /// AudioSpecificConfig: two bytes
 char AudioSpecificConfig[2]={0x12,0x90};///32000hz
 memcpy(pTagBuffer+datalen,&AudioSpecificConfig,2);
 datalen+=2;
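 m_nPreTagLen=datalen-4;/// PreviousTagSize for the next tag (11-byte header + data)
 TagDataLen=datalen-15;// data size = total - 15 (4-byte PreviousTagSize + 11-byte tag header)
 // write the tag DataSize field (offset 5: after PreviousTagSize and TagType)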
 FlvMemcpy(pTagBuffer+5,3,&TagDataLen,3);
 flvpacket.buf =(unsigned char*)pTagBuffer;			 
 flvpacket.bufLen=datalen;
 flvpacket.taglen=m_nPreTagLen;
 return flvpacket;
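The constant 0xAF is just the four fields above ORed together (which is also why the commented-out lines must use `|=`, not `&=`). A minimal sketch; `flv_audio_flags` is a hypothetical name. Per the FLV specification, for AAC the SoundRate should be 3 and SoundType 1 regardless of the real stream, because the player takes the actual sample rate and channel count from the AudioSpecificConfig.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs the first byte of an FLV audio tag's data.
 * sound_format: 4 bits (10 = AAC); sound_rate: 2 bits (3 = 44 kHz);
 * sound_size: 1 bit (1 = 16-bit); sound_type: 1 bit (1 = stereo). */
uint8_t flv_audio_flags(unsigned sound_format, unsigned sound_rate,
                        unsigned sound_size, unsigned sound_type)
{
    assert(sound_format < 16 && sound_rate < 4 && sound_size < 2 && sound_type < 2);
    return (uint8_t)((sound_format << 4) | (sound_rate << 2) |
                     (sound_size << 1) | sound_type);
}
```

`flv_audio_flags(10, 3, 1, 1)` reproduces the 0xAF used in both audio tags.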

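The AudioSpecificConfig value used above can be obtained directly from the AAC encoder library (with faac, via faacEncGetDecoderSpecificInfo) or computed from its fields: audioObjectType (5 bits), samplingFrequencyIndex (4 bits) and channelConfiguration (4 bits), followed by 3 bits set to 0 here. The calculation: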

/// sampling frequency index table
static unsigned const samplingFrequencyTable[16] = {
  96000, 88200, 64000, 48000,
  44100, 32000, 24000, 22050,
  16000, 12000, 11025, 8000,
  7350,  0,     0,      0
};
int profile=1;
int samplingFrequencyIndex=0;
for(int i=0;i<16;i++)
{
	    if(samplingFrequencyTable[i]==32000)
		{
		   samplingFrequencyIndex =i;
		   break;
		}
}
char channelConfiguration =0x02;// stereo
UINT8 audioConfig[2] = {0};  
UINT8 const audioObjectType = profile + 1;  /// profile = 1 (AAC LC), so audioObjectType = 2
audioConfig[0] = (audioObjectType<<3) | (samplingFrequencyIndex>>1);  
audioConfig[1] = (samplingFrequencyIndex<<7) | (channelConfiguration<<3);  
printf("%02x%02x", audioConfig[0], audioConfig[1]);

Finally, write the AAC frame data:

                flv_packet flvpacket=GetErrPacket();
			    int TagDataLen=1000+srcLen;
				char *pTagBuffer=(char *)::malloc(TagDataLen);
				memset(pTagBuffer,0,TagDataLen);
				KKMEDIA::FLV_TAG_HEADER Tag_Head;
				memset(&Tag_Head,0,sizeof(Tag_Head));
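				// FlvMemcpy is equivalent to ReverseMemcpy
				FlvMemcpy(&Tag_Head.PreTagLen,4,&m_nPreTagLen,4);
				FlvMemcpy(&Tag_Head.Timestamp,3,&pts,3);   // lower 24 bits of the timestamp
				Tag_Head.ExpandTimeStamp=(char)((pts>>24)&0xFF);   // upper 8 bits

				Tag_Head.TagType=0x08;/// audio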
				int datalen=0;
				memcpy(pTagBuffer,&Tag_Head,sizeof(KKMEDIA::FLV_TAG_HEADER));
				datalen+=sizeof(KKMEDIA::FLV_TAG_HEADER);
				char TagAudio=0xAF; 
				memcpy(pTagBuffer+datalen,&TagAudio,1);
				datalen++;

				char AACPacketType=0x01;   // 1 = raw AAC frame data
				memcpy(pTagBuffer+datalen,&AACPacketType,1);
				datalen++;
				
                // src: pointer to the AAC data (ADTS header excluded); srcLen: AAC data length
				memcpy(pTagBuffer+datalen,src,srcLen);
				datalen+=srcLen;

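				m_nPreTagLen=datalen-4;/// PreviousTagSize for the next tag (11-byte header + data)
			    TagDataLen=datalen-15;// data size = total - 15 (4-byte PreviousTagSize + 11-byte tag header)

			    // write the tag DataSize field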
			    FlvMemcpy(pTagBuffer+5,3,&TagDataLen,3);
			    flvpacket.buf =(unsigned char*)pTagBuffer;			 
			    flvpacket.bufLen=datalen;
				flvpacket.taglen=m_nPreTagLen;
			    return flvpacket;

Note that when the encoder outputs ADTS, the ADTS header (7 bytes, or 9 bytes when a CRC is present) must be skipped before the frame is packed into FLV.

For example:

AudioPacket((const unsigned  char *)(pDataNALU+7),PktSize-7,Pts);
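The fixed offset of 7 assumes protection_absent = 1 (no CRC). Reading the header makes the offset and payload length explicit. A minimal sketch; `adts_parse` is a hypothetical helper, and the payload to pass on would start at `frame + header_len` with length `frame_len - header_len`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inspects one ADTS frame header. On success returns 1
 * and fills *header_len (7, or 9 when a CRC is present) and *frame_len
 * (whole frame, header included). Returns 0 if buf does not start with a
 * valid ADTS header. */
int adts_parse(const uint8_t *buf, size_t len, size_t *header_len, size_t *frame_len)
{
    assert(buf != NULL && header_len != NULL && frame_len != NULL);
    if (len < 7 || buf[0] != 0xFF || (buf[1] & 0xF0) != 0xF0)
        return 0;                                 /* 12-bit syncword 0xFFF missing */
    int protection_absent = buf[1] & 0x01;
    *header_len = protection_absent ? 7 : 9;      /* CRC adds 2 bytes */
    *frame_len = ((size_t)(buf[3] & 0x03) << 11)  /* 13-bit aac_frame_length */
               | ((size_t)buf[4] << 3)
               | ((size_t)buf[5] >> 5);
    return *frame_len >= *header_len;
}
```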
