Audio and video development: notes on converting AAC to PCM

Main reference articles on audio basics

On converting between PCM audio and G.711 audio encoding

Understanding audio coding (PCM, G711A, G711U, AAC)

An introduction to the AAC audio coding format

Key concepts

Understanding PCM

PCM: Pulse Code Modulation. What the human ear hears is an analog signal, and PCM is the technique that converts sound from an analog signal into a digital one. The principle is to sample the analog signal at a fixed frequency; on a waveform, the sampled signal looks like a series of continuous pulses of varying amplitude. Each pulse amplitude is then quantized, and the quantized values are continuously output, transmitted, processed, or recorded to a storage medium. Together, the three steps of sampling, quantization, and encoding make up the digital audio generation process.
Six parameters describe PCM data:

Sample Rate : sampling frequency. Common values: 8 kHz (telephone), 44.1 kHz (CD), 48 kHz (DVD).
Sample Size : quantization bit depth, usually 16 bits.
Number of Channels : common audio is either stereo (left and right channels) or mono; less common layouts such as surround sound also exist.
Sign : whether the sample data is signed. For example, with samples stored in one byte, signed data ranges from -128 to 127 and unsigned data from 0 to 255.
Byte Ordering : whether the byte order is little-endian or big-endian; it is usually little-endian. See Section 4 for a description of byte order.
Integer Or Floating Point : in most formats PCM sample data is represented as integers, but some applications that need high precision represent it as floating point.

Audio frames

Audio differs from video. Each video frame is a complete image, while audio data is a continuous stream, and different encoding formats follow different standards. Compare PCM with MP3: PCM is uncompressed, so the amount of audio data per second can be computed directly from the sample rate, bit depth, and channel count, and no frame concept is needed. MP3, like H.264, does have a frame concept, because compression adds structure: each frame carries extra information and begins with a frame header.

G711

G.711 is a set of voice compression standards defined by the International Telecommunication Union, used mainly for telephone voice communication. Since the maximum frequency of the human voice is generally about 3.4 kHz, sampling it at 8 kHz (above the Nyquist rate) is enough to guarantee that the original sound can be fully reconstructed.
G.711 encodes each 13-bit or 14-bit sample into an 8-bit sample.
The G.711 standard defines two compression methods, A-law and mu-law:

A-law: compresses a 13-bit PCM sample into an 8-bit sample.
mu-law: compresses a 14-bit PCM sample into an 8-bit sample.

Note:

When decoding audio, be careful to set the sample rate, number of samples, sample format, number of channels, and related parameters correctly; otherwise it is easy to get errors during playback and during encoding/decoding. For example, when playing raw PCM with ffplay you must pass the matching format flags, such as -f s16le -ar 44100 -ac 2.

Code

#include <cstdio>

#define MAX_AUDIO_FRAME_SIZE  192000

#ifdef _WIN32
//Windows
extern "C"
{
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#include <libavutil/imgutils.h>
#include <libswresample/swresample.h>
};
#else
//Linux...
#ifdef __cplusplus
extern "C"
{
#endif
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#include <libavutil/imgutils.h>
#ifdef __cplusplus
};
#endif
#include <libswresample/swresample.h>
#endif

char filepath[] = "test.aac";
FILE* fp_pcm = fopen("test.pcm", "wb+");

int decode()
{
	AVFormatContext* formatCtx; // must be initialized with avformat_alloc_context()
	int				i, stream_index;
	AVCodecContext* pCodecCtx = NULL;
	AVCodec* pCodec;

	AVFrame* pFrame; // must be initialized with av_frame_alloc()
	AVPacket* packet;
	int ret, got_frame;

	avformat_network_init(); // load the socket and network-protocol libraries for later network use
	formatCtx = avformat_alloc_context();

	if (avformat_open_input(&formatCtx, filepath, NULL, NULL) != 0) {
		printf("Couldn't open input stream.\n");
		return -1;
	}

	if (avformat_find_stream_info(formatCtx, NULL) < 0) {
		// read packets to probe the streams and fill in each AVStream's info
		printf("Couldn't find stream information.\n");
		return -1;
	}

	stream_index = -1;
	for (i = 0; i < formatCtx->nb_streams; i++) // nb_streams is the number of AVStreams in the input
		if (formatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
			// found the audio stream
			stream_index = i;
			break;
		}

	if (stream_index == -1) {
		printf("Didn't find an audio stream.\n");
		return -1; // exit if no audio stream was found
	}

	// look up the decoder for the codec id recorded in the AVFormatContext
	pCodecCtx = formatCtx->streams[stream_index]->codec;

	pCodec = avcodec_find_decoder(pCodecCtx->codec_id); // find the decoder by id
	if (pCodec == NULL) {
		printf("Codec not found.\n");
		return -1;
	}
	if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) {
		printf("Could not open codec.\n");
		return -1;
	}

	//Output Info-----------------------------
	printf("--------------- File Information ----------------\n");
	av_dump_format(formatCtx, 0, filepath, 0); // print detailed info about the input: duration, bitrate, streams, container, metadata, side data, codecs, and time base
	printf("-------------------------------------------------\n");

	pFrame = av_frame_alloc(); // allocate an AVFrame
	packet = (AVPacket*)av_malloc(sizeof(AVPacket)); // allocate memory for an AVPacket
	av_init_packet(packet);

	printf("sample_rate = %d , channels = %d bit_rate = %lld\n", pCodecCtx->sample_rate, pCodecCtx->channels, (long long)pCodecCtx->bit_rate);
	// ffmpeg -f s16le -ar 44100 -ac 6 -i output.pcm -c:a aac -b:a 375k 123.aac

	// output parameters after resampling
	// channel layout
	uint64_t out_channel_layout = AV_CH_LAYOUT_STEREO;
	// number of samples per frame
	int out_nb_samples = 1024;
	// sample format
	enum AVSampleFormat  sample_fmt = AV_SAMPLE_FMT_S16;
	// sample rate
	int out_sample_rate = 44100;
	// number of channels
	int out_channels = av_get_channel_layout_nb_channels(out_channel_layout);
	printf("%d\n", out_channels);
	// size in bytes of one full output frame (out_nb_samples samples)
	int buffer_size = av_samples_get_buffer_size(NULL, out_channels, out_nb_samples, sample_fmt, 1);

	// note: must be allocated with av_malloc
	uint8_t* buffer = (uint8_t*)av_malloc(MAX_AUDIO_FRAME_SIZE * 2);

	int64_t in_channel_layout = av_get_default_channel_layout(pCodecCtx->channels);
	// allocate the resampler
	struct SwrContext* convert_ctx = swr_alloc();
	// set the resampling parameters
	convert_ctx = swr_alloc_set_opts(convert_ctx, out_channel_layout, sample_fmt, out_sample_rate, \
		in_channel_layout, pCodecCtx->sample_fmt, pCodecCtx->sample_rate, 0, NULL);
	// initialize the resampler
	swr_init(convert_ctx);


	while (av_read_frame(formatCtx, packet) >= 0) {
		// demux: read one packet

		if (packet->stream_index == stream_index) {

			ret = avcodec_decode_audio4(pCodecCtx, pFrame, &got_frame, packet);

			if (ret < 0) {
				printf("Decode Error.\n");
				return -1;
			}
			if (got_frame) {
				// resample the decoded frame and write the PCM data to the file;
				// swr_convert returns the number of samples produced per channel
				int out_samples = swr_convert(convert_ctx, &buffer, MAX_AUDIO_FRAME_SIZE, (const uint8_t**)pFrame->data, pFrame->nb_samples);
				if (out_samples > 0) {
					int out_size = av_samples_get_buffer_size(NULL, out_channels, out_samples, sample_fmt, 1);
					printf("pts:%10lld\t packet size:%d\n", (long long)packet->pts, packet->size);
					fwrite(buffer, 1, out_size, fp_pcm);
				}
			}
		}
		av_free_packet(packet);
	}

	swr_free(&convert_ctx);
	fflush(fp_pcm);
	fclose(fp_pcm);
	fp_pcm = nullptr;

	av_frame_free(&pFrame);
	avcodec_close(pCodecCtx);
	avformat_close_input(&formatCtx);

	return 0;
}

int main(int argc, char* argv[])
{
	return decode();
}


Origin blog.csdn.net/qq_51282224/article/details/131112668