Advanced FFmpeg: Transcoding Audio with Audio Filters

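Audio often has to be transcoded so that a video file suits different playback environments. Transcoding mostly means changing the parameters of the audio: sample bit depth, sample rate, channel layout, bit rate, and so on. The sections below explain what each parameter means and what it does.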

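Sample bit depth

Sample bit depth, also called bit depth or resolution, is the number of levels into which the continuous amplitude of a sound is divided once it is represented digitally. N-bit samples divide the amplitude range evenly into 2^N levels, so 16-bit audio has 65,536 levels. That is a large number; a listener probably cannot tell apart amplitude differences of 1/65536. Bit depth can also be thought of as the resolution of the sound card: the larger the value, the finer the resolution and the more faithfully sound can be reproduced. Note that bit depth describes the amplitude of the signal, whereas sample rate describes its behavior in time (frequency). These are two separate concepts.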
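The relationship between bit depth and the number of amplitude levels can be checked with a few lines of C++. This is only an illustrative sketch: the function names `quantization_levels` and `dynamic_range_db` are made up for this example, and the dynamic range figure is the textbook value for an ideal quantizer, not something measured on real hardware.

```cpp
#include <cmath>
#include <cstdint>

// Number of distinct amplitude levels an N-bit integer sample can take (2^N).
// This sketch assumes 1 <= bits <= 63.
std::uint64_t quantization_levels(unsigned bits)
{
	return std::uint64_t{1} << bits;
}

// Theoretical dynamic range of an ideal N-bit quantizer in decibels:
// 20 * log10(2^N), which is roughly 6.02 dB per bit.
double dynamic_range_db(unsigned bits)
{
	return 20.0 * std::log10(2.0) * bits;
}
```

For 16-bit audio, `quantization_levels(16)` gives 65,536 levels and `dynamic_range_db(16)` gives about 96.3 dB.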

FFmpeg represents the common sample bit depths with the following sample formats:

enum AVSampleFormat {
	AV_SAMPLE_FMT_NONE = -1,
	AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
	AV_SAMPLE_FMT_S16,         ///< signed 16 bits
	AV_SAMPLE_FMT_S32,         ///< signed 32 bits
	AV_SAMPLE_FMT_FLT,         ///< float
	AV_SAMPLE_FMT_DBL,         ///< double

	AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
	AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
	AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
	AV_SAMPLE_FMT_FLTP,        ///< float, planar
	AV_SAMPLE_FMT_DBLP,        ///< double, planar
	AV_SAMPLE_FMT_S64,         ///< signed 64 bits
	AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

	AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};
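The formats ending in `P` are planar: each channel is stored in its own buffer, whereas the packed formats interleave the channels sample by sample. The sketch below shows the difference on plain 16-bit samples without using FFmpeg at all. The helper `deinterleave` is a hypothetical name for this illustration; in a real program the conversion would normally be left to libswresample or to the `aformat` filter.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Split packed (interleaved) samples, e.g. L R L R ..., into one plane per
// channel, e.g. LLL... and RRR..., which is how planar formats store audio.
std::vector<std::vector<std::int16_t>> deinterleave(
	const std::vector<std::int16_t> &packed, std::size_t channels)
{
	std::vector<std::vector<std::int16_t>> planes(channels);
	if (channels == 0)
		return planes;

	// Samples per channel; a trailing incomplete frame is ignored.
	const std::size_t frames = packed.size() / channels;
	for (std::size_t ch = 0; ch < channels; ++ch)
	{
		planes[ch].reserve(frames);
		for (std::size_t i = 0; i < frames; ++i)
			planes[ch].push_back(packed[i * channels + ch]);
	}
	return planes;
}
```

With two channels, the packed sequence `1, -1, 2, -2, 3, -3` becomes the planes `1, 2, 3` and `-1, -2, -3`.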

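Sample rate

Audio sampling converts sound from an analog signal into a digital one. The sample rate is the number of times per second the sound is measured, which is also the number of samples per second in the resulting digital signal. Commonly used sample rates are:
8,000 Hz - used for telephony, sufficient for human speech
11,025 Hz - used for AM radio
22,050 Hz~24,000 Hz - used for FM radio
32,000 Hz - used by miniDV digital video camcorders and DAT (LP mode)
44,100 Hz - audio CD, also commonly used for MPEG-1 audio (VCD, SVCD, MP3)
47,250 Hz - used by commercial PCM recorders
48,000 Hz - digital audio for miniDV, digital television, DVD, DAT, film, and professional audio
50,000 Hz - used by commercial digital recorders
96,000 or 192,000 Hz - DVD-Audio, some LPCM DVD audio tracks, BD-ROM (Blu-ray Disc) audio tracks, and HD-DVD (high-definition DVD) audio tracks
2.8224 MHz - used by the 1-bit sigma-delta modulation process of Direct Stream Digital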

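The higher the sample rate, the more faithfully and naturally the sound is reproduced. Human hearing spans roughly 20 Hz to 20,000 Hz, and by the Nyquist sampling theorem a signal has to be sampled at more than twice its highest frequency to be reconstructed, so covering the whole audible range takes a little over 40,000 samples per second. That is why 44,100 Hz is CD quality, while 22,050 Hz, which preserves content up to about 11 kHz, is a common choice when file size matters more than fidelity. Sample rates above 48,000 Hz bring little audible benefit for playback. The reasoning is similar to film running at 24 frames per second.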
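The link between sample rate and audible bandwidth follows from the Nyquist criterion: a given sample rate can only represent frequencies up to half its value. A minimal sketch, with function names chosen only for this illustration:

```cpp
// Highest frequency (in Hz) that a given sample rate can represent.
double nyquist_frequency(double sample_rate_hz)
{
	return sample_rate_hz / 2.0;
}

// Lower bound on the sample rate needed to capture content up to
// max_frequency_hz; practical systems sample somewhat above this bound.
double min_sample_rate(double max_frequency_hz)
{
	return 2.0 * max_frequency_hz;
}

// True if the sample rate can represent the whole band up to upper_hz
// (20 kHz by default, the usual upper limit quoted for human hearing).
bool covers_audible_band(double sample_rate_hz, double upper_hz = 20000.0)
{
	return nyquist_frequency(sample_rate_hz) >= upper_hz;
}
```

Under these definitions 44,100 Hz covers the audible band (its Nyquist frequency is 22,050 Hz), while 22,050 Hz stops at 11,025 Hz.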

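Channel layout

When people hear a sound they can locate its source, so placing sound sources at different positions creates a better listening experience. Common channel layouts are:

Mono, a single channel
Stereo, two channels, the most common type, consisting of a left and a right channel
2.1, stereo plus one low-frequency channel
5.1, consisting of a center channel, front left, front right, left surround, right surround, and one low-frequency channel; first used in early cinemas
7.1, based on 5.1, with the left and right surround channels split into left/right surround and left/right rear channels; mainly used for BD and modern cinemas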
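In the FFmpeg API used later in this article, a channel layout is a 64-bit mask with one bit per speaker position, and the channel count is simply the number of set bits. The sketch below reproduces that idea in standalone C++. The constants are assumptions that mirror the `AV_CH_*` values in libavutil's `channel_layout.h`; real code should use FFmpeg's own macros rather than these copies.

```cpp
#include <bitset>
#include <cstdint>

// Speaker positions as bit flags (assumed to match FFmpeg's AV_CH_* masks).
constexpr std::uint64_t CH_FRONT_LEFT    = 0x1;
constexpr std::uint64_t CH_FRONT_RIGHT   = 0x2;
constexpr std::uint64_t CH_FRONT_CENTER  = 0x4;
constexpr std::uint64_t CH_LOW_FREQUENCY = 0x8;
constexpr std::uint64_t CH_BACK_LEFT     = 0x10;
constexpr std::uint64_t CH_BACK_RIGHT    = 0x20;
constexpr std::uint64_t CH_SIDE_LEFT     = 0x200;
constexpr std::uint64_t CH_SIDE_RIGHT    = 0x400;

// Layouts are built by OR-ing the positions together.
constexpr std::uint64_t LAYOUT_MONO    = CH_FRONT_CENTER;
constexpr std::uint64_t LAYOUT_STEREO  = CH_FRONT_LEFT | CH_FRONT_RIGHT;
constexpr std::uint64_t LAYOUT_2POINT1 = LAYOUT_STEREO | CH_LOW_FREQUENCY;
constexpr std::uint64_t LAYOUT_5POINT1 = LAYOUT_STEREO | CH_FRONT_CENTER |
	CH_LOW_FREQUENCY | CH_SIDE_LEFT | CH_SIDE_RIGHT;
constexpr std::uint64_t LAYOUT_7POINT1 = LAYOUT_5POINT1 | CH_BACK_LEFT | CH_BACK_RIGHT;

// Number of channels in a layout: count the set bits in the mask.
int channel_count(std::uint64_t layout)
{
	return static_cast<int>(std::bitset<64>(layout).count());
}
```

Counting bits this way gives 2 channels for stereo, 3 for 2.1, 6 for 5.1, and 8 for 7.1, which is the same kind of result `av_get_channel_layout_nb_channels` returns for a mask.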

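Bit rate

Bit rate is the amount of data transferred per second. Compressed audio files are usually described by their bit rate; for example, an MP3 commonly described as reaching CD quality is 128 kbps at 44,100 Hz. Note that the unit here is the bit, not the byte: one byte equals 8 bits. The bit is the smallest unit and is generally used to describe network and other communication speeds, while bytes are used for disk and memory sizes.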
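To put these numbers in perspective, the bit rate of uncompressed PCM and the size of a stream at a given bit rate can be computed directly. The following is a small sketch with illustrative function names; it ignores container overhead and assumes a constant bit rate.

```cpp
#include <cstdint>

// Bit rate of uncompressed PCM audio in bits per second.
std::int64_t pcm_bit_rate(int sample_rate, int bits_per_sample, int channels)
{
	return static_cast<std::int64_t>(sample_rate) * bits_per_sample * channels;
}

// Approximate payload size in bytes of a constant-bit-rate stream.
// The division by 8 converts bits to bytes.
std::int64_t stream_size_bytes(std::int64_t bit_rate, int seconds)
{
	return bit_rate * seconds / 8;
}
```

CD audio (44,100 Hz, 16 bits, 2 channels) works out to 1,411,200 bits per second, about 11 times the 128 kbps MP3 mentioned above, and four minutes of audio at 128 kbps occupy roughly 3.84 million bytes.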

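Transcoding with FFmpeg audio filters

Different fields have different playback requirements, so audio parameters need to be adjusted for each of them. This section shows how to adjust the parameters of audio data with an audio filter. The implementation is as follows: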

#include "../audio_filter.h"

extern "C"
{
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
#include <libavfilter/avfilter.h>
#include <libswresample/swresample.h>
}

#include <cstdint>
#include <cstdio>
#include <string>


/**@brief Convert the format of the audio data
* @param[in]  output_filename output file name
* @param[in]  input_filename input file name
* @param[in]  sample_fmt sample format
* @param[in]  sample_rate sample rate
* @param[in]  channel_layout channel layout
* @param[in]  bitrate bit rate
* @return  result of the function
* - 0      success
* - other  failure
*/
int transcode_audio(const char *output_filename, const char *input_filename, AVSampleFormat sample_fmt,
	int sample_rate, uint64_t channel_layout, uint64_t bitrate) 
{
	// input and output format contexts
	AVFormatContext *inFmtCtx = nullptr;
	AVFormatContext *outFmtCtx = nullptr;

	// decoder and encoder contexts
	AVCodecContext *aDecCtx = nullptr;
	AVCodecContext *aEncCtx = nullptr;

	// output stream
	AVStream *aOutStream = nullptr;

	int ret;
	// open input file
	ret = avformat_open_input(&inFmtCtx, input_filename, nullptr, nullptr);
	if (ret < 0)
	{
		printf("unable to open input file: %s\n", input_filename);
		return -1;
	}
	ret = avformat_find_stream_info(inFmtCtx, nullptr);
	if (ret < 0)
	{
		printf("unable to read stream info\n");
		return -1;
	}
	
	// open output file
	ret = avformat_alloc_output_context2(&outFmtCtx, nullptr, nullptr, output_filename);
	if (ret < 0 || !outFmtCtx)
	{
		printf("unable to create output context\n");
		return -1;
	}
	for (unsigned int i = 0; i < inFmtCtx->nb_streams; ++i)
	{
		AVStream *inStream = inFmtCtx->streams[i];
		if (inStream->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
		{
			// decoder for the input stream
			const AVCodec *decoder = avcodec_find_decoder(inStream->codecpar->codec_id);
			if (!decoder)
				return -1;
			aDecCtx = avcodec_alloc_context3(decoder);
			ret = avcodec_parameters_to_context(aDecCtx, inStream->codecpar);
			if (ret < 0)
				return -1;
			ret = avcodec_open2(aDecCtx, decoder, nullptr);
			if (ret < 0)
				return -1;

			// encoder for the output stream
			const AVCodec *encoder = avcodec_find_encoder(outFmtCtx->oformat->audio_codec);
			if (!encoder)
				return -1;
			aOutStream = avformat_new_stream(outFmtCtx, encoder);
			aOutStream->id = outFmtCtx->nb_streams - 1;
			aEncCtx = avcodec_alloc_context3(encoder);

			aEncCtx->codec_id = encoder->id;
			// AV_SAMPLE_FMT_U8 is 0, so "not specified" must be tested against AV_SAMPLE_FMT_NONE
			aEncCtx->sample_fmt = sample_fmt != AV_SAMPLE_FMT_NONE ? sample_fmt : aDecCtx->sample_fmt;
			aEncCtx->sample_rate = sample_rate ? sample_rate : aDecCtx->sample_rate;
			// fall back to the input layout when none is requested
			aEncCtx->channel_layout = channel_layout ? channel_layout : aDecCtx->channel_layout;
			aEncCtx->channels = av_get_channel_layout_nb_channels(aEncCtx->channel_layout);
			aEncCtx->bit_rate = bitrate ? bitrate : aDecCtx->bit_rate;
			aEncCtx->time_base = { 1, aEncCtx->sample_rate };
			aOutStream->time_base = aEncCtx->time_base;
			if (outFmtCtx->oformat->flags & AVFMT_GLOBALHEADER)
				aEncCtx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
			ret = avcodec_open2(aEncCtx, encoder, nullptr);
			if (ret < 0)
			{
				// typically the encoder does not support the requested format, rate, or layout
				printf("unable to open encoder\n");
				return -1;
			}

			ret = avcodec_parameters_from_context(aOutStream->codecpar, aEncCtx);
			av_dict_copy(&aOutStream->metadata, inStream->metadata, 0);
			break;
		}
	}

	// the input must contain at least one audio stream
	if (!aDecCtx || !aEncCtx)
	{
		printf("no audio stream found\n");
		return -1;
	}


	if (!(outFmtCtx->oformat->flags & AVFMT_NOFILE)) 
	{
		ret = avio_open(&outFmtCtx->pb, output_filename, AVIO_FLAG_WRITE);
		if (ret < 0) {
			return -1;
		}
	}

	ret = avformat_write_header(outFmtCtx, nullptr);
	if (ret < 0) 
	{
		return -1;
	}

	AVFrame *inAudioFrame = av_frame_alloc();
	AVFrame *outAudioFrame = av_frame_alloc();

	outAudioFrame->format = aEncCtx->sample_fmt;
	outAudioFrame->sample_rate = aEncCtx->sample_rate;
	outAudioFrame->channel_layout = aEncCtx->channel_layout;
	outAudioFrame->nb_samples = aEncCtx->frame_size;
	ret = av_frame_get_buffer(outAudioFrame, 0);
	if (ret < 0)
	{
		// fails, for example, when the encoder reports frame_size == 0
		printf("unable to allocate output frame buffer\n");
		return -1;
	}
	
	int64_t audio_pts = 0;

	// some inputs carry no channel layout; derive a default one from the channel count
	if (!aDecCtx->channel_layout)
		aDecCtx->channel_layout = av_get_default_channel_layout(aDecCtx->channels);

	// filter that converts the audio frames to the target format
	AudioFilter filter;
	char description[512];
	AudioConfig inConfig(aDecCtx->sample_fmt, aDecCtx->sample_rate, aDecCtx->channel_layout, aDecCtx->time_base);
	AudioConfig outConfig(aEncCtx->sample_fmt, aEncCtx->sample_rate, aEncCtx->channel_layout, aEncCtx->time_base);
	char ch_layout[64];
	av_get_channel_layout_string(ch_layout, sizeof(ch_layout),
		av_get_channel_layout_nb_channels(aEncCtx->channel_layout), aEncCtx->channel_layout);
	snprintf(description, sizeof(description),
		"[in]aresample=sample_rate=%d[res];[res]aformat=sample_fmts=%s:sample_rates=%d:channel_layouts=%s[out]",
		aEncCtx->sample_rate,
		av_get_sample_fmt_name(aEncCtx->sample_fmt),
		aEncCtx->sample_rate,
		ch_layout);
	filter.create(description, &inConfig, &outConfig);
	filter.dumpGraph();

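	// index of the input audio stream being transcoded: the first audio stream,
	// which is the one the decoder was opened for
	int audioStreamIndex = -1;
	for (unsigned int i = 0; i < inFmtCtx->nb_streams; ++i)
	{
		if (inFmtCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
		{
			audioStreamIndex = static_cast<int>(i);
			break;
		}
	}

	while (true) 
	{
		// read a packet, decode it, and run the decoded frame through the filter
		AVPacket inPacket{ nullptr };
		av_init_packet(&inPacket);
		// release the packet's buffers whenever this iteration ends
		struct PacketGuard
		{
			AVPacket *pkt;
			~PacketGuard() { av_packet_unref(pkt); }
		} packetGuard{ &inPacket };

		ret = av_read_frame(inFmtCtx, &inPacket);
		if (ret == AVERROR_EOF) 
		{
			break;
		}
		else if (ret < 0) 
		{
			return -1;
		}

		// compare against the input stream index, not the output stream index
		if (inPacket.stream_index == audioStreamIndex) 
		{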
			ret = avcodec_send_packet(aDecCtx, &inPacket);
			if (ret != 0) 
			{
				printf("send packet error\n");
			}
			ret = avcodec_receive_frame(aDecCtx, inAudioFrame);

			if (ret == 0) 
			{
				ret = filter.addInput1(inAudioFrame);
				av_frame_unref(inAudioFrame);
				if (ret < 0)
				{
					printf("add filter input1 error\n");
				}

				// pull every frame the filter can produce and encode it
				do {
					outAudioFrame->nb_samples = aEncCtx->frame_size;
					ret = filter.getFrame(outAudioFrame);
					if (ret == 0) 
					{
						outAudioFrame->pts = audio_pts;
						audio_pts += outAudioFrame->nb_samples;

						ret = avcodec_send_frame(aEncCtx, outAudioFrame);
						if (ret < 0) {
							printf("unable to send frame: %d\n", ret);
						}
					}
					else 
					{
						printf("unable to get filter audio frame: %d\n", ret);
						break;
					}

					// drain all packets the encoder has ready
					do {
						AVPacket outPacket{ nullptr };
						av_init_packet(&outPacket);
						ret = avcodec_receive_packet(aEncCtx, &outPacket);
						if (ret == 0) 
						{
							av_packet_rescale_ts(&outPacket, aEncCtx->time_base, aOutStream->time_base);
							outPacket.stream_index = aOutStream->index;
							
							ret = av_interleaved_write_frame(outFmtCtx, &outPacket);
							if (ret < 0) {
								printf("unable to write packet\n");
								break;
							}
						}
						else {
							// AVERROR(EAGAIN) only means the encoder needs more input
							if (ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
								printf("unable to receive packet\n");
							break;
						}
					} while (true);

				} while (true);

			}
			else if (ret != AVERROR(EAGAIN)) 
			{
				// AVERROR(EAGAIN) only means the decoder needs more packets
				printf("unable to receive frame\n");
			}
		}
	}

	// flush the frames and packets still buffered in the filter and the encoder
	int eof = 0;
	do {
		ret = filter.getFrame(outAudioFrame);
		if (ret == 0) 
		{
			outAudioFrame->pts = audio_pts;
			audio_pts += outAudioFrame->nb_samples;
		}
		else 
		{
			printf("filter queue finished\n");
		}

		// a null frame puts the encoder into draining mode
		ret = avcodec_send_frame(aEncCtx, ret == 0 ? outAudioFrame : nullptr);
		do {
			AVPacket outPacket{ nullptr };
			av_init_packet(&outPacket);
			ret = avcodec_receive_packet(aEncCtx, &outPacket);
			if (ret == 0) {
				av_packet_rescale_ts(&outPacket, aEncCtx->time_base, aOutStream->time_base);
				outPacket.stream_index = aOutStream->index;
				
				ret = av_interleaved_write_frame(outFmtCtx, &outPacket);
				if (ret < 0) 
				{
					eof = 1;
					break;
				}
			}
			else if (ret == AVERROR_EOF) 
			{
				// the encoder has been fully drained
				eof = 1;
				break;
			}
			else {
				break;
			}
		} while (true);

	} while (!eof);

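	// release resources
	filter.destroy();
	av_write_trailer(outFmtCtx);
	// close the output file that was opened with avio_open
	if (!(outFmtCtx->oformat->flags & AVFMT_NOFILE))
		avio_closep(&outFmtCtx->pb);
	av_frame_free(&inAudioFrame);
	av_frame_free(&outAudioFrame);
	avcodec_free_context(&aDecCtx);
	avcodec_free_context(&aEncCtx);
	// avformat_close_input also frees inFmtCtx and sets it to nullptr
	avformat_close_input(&inFmtCtx);
	avformat_free_context(outFmtCtx);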

	return 0;
}
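The filter graph description passed to `filter.create` is an ordinary string, so it can be built and inspected on its own. The sketch below isolates that step. `build_filter_description` is a hypothetical helper, and the format and layout names are passed in as plain strings here instead of being obtained from `av_get_sample_fmt_name` and `av_get_channel_layout_string` as in the function above.

```cpp
#include <cstdio>
#include <string>

// Build the same "aresample -> aformat" chain used in transcode_audio.
// aresample converts the sample rate; aformat constrains the sample
// format, sample rate, and channel layout of the frames leaving the graph.
std::string build_filter_description(int sample_rate,
	const char *sample_fmt_name, const char *channel_layout_name)
{
	char buf[512];
	std::snprintf(buf, sizeof(buf),
		"[in]aresample=sample_rate=%d[res];"
		"[res]aformat=sample_fmts=%s:sample_rates=%d:channel_layouts=%s[out]",
		sample_rate, sample_fmt_name, sample_rate, channel_layout_name);
	return std::string(buf);
}
```

Calling it with `22050`, `"fltp"`, and `"stereo"` yields `[in]aresample=sample_rate=22050[res];[res]aformat=sample_fmts=fltp:sample_rates=22050:channel_layouts=stereo[out]`. Printing the string before creating the graph is an easy way to spot a malformed option.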

Here we change the sample format of the audio file to AV_SAMPLE_FMT_FLTP, lower the sample rate to 22,050 Hz, and set the bit rate to 80 kbps.

int main(int argc, char* argv[])
{
	if (argc != 3)
	{
		printf("usage: %s <input file path> <output file path>\n", argv[0]);
		return -1;
	}

	// input file path and output file path
	std::string fileInput = std::string(argv[1]);
	std::string fileOutput = std::string(argv[2]);
	return transcode_audio(fileOutput.c_str(), fileInput.c_str(), AV_SAMPLE_FMT_FLTP, 22050, AV_CH_LAYOUT_STEREO, 80000);
}
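One detail of the transcoding loop deserves a closer look: each output frame is stamped with a running sample count, so its timestamp is expressed in units of 1/sample_rate, and `av_packet_rescale_ts` then converts it to the time base of the output stream. The arithmetic behind such a conversion can be sketched as follows. `Rational` and `rescale_ts` are stand-ins written for this example, not FFmpeg types, and the sketch assumes non-negative timestamps small enough not to overflow 64 bits.

```cpp
#include <cstdint>

// A time base: one tick lasts num/den seconds.
struct Rational
{
	int num;
	int den;
};

// Convert a timestamp from one time base to another, rounding to the
// nearest tick: ts * (from.num / from.den) / (to.num / to.den).
std::int64_t rescale_ts(std::int64_t ts, Rational from, Rational to)
{
	const std::int64_t num = static_cast<std::int64_t>(from.num) * to.den;
	const std::int64_t den = static_cast<std::int64_t>(from.den) * to.num;
	return (ts * num + den / 2) / den;
}
```

With a sample rate of 22,050 Hz, a timestamp of 22,050 samples in the time base 1/22050 corresponds to 1,000 ticks in a 1/1000 (millisecond) time base, that is, exactly one second of audio.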

References

https://www.cnblogs.com/yongdaimi/p/10722355.html