FFmpeg audio resampling

1. Audio resampling

1.1 What is resampling

Resampling means converting audio from one set of parameters (sample rate, sample format, number of channels/channel layout) to another, so that the output matches the parameters we expect.

1.2 Why resampling

The original audio parameters do not meet our needs. For example, when FFmpeg decodes audio, different sources have different sample formats, sample rates and channel layouts, so the decoded data varies from source to source (recent FFmpeg decoders typically output AV_SAMPLE_FMT_FLTP, so at least the sample format tends to be consistent). If we want to do further processing on the decoded audio, these inconsistent parameters cause a lot of trouble; it is much more convenient to first resample everything to one fixed set of audio parameters.

Another example is playing audio through SDL: SDL2 does not accept planar sample layouts (and older SDL versions did not support floating-point samples at all), while recent FFmpeg decoders output the planar float format AV_SAMPLE_FMT_FLTP. So at this point we need to resample, for example to interleaved S16, before the audio can be played through SDL2.

1.3 Adjustable parameters

By resampling, we can change the:

  • sample rate
  • sample format
  • channel layout (the number of channels can be derived from this parameter)

2. Analysis of audio resampling parameters

2.1 Sampling rate

The number of samples the sampling device takes per second, in Hz. In FFmpeg, an audio frame stores its per-channel sample count in nb_samples.

2.2 Sampling format (bit width)

Each sample format has a different quantization precision (bit width). The more bits, the more precisely each value is represented and the more faithful the sound. FFmpeg defines the following sample formats, and the number of bytes each format occupies can be queried from libavutil/samplefmt.h:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

Formats ending in P are planar; all others are packed (interleaved).
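
The per-sample byte count mentioned above can be queried with av_get_bytes_per_sample(); a quick sketch:

#include <stdio.h>
#include "libavutil/samplefmt.h"

int main(void)
{
    printf("S16:  %d bytes\n", av_get_bytes_per_sample(AV_SAMPLE_FMT_S16));  /* 2 */
    printf("FLTP: %d bytes\n", av_get_bytes_per_sample(AV_SAMPLE_FMT_FLTP)); /* 4 */
    printf("DBL:  %d bytes\n", av_get_bytes_per_sample(AV_SAMPLE_FMT_DBL));  /* 8 */
    return 0;
}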

2.3 Planar and packed storage

Taking stereo as an example: when data in a P (planar) format is stored, the left and right channels are kept separately, the left channel in data[0] and the right channel in data[1]; linesize[0] gives the size in bytes of each channel's buffer (for audio, FFmpeg sets only linesize[0], and all planes have the same size).

When data in a non-P (packed) format is stored, the samples of all channels are interleaved in data[0] as LRLRLR..., and linesize[0] gives the total buffer size.
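
To make the two layouts concrete, here is a minimal sketch of reading a decoded frame in either storage mode. It assumes float samples (AV_SAMPLE_FMT_FLT or AV_SAMPLE_FMT_FLTP) and exactly two channels; the helper name is made up for illustration:

#include "libavutil/frame.h"
#include "libavutil/samplefmt.h"

/* Illustrative helper: sum every sample of a decoded stereo float frame. */
static double sum_stereo_samples(const AVFrame *frame)
{
    double sum = 0.0;
    if (av_sample_fmt_is_planar((enum AVSampleFormat)frame->format)) {
        /* planar: left channel in data[0], right channel in data[1] */
        const float *left  = (const float *)frame->data[0];
        const float *right = (const float *)frame->data[1];
        for (int i = 0; i < frame->nb_samples; i++)
            sum += left[i] + right[i];
    } else {
        /* packed: samples interleaved as LRLRLR... in data[0] */
        const float *pcm = (const float *)frame->data[0];
        for (int i = 0; i < 2 * frame->nb_samples; i++)
            sum += pcm[i];
    }
    return sum;
}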

2.4 Channel layout (channel_layout)

Channel layouts are defined in FFmpeg/libavutil/channel_layout.h as bitmasks of individual channels. Commonly used layouts include AV_CH_LAYOUT_STEREO (two channels) and AV_CH_LAYOUT_SURROUND (three channels); they are defined as follows:

#define AV_CH_LAYOUT_MONO              (AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_STEREO            (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT)
#define AV_CH_LAYOUT_2POINT1           (AV_CH_LAYOUT_STEREO|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_2_1               (AV_CH_LAYOUT_STEREO|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_SURROUND          (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_3POINT1           (AV_CH_LAYOUT_SURROUND|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_4POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_4POINT1           (AV_CH_LAYOUT_4POINT0|AV_CH_LOW_FREQUENCY)
...
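
Because a layout is just a bitmask of channels, the channel count can be derived from it. A small sketch using av_get_channel_layout_nb_channels(), the pre-FFmpeg-5.x channel-layout API used throughout this article:

#include <stdio.h>
#include "libavutil/channel_layout.h"

int main(void)
{
    printf("stereo:   %d channels\n",
           av_get_channel_layout_nb_channels(AV_CH_LAYOUT_STEREO));   /* 2 */
    printf("surround: %d channels\n",
           av_get_channel_layout_nb_channels(AV_CH_LAYOUT_SURROUND)); /* 3 */
    printf("5.1:      %d channels\n",
           av_get_channel_layout_nb_channels(AV_CH_LAYOUT_5POINT1));  /* 6 */
    return 0;
}
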
2.5 Calculating the data size of an audio frame

Data size (in bytes) of one audio frame = number of channels × nb_samples × bytes per sample

If the audio frame holds PCM data in FLTP format (4 bytes per sample) with 1024 samples and two channels, it contains 2 × 1024 × 4 = 8192 bytes of audio data.

For PCM data in AV_SAMPLE_FMT_DBL format (8 bytes per sample): 2 × 1024 × 8 = 16384 bytes.
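
The arithmetic can be cross-checked against FFmpeg's own helper (passing align = 1 for a tightly packed buffer); a minimal sketch:

#include <stdio.h>
#include "libavutil/samplefmt.h"

int main(void)
{
    int fltp = av_samples_get_buffer_size(NULL, 2, 1024, AV_SAMPLE_FMT_FLTP, 1);
    int dbl  = av_samples_get_buffer_size(NULL, 2, 1024, AV_SAMPLE_FMT_DBL, 1);
    printf("FLTP: %d bytes\n", fltp); /* 8192 */
    printf("DBL:  %d bytes\n", dbl);  /* 16384 */
    return 0;
}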

2.6 Audio playback time calculation

Formula: playback time of one frame (in milliseconds) = nb_samples × 1000 / sample rate

At a sample rate of 44100 Hz there are 44100 samples per second, and a typical frame holds 1024 samples, so each frame plays for 1024 × 1000 / 44100 ≈ 23.2 ms (the exact value is 23.21995464852608 ms).

At a sample rate of 48000 Hz: 1024 × 1000 / 48000 ≈ 21.33 ms.

Note: 1024 × 1000 / 44100 = 23.21995464852608 ms ≈ 23.2 ms, so rounding loses about 0.012 ms per frame. Accumulated over 100,000 frames the error exceeds 1199 ms, and if video is playing alongside this causes audio/video synchronization problems. Computing pts by repeatedly adding 23.2 (0, 23.2, 46.4, ...) therefore accumulates error.
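
A drift-free alternative is to derive each frame's timestamp from the running sample count instead of adding a rounded per-frame duration. A minimal sketch (the variable names are illustrative, not FFmpeg API):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const int sample_rate = 44100;
    const int nb_samples  = 1024;
    int64_t total_samples = 0;

    for (int frame = 0; frame < 5; frame++) {
        double exact_ms   = total_samples * 1000.0 / sample_rate; /* no accumulation */
        double rounded_ms = frame * 23.2;                         /* drifts over time */
        printf("frame %d: exact=%.4f ms, rounded=%.1f ms\n", frame, exact_ms, rounded_ms);
        total_samples += nb_samples;
    }
    return 0;
}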

3. FFmpeg resampling related API

  • Allocate an audio resampling context: struct SwrContext *swr_alloc(void);

  • After setting the relevant parameters, initialize the SwrContext with: int swr_init(struct SwrContext *s);

  • Allocate a SwrContext and set/reset the commonly used parameters in one call:

struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,               // audio resampling context (pass NULL to allocate a new one)
                                      int64_t out_ch_layout,              // output channel layout, e.g. 5.1
                                      enum AVSampleFormat out_sample_fmt, // output sample format (FLT, S16, ...); S16 is the usual choice since almost all sound cards support it
                                      int out_sample_rate,                // output sample rate
                                      int64_t in_ch_layout,               // input channel layout
                                      enum AVSampleFormat in_sample_fmt,  // input sample format
                                      int in_sample_rate,                 // input sample rate
                                      int log_offset,                     // logging; not needed for now, pass 0
                                      void *log_ctx);                     // logging; not needed for now, pass NULL
  • Convert the input audio according to the configured parameters and produce output:

int swr_convert(struct SwrContext *s,  // audio resampling context
                uint8_t **out,         // output buffers: array of pointers to the output planes
                int out_count,         // output capacity in samples per channel (not bytes)
                const uint8_t **in,    // input buffers, e.g. the data pointers of a decoded AVFrame
                int in_count);         // number of input samples per channel
// in and in_count can be set to NULL/0 to flush the last few samples at the end.
  • Free the SwrContext and set the pointer to NULL: void swr_free(struct SwrContext **s);

In order to use lswr, the first thing you need to do is allocate a SwrContext. This can be done with swr_alloc() or swr_alloc_set_opts().

With swr_alloc(), the options must be set through the AVOptions API.

swr_alloc_set_opts() provides the same functionality, but allows you to set the common options in a single statement.

For example, the following code sets up conversion from the planar float sample format to interleaved signed 16-bit integers, downsampling from 48 kHz to 44.1 kHz, and downmixing from 5.1 channels to stereo (using the default mixing matrix).

Use the swr_alloc() function:

SwrContext *swr = swr_alloc();
av_opt_set_channel_layout(swr, "in_channel_layout",  AV_CH_LAYOUT_5POINT1, 0);
av_opt_set_channel_layout(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO,  0);
av_opt_set_int(swr, "in_sample_rate", 48000, 0);
av_opt_set_int(swr, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);

The same can also be done with swr_alloc_set_opts():

SwrContext *swr = swr_alloc_set_opts(NULL,                 // we're allocating a new context
                                     AV_CH_LAYOUT_STEREO,  // out_ch_layout
                                     AV_SAMPLE_FMT_S16,    // out_sample_fmt
                                     44100,                // out_sample_rate
                                     AV_CH_LAYOUT_5POINT1, // in_ch_layout
                                     AV_SAMPLE_FMT_FLTP,   // in_sample_fmt
                                     48000,                // in_sample_rate
                                     0,                    // log_offset
                                     NULL);                // log_ctx

Once all values are set, the context must be initialized with swr_init(). If you need to change the conversion parameters later, set them through AVOptions as above and then call swr_init() again. The conversion itself is done by calling swr_convert() repeatedly.
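
A sketch of that re-initialization flow, reusing the swr context from the swr_alloc() example above:

/* change a conversion parameter on an existing context ... */
av_opt_set_int(swr, "out_sample_rate", 48000, 0);
/* ... then re-initialize before converting again */
if (swr_init(swr) < 0) {
    fprintf(stderr, "swr_init failed after changing parameters\n");
}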

NOTE: Samples may be buffered inside the SwrContext if you provide insufficient output space or if sample rate conversion is being done, since it requires "future" samples. Samples that do not require future input can be retrieved at any time with swr_convert() (in_count can be set to 0). At the end of the conversion, the resampling buffer can be flushed by calling swr_convert() with in set to NULL and in_count set to 0.
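
In code, the drain described in this note looks roughly like the following sketch (swr, dst_data and dst_nb_samples as prepared in the examples above):

/* after the last real input, pass NULL/0 to drain buffered samples */
for (;;) {
    int out = swr_convert(swr, dst_data, dst_nb_samples, NULL, 0);
    if (out <= 0)
        break; /* 0: buffer empty; <0: error */
    /* ... write 'out' converted samples per channel ... */
}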

4. Audio resampling project

FFmpeg ships its own resampling example, FFmpeg/doc/examples/resampling_audio.c. When using it in a project, pay attention to all the parameters being set: the input data you feed in must actually match the declared input parameters:

#include "libswresample/swresample.h"
#include "libavutil/samplefmt.h"
#include "libavutil/channel_layout.h"
#include "libavutil/opt.h"

static int get_format_from_sample_fmt(const char **fmt,
                                      enum AVSampleFormat sample_fmt)
{
    int i;
    struct sample_fmt_entry {
        enum AVSampleFormat sample_fmt;
        const char *fmt_be, *fmt_le;
    } sample_fmt_entries[] = {
        { AV_SAMPLE_FMT_U8,  "u8",    "u8"    },
        { AV_SAMPLE_FMT_S16, "s16be", "s16le" },
        { AV_SAMPLE_FMT_S32, "s32be", "s32le" },
        { AV_SAMPLE_FMT_FLT, "f32be", "f32le" },
        { AV_SAMPLE_FMT_DBL, "f64be", "f64le" },
    };
    *fmt = NULL;

    for (i = 0; i < FF_ARRAY_ELEMS(sample_fmt_entries); i++) {
        struct sample_fmt_entry *entry = &sample_fmt_entries[i];
        if (sample_fmt == entry->sample_fmt) {
            *fmt = AV_NE(entry->fmt_be, entry->fmt_le);
            return 0;
        }
    }

    fprintf(stderr, "Sample format %s not supported as output format\n",
            av_get_sample_fmt_name(sample_fmt));
    return AVERROR(EINVAL);
}

/**
 * Fill dst buffer with nb_samples, generated starting from t (interleaved layout).
 */
static void fill_samples(double *dst, int nb_samples, int nb_channels,
                         int sample_rate, double *t)
{
    int i, j;
    double tincr = 1.0 / sample_rate, *dstp = dst;
    const double c = 2 * M_PI * 440.0;

    /* generate a 440 Hz sine tone, duplicated on every channel */
    for (i = 0; i < nb_samples; i++) {
        *dstp = sin(c * *t);
        for (j = 1; j < nb_channels; j++) dstp[j] = dstp[0];
        dstp += nb_channels;
        *t += tincr;
    }
}

int main(int argc, char **argv)
{
    // input parameters
    int64_t src_ch_layout = AV_CH_LAYOUT_STEREO;
    int src_rate = 48000;
    enum AVSampleFormat src_sample_fmt = AV_SAMPLE_FMT_DBL;
    int src_nb_channels = 0;
    uint8_t **src_data = NULL;  // array of plane pointers
    int src_linesize;
    int src_nb_samples = 1024;

    // output parameters
    int64_t dst_ch_layout = AV_CH_LAYOUT_STEREO;
    int dst_rate = 44100;
    enum AVSampleFormat dst_sample_fmt = AV_SAMPLE_FMT_S16;
    int dst_nb_channels = 0;
    uint8_t **dst_data = NULL;  // array of plane pointers
    int dst_linesize;
    int dst_nb_samples;
    int max_dst_nb_samples;

    // output file
    const char *dst_filename = NULL;  // save the output PCM to disk, then play it to verify
    FILE *dst_file;

    int dst_bufsize;
    const char *fmt;

    // resampler instance
    struct SwrContext *swr_ctx;

    double t;
    int ret;

    if (argc != 2) {
        fprintf(stderr,
                "Usage: %s output_file\n"
                "API example program to show how to resample an audio stream "
                "with libswresample.\n"
                "This program generates a series of audio frames, resamples "
                "them to a specified "
                "output format and rate and saves them to an output file named "
                "output_file.\n",
                argv[0]);
        exit(1);
    }
    dst_filename = argv[1];

    dst_file = fopen(dst_filename, "wb");
    if (!dst_file) {
        fprintf(stderr, "Could not open destination file %s\n", dst_filename);
        exit(1);
    }

    /* create resampler context */
    swr_ctx = swr_alloc();
    if (!swr_ctx) {
        fprintf(stderr, "Could not allocate resampler context\n");
        ret = AVERROR(ENOMEM);
        goto end;
    }

    /* set options */
    // input parameters
    av_opt_set_int(swr_ctx, "in_channel_layout", src_ch_layout, 0);
    av_opt_set_int(swr_ctx, "in_sample_rate", src_rate, 0);
    av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", src_sample_fmt, 0);
    // output parameters
    av_opt_set_int(swr_ctx, "out_channel_layout", dst_ch_layout, 0);
    av_opt_set_int(swr_ctx, "out_sample_rate", dst_rate, 0);
    av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", dst_sample_fmt, 0);

    /* initialize the resampling context */
    if ((ret = swr_init(swr_ctx)) < 0) {
        fprintf(stderr, "Failed to initialize the resampling context\n");
        goto end;
    }

    /* allocate source and destination samples buffers */
    // compute the number of source channels
    src_nb_channels = av_get_channel_layout_nb_channels(src_ch_layout);
    // allocate memory for the source buffers
    ret = av_samples_alloc_array_and_samples(&src_data, &src_linesize,
                                             src_nb_channels, src_nb_samples,
                                             src_sample_fmt, 0);
    if (ret < 0) {
        fprintf(stderr, "Could not allocate source samples\n");
        goto end;
    }

    /* compute the number of converted samples: buffering is avoided
     * ensuring that the output buffer will contain at least all the
     * converted input samples */
    // compute the number of output samples
    max_dst_nb_samples = dst_nb_samples =
        av_rescale_rnd(src_nb_samples, dst_rate, src_rate, AV_ROUND_UP);

    /* buffer is going to be directly written to a rawaudio file, no alignment
     */
    dst_nb_channels = av_get_channel_layout_nb_channels(dst_ch_layout);
    // allocate the output buffers
    ret = av_samples_alloc_array_and_samples(&dst_data, &dst_linesize,
                                             dst_nb_channels, dst_nb_samples,
                                             dst_sample_fmt, 0);
    if (ret < 0) {
        fprintf(stderr, "Could not allocate destination samples\n");
        goto end;
    }

    t = 0;
    do {
        /* generate synthetic audio as the source data */
        fill_samples((double *)src_data[0], src_nb_samples, src_nb_channels,
                     src_rate, &t);

        /* compute destination number of samples, accounting for samples
         * still buffered inside the resampler */
        int64_t delay = swr_get_delay(swr_ctx, src_rate);
        dst_nb_samples = av_rescale_rnd(delay + src_nb_samples, dst_rate,
                                        src_rate, AV_ROUND_UP);
        if (dst_nb_samples > max_dst_nb_samples) {
            av_freep(&dst_data[0]);
            ret = av_samples_alloc(dst_data, &dst_linesize, dst_nb_channels,
                                   dst_nb_samples, dst_sample_fmt, 1);
            if (ret < 0) break;
            max_dst_nb_samples = dst_nb_samples;
        }

        /* convert to destination format; the return value is the number of
         * samples actually written per channel */
        ret = swr_convert(swr_ctx, dst_data, dst_nb_samples,
                          (const uint8_t **)src_data, src_nb_samples);
        if (ret < 0) {
            fprintf(stderr, "Error while converting\n");
            goto end;
        }
        dst_bufsize = av_samples_get_buffer_size(&dst_linesize, dst_nb_channels,
                                                 ret, dst_sample_fmt, 1);
        if (dst_bufsize < 0) {
            fprintf(stderr, "Could not get sample buffer size\n");
            goto end;
        }
        printf("t:%f in:%d out:%d\n", t, src_nb_samples, ret);
        fwrite(dst_data[0], 1, dst_bufsize, dst_file);
    } while (t < 10);

    /* flush the resampler: pass NULL input to drain buffered samples */
    ret = swr_convert(swr_ctx, dst_data, dst_nb_samples, NULL, 0);
    if (ret < 0) {
        fprintf(stderr, "Error while converting\n");
        goto end;
    }
    dst_bufsize = av_samples_get_buffer_size(&dst_linesize, dst_nb_channels,
                                             ret, dst_sample_fmt, 1);
    if (dst_bufsize < 0) {
        fprintf(stderr, "Could not get sample buffer size\n");
        goto end;
    }
    printf("flush in:%d out:%d\n", 0, ret);
    fwrite(dst_data[0], 1, dst_bufsize, dst_file);

    if ((ret = get_format_from_sample_fmt(&fmt, dst_sample_fmt)) < 0) goto end;
    fprintf(stderr,
            "Resampling succeeded. Play the output file with the command:\n"
            "ffplay -f %s -channel_layout %" PRId64 " -channels %d -ar %d %s\n",
            fmt, dst_ch_layout, dst_nb_channels, dst_rate, dst_filename);

end:
    fclose(dst_file);

    if (src_data) av_freep(&src_data[0]);
    av_freep(&src_data);

    if (dst_data) av_freep(&dst_data[0]);
    av_freep(&dst_data);

    swr_free(&swr_ctx);
    return ret < 0;
}
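
To build and run the example, something like the following should work (a sketch; the exact flags depend on your FFmpeg installation, here assumed to be visible to pkg-config):

gcc resampling_audio.c -o resampling_audio $(pkg-config --cflags --libs libswresample libavutil) -lm
./resampling_audio out.pcm
ffplay -f s16le -channel_layout 3 -channels 2 -ar 44100 out.pcm

The last line mirrors the command the program prints on success (3 is the numeric value of AV_CH_LAYOUT_STEREO).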

Reprinted from: blog.csdn.net/u014078003/article/details/128842455