FFmpeg source code analysis: introduction to audio filters (Part 2)

FFmpeg provides audio and video filters in the libavfilter module. All audio filters are registered in libavfilter/allfilters.c. You can also run the ffmpeg -filters command to list all currently supported filters; in the output, an "A" in the capability column marks an audio filter. This article mainly introduces audio filters, including: mixing, silence padding, the Haas effect, the chorus effect, equalizer, IIR and FIR filters, low-pass, band-pass, and high-pass filters, speed change, volume adjustment, and silence detection.

For a detailed introduction to audio filters, see the official documentation: Audio Filters. For the filters covered in the first half of this series, see: Introduction to Audio Filters (Part 1).

1、amerge

Merge, combines two or more audio streams into a single multi-channel output stream. If the input channel layouts are disjoint (and therefore compatible), the output channel layout is set accordingly and the channels are reordered as necessary. Otherwise, the output contains all the channels of the first input stream, then all the channels of the second input stream, and so on.

For example, to merge two audio streams, the reference command is as follows:

ffmpeg -i one.mp3 -i two.mp3 -filter_complex "[0:a][1:a]amerge=inputs=2[aout]" -map "[aout]" out.mp3

2、amix

Mixing, mixes all input audio streams into a single output stream. This filter only supports floating-point sample formats; integer sample formats are automatically converted to floating point. The parameter options are as follows:

  • inputs: the number of input streams, the default is 2
  • duration: how to determine the duration of the mixed output
  •      longest: use the duration of the longest input (default)
  •      shortest: use the duration of the shortest input
  •      first: use the duration of the first input
  • dropout_transition: transition time when an input stream ends, default is 2 seconds
  • weights: the weight of each input stream, all streams have the same weight by default
  • normalize: whether to enable scale normalization, enabled by default
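
For example, to mix two audio files into a single stream, a reference command is as follows (duration=longest keeps the output as long as the longest input):

ffmpeg -i one.mp3 -i two.mp3 -filter_complex amix=inputs=2:duration=longest out.mp3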

The mixing code is located in libavfilter/af_amix.c, and the core code is as follows:

// Read a batch of samples from the FIFO queues, mix them, and write the result to the output buffer
static int output_frame(AVFilterLink *outlink)
{
    AVFilterContext *ctx = outlink->src;
    MixContext      *s = ctx->priv;
    AVFrame *out_buf, *in_buf;
    int nb_samples, ns, i;

    if (s->input_state[0] & INPUT_ON) {
        nb_samples = frame_list_next_frame_size(s->frame_list);
        for (i = 1; i < s->nb_inputs; i++) {
            if (s->input_state[i] & INPUT_ON) {
                ns = av_audio_fifo_size(s->fifos[i]);
                if (ns < nb_samples) {
                    if (!(s->input_state[i] & INPUT_EOF))
                        return 0;
                    nb_samples = ns;
                }
            }
        }

        s->next_pts = frame_list_next_pts(s->frame_list);
    } else {
        nb_samples = INT_MAX;
        for (i = 1; i < s->nb_inputs; i++) {
            if (s->input_state[i] & INPUT_ON) {
                ns = av_audio_fifo_size(s->fifos[i]);
                nb_samples = FFMIN(nb_samples, ns);
            }
        }
        if (nb_samples == INT_MAX) {
            ff_outlink_set_status(outlink, AVERROR_EOF, s->next_pts);
            return 0;
        }
    }

    frame_list_remove_samples(s->frame_list, nb_samples);
    calculate_scales(s, nb_samples);
    if (nb_samples == 0)
        return 0;
    out_buf = ff_get_audio_buffer(outlink, nb_samples);
    if (!out_buf)
        return AVERROR(ENOMEM);
    in_buf = ff_get_audio_buffer(outlink, nb_samples);
    if (!in_buf) {
        av_frame_free(&out_buf);
        return AVERROR(ENOMEM);
    }

    for (i = 0; i < s->nb_inputs; i++) {
        if (s->input_state[i] & INPUT_ON) {
            int planes, plane_size, p;
            // Read sample data from this input's FIFO queue
            av_audio_fifo_read(s->fifos[i], (void **)in_buf->extended_data,
                               nb_samples);

            planes     = s->planar ? s->nb_channels : 1;
            plane_size = nb_samples * (s->planar ? 1 : s->nb_channels);
            plane_size = FFALIGN(plane_size, 16);
            // Mix: dispatch on whether the sample format is float or double
            if (out_buf->format == AV_SAMPLE_FMT_FLT ||
                out_buf->format == AV_SAMPLE_FMT_FLTP) {
                for (p = 0; p < planes; p++) {
                    s->fdsp->vector_fmac_scalar((float *)out_buf->extended_data[p],
                                                (float *) in_buf->extended_data[p],
                                                s->input_scale[i], plane_size);
                }
            } else {
                for (p = 0; p < planes; p++) {
                    s->fdsp->vector_dmac_scalar((double *)out_buf->extended_data[p],
                                                (double *) in_buf->extended_data[p],
                                                s->input_scale[i], plane_size);
                }
            }
        }
    }
    av_frame_free(&in_buf);

    out_buf->pts = s->next_pts;
    if (s->next_pts != AV_NOPTS_VALUE)
        s->next_pts += nb_samples;

    return ff_filter_frame(outlink, out_buf);
}

During mixing, floating-point data goes through the vector_fmac_scalar function pointer, a member of the AVFloatDSPContext structure declared in libavutil/float_dsp.h. It multiplies a source vector by a scalar and adds the result to the destination vector:

    /**
     * Multiply a vector of floats by a scalar float and add to
     * destination vector.  Source and destination vectors must
     * overlap exactly or not at all.
     *
     * @param dst result vector
     *            constraints: 32-byte aligned
     * @param src input vector
     *            constraints: 32-byte aligned
     * @param mul scalar value
     * @param len length of vector
     *            constraints: multiple of 16
     */
    void (*vector_fmac_scalar)(float *dst, const float *src, float mul, int len);

In libavutil/float_dsp.c, avpriv_float_dsp_alloc() assigns the function pointers:

av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
{
    AVFloatDSPContext *fdsp = av_mallocz(sizeof(AVFloatDSPContext));
    if (!fdsp)
        return NULL;

    fdsp->vector_fmac_scalar = vector_fmac_scalar_c;
    fdsp->vector_fmul_scalar = vector_fmul_scalar_c;
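    // (the full source also assigns the remaining DSP function pointers and
    // applies per-architecture optimized overrides here before returning)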
    return fdsp;
}

Let's look at the implementation of the vector_fmac_scalar_c() function:

static void vector_fmac_scalar_c(float *dst, const float *src, float mul,
                                 int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}
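
To make the mixing math concrete, below is a minimal standalone sketch (not FFmpeg code; the mix_into() helper and the sample values are made up for illustration) showing the same "dst += src * scale" accumulation that output_frame() applies per input via vector_fmac_scalar:

#include <stdio.h>

#define NB_SAMPLES 4

/* Accumulate one scaled input into the output, like vector_fmac_scalar_c(). */
static void mix_into(float *dst, const float *src, float scale, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * scale;
}

int main(void)
{
    float out[NB_SAMPLES] = {0};   /* mixed output, starts as silence */
    float in1[NB_SAMPLES] = {0.5f, -0.5f,  0.25f, -0.25f};
    float in2[NB_SAMPLES] = {0.2f,  0.2f, -0.20f, -0.20f};

    /* With two equal-weight inputs and normalization enabled, amix scales
       each active input by roughly 0.5 so that the sum cannot clip. */
    mix_into(out, in1, 0.5f, NB_SAMPLES);
    mix_into(out, in2, 0.5f, NB_SAMPLES);

    for (int i = 0; i < NB_SAMPLES; i++)
        printf("out[%d] = %f\n", i, out[i]);
    return 0;
}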

3、apad

Padding, pads the end of the audio stream with silence. The parameter options are as follows:

  • packet_size: silence packet size in samples, default is 4096
  • pad_len: the number of silent samples to append
  • whole_len: the minimum total number of output samples
  • pad_dur: the duration of silence to append
  • whole_dur: the minimum total output duration
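
For example, to append two seconds of silence to the end of a file (pad_dur requires a reasonably recent FFmpeg build), the reference command is as follows:

ffmpeg -i in.mp3 -af apad=pad_dur=2 out.mp3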

4、atempo

Variable speed, adjusts the audio playback speed. The filter accepts exactly one parameter, the tempo value, whose range is [0.5, 100.0] with a default of 1.0. Note that a tempo greater than 2 will skip some samples rather than blending them in.

To change the audio to 2x speed, the reference command is as follows:

ffmpeg -i in.mp3 -filter_complex atempo=2.0 out.mp3
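
Since a tempo above 2 skips samples, a common workaround for 4x speed is to chain two atempo instances, each staying within [0.5, 2.0]:

ffmpeg -i in.mp3 -filter_complex atempo=2.0,atempo=2.0 out.mp3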

5、chorus

Chorus, adds a chorus effect to the audio stream. Chorus resembles an echo effect with a short delay, but while an echo's delay is constant, chorus modulates the delay with a sine or triangle wave. The delayed sound therefore plays slightly slower or faster at different moments, i.e. it is detuned around the original sound. The parameter options are as follows:

  • in_gain: input gain, default is 0.4
  • out_gain: output gain, default is 0.4
  • delays: delay times, typically 40ms to 60ms
  • decays: decay coefficients
  • speeds: modulation speeds
  • depths: modulation depths
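
For example, a single-delay chorus (parameter values borrowed from the example in the official documentation):

ffmpeg -i in.mp3 -af chorus=0.7:0.9:55:0.4:0.25:2 out.mp3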

6、haas

Haas, applies the Haas effect to the audio. Note that it makes the most sense on mono signals: applied to a mono signal, the filter adds a sense of direction and converts the signal to stereo. The parameter options are as follows:

  • level_in: input level, default is 1
  • level_out: output level, default is 1
  • side_gain: side gain, default is 1
  • middle_source: middle sound source type, one of the following:
  •     'left': select the left channel
  •     'right': select the right channel
  •     'mid': the middle part of the stereo image
  •     'side': the side part of the stereo image
  • middle_phase: whether to flip the middle phase, off by default
  • left_delay: left channel delay, default is 2.05 ms
  • left_balance: left channel balance, default is -1
  • left_gain: left channel gain, default is 1
  • left_phase: flip the left channel phase, off by default
  • right_delay: right channel delay, default is 2.12 ms
  • right_balance: right channel balance, default is 1
  • right_gain: right channel gain, default is 1
  • right_phase: flip the right channel phase, on by default
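
For example, to apply the Haas effect with its default parameters (a mono input is upmixed to stereo by the filter graph's automatic format conversion):

ffmpeg -i mono.wav -af haas out.wav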

7、silencedetect

Silence detection, detects the silent parts of an audio stream. When the input volume stays at or below the noise tolerance for at least the minimum silence duration, that span is reported as silence. The parameter options are as follows:

  • noise, n: noise tolerance, specified in dB or as an amplitude ratio, default is -60dB
  • duration, d: the minimum duration to report as silence, default is 2 seconds
  • mono, m: process each channel independently, off by default

The reference command to detect silence is as follows:

ffmpeg -i hello.mp3 -af silencedetect=noise=0.0001 -f null -

The output marks the start of a silent span with silence_start and its end with silence_end; silence_duration gives the length of the silence:

[silencedetect @ 0000020c67936fc0] silence_start: 268.82
[silencedetect @ 0000020c67936fc0] silence_end: 271.048 | silence_duration: 2.22796
