FFmpeg Source Code Analysis: Introduction to Audio Filters (Part 1)

FFmpeg provides audio and video filters in the libavfilter module. All audio filters are registered in libavfilter/allfilters.c. The ffmpeg -filters command lists every filter supported by the current build; in its output, A marks an audio input or output. This article introduces the following audio filters: compressor, cross-fade, fade, declick (impulse noise removal), delay, echo, noise gate, and limiter.

For a detailed introduction to audio filters, see the official documentation: Audio Filters.

1、acompressor

A compressor is mainly used to reduce the dynamic range of a signal. Modern music in particular is mostly compressed at a high ratio to raise its overall loudness. Compression works by detecting when the signal exceeds a set threshold and dividing the level above that threshold by a ratio. The parameter options are as follows:

level_in: input gain, default is 1, range [0.015625, 64]
mode: compression mode, either upward or downward; default is downward
threshold: when the signal rises above this level, gain reduction is applied. Default is 0.125, range [0.00097563, 1]
ratio: ratio by which the signal is compressed, default is 2 (that is, 2:1), range [1, 20]
attack: the number of milliseconds the signal must stay above the threshold before gain reduction starts, default is 20, range [0.01, 2000]
release: the number of milliseconds the signal must stay below the threshold before gain reduction is eased again, default is 250, range [0.01, 9000]
makeup: how much the signal is amplified after processing, default is 1, range [1, 64]
knee: softens the sharp knee around the threshold so that gain reduction sets in gradually, default is 2.82843, range [1, 8]
link: whether the average level of all channels or the loudest (maximum) channel drives the gain reduction; default is average
detection: whether to use the exact peak signal or the RMS (root mean square) signal; default is rms, which is smoother
mix: how much compressed signal to use when outputting, default is 1, range [0, 1]

2、acrossfade

Cross-fade effect, applied to the transition from one audio stream to another. The parameter options are as follows:

nb_samples, ns: specifies the number of samples for the fade effect, the default is 44100
duration, d: specifies the duration of the fade
overlap, o: whether the end of the first stream overlaps with the start of the second stream, enabled by default
curve1, c1: the fade-out curve for the first stream
curve2, c2: the fade-in curve for the second stream

The reference command is as follows:

ffmpeg -i first.flac -i second.flac -filter_complex acrossfade=d=10:c1=exp:c2=exp output.flac
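
As a rough illustration of what an overlapping cross-fade computes, the sketch below mixes the tail of one stream with the head of another using a linear curve. The function crossfade_linear is hypothetical and handles mono float samples only; the real filter supports several sample formats and all the curves listed under afade.

```c
#include <assert.h>

/* Hypothetical helper: linear cross-fade over nb_samples.
 * a   - tail of the first stream  (fades out)
 * b   - head of the second stream (fades in)
 * out - mixed result, nb_samples long */
static void crossfade_linear(const float *a, const float *b, float *out,
                             int nb_samples)
{
    assert(nb_samples > 0);

    for (int i = 0; i < nb_samples; i++) {
        double gain_in  = (double)i / nb_samples; /* rises from 0 towards 1 */
        double gain_out = 1.0 - gain_in;          /* falls from 1 towards 0 */

        out[i] = (float)(a[i] * gain_out + b[i] * gain_in);
    }
}
```

Because the two gains always sum to 1, a constant signal that is present in both streams passes through the transition at constant level.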

3、afade

Fade-in and fade-out effect, similar to acrossfade but applied to a single stream. The parameter list is as follows:

type, t: the effect type, in or out; the default is in

start_sample, ss: the sample number at which the fade starts, the default is 0

nb_samples, ns: the number of samples to fade in and out, the default is 44100

start_time, st: start time, default is 0

duration, d: duration of fade effect

curve: The transition curve for fading in and out, including the following options:

tri: triangular, linear slope (default)
qsin: quarter of a sine wave
hsin: half of a sine wave
esin: exponential sine wave
log: logarithmic
ipar: inverted parabola
qua: quadratic
cub: cubic
squ: square root
cbr: cubic root
par: parabola
exp: exponential
iqsin: inverted quarter of a sine wave
ihsin: inverted half of a sine wave
dese: double-exponential seat
desi: double-exponential sigmoid
losi: logistic sigmoid
sinc: sine cardinal function
isinc: inverted sine cardinal function
nofade: no fade applied

Macro definitions are used to generate the fade functions for the different sample formats. The code is located in libavfilter/af_afade.c and comes in two forms: FADE_PLANAR (planar storage, one buffer per channel) and FADE (interleaved storage, all channels in one buffer):

#define FADE_PLANAR(name, type)                                             \
static void fade_samples_## name ##p(uint8_t **dst, uint8_t * const *src,   \
                                     int nb_samples, int channels, int dir, \
                                     int64_t start, int64_t range, int curve) \
{                                                                           \
    int i, c;                                                               \
                                                                            \
    for (i = 0; i < nb_samples; i++) {                                      \
        double gain = fade_gain(curve, start + i * dir, range);             \
        for (c = 0; c < channels; c++) {                                    \
            type *d = (type *)dst[c];                                       \
            const type *s = (type *)src[c];                                 \
                                                                            \
            d[i] = s[i] * gain;                                             \
        }                                                                   \
    }                                                                       \
}

#define FADE(name, type)                                                    \
static void fade_samples_## name (uint8_t **dst, uint8_t * const *src,      \
                                  int nb_samples, int channels, int dir,    \
                                  int64_t start, int64_t range, int curve)  \
{                                                                           \
    type *d = (type *)dst[0];                                               \
    const type *s = (type *)src[0];                                         \
    int i, c, k = 0;                                                        \
                                                                            \
    for (i = 0; i < nb_samples; i++) {                                      \
        double gain = fade_gain(curve, start + i * dir, range);             \
        for (c = 0; c < channels; c++, k++)                                 \
            d[k] = s[k] * gain;                                             \
    }                                                                       \
}
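
The only difference between the two macros is how a sample is addressed. The following sketch, with hypothetical function names and a precomputed gain per sample, applies the same gains to a float buffer in both layouts so the indexing can be compared side by side.

```c
#include <assert.h>

/* Interleaved: one buffer, samples ordered L0 R0 L1 R1 ...
 * The running index k walks through every channel of every sample. */
static void gain_interleaved(float *buf, int nb_samples, int channels,
                             const double *gain)
{
    int k = 0;

    assert(channels > 0);

    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < channels; c++, k++)
            buf[k] = (float)(buf[k] * gain[i]);
}

/* Planar: one buffer per channel, each holding nb_samples values.
 * Sample i of channel c is simply planes[c][i]. */
static void gain_planar(float **planes, int nb_samples, int channels,
                        const double *gain)
{
    assert(channels > 0);

    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < channels; c++)
            planes[c][i] = (float)(planes[c][i] * gain[i]);
}
```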

4、adeclick

Removes impulse noise from the input signal. Samples detected as impulse noise are replaced with samples interpolated using an autoregressive model. The parameter options are as follows:

window, w: the size of the window function in milliseconds, default is 55, range [10, 100]
overlap, o: the window overlap ratio in percent, default is 75, range [50, 95]
arorder, a: the autoregressive order as a percentage of the window size, default is 2, range [0, 25]
threshold, t: the detection threshold, default is 2, range [1, 100]
burst, b: the burst fusion as a percentage of the window size, default is 2, range [0, 10]
method, m: the overlap method, either add (a) or save (s)
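
The detect-and-replace idea can be sketched in a much simplified form. The code below is not the algorithm in FFmpeg: instead of fitting an autoregressive model per window, it assumes a fixed second-order linear predictor (2*x[i-1] - x[i-2]) for detection and plain linear interpolation for repair. The name declick_simple and the absolute threshold are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: repairs isolated clicks in place.
 * A sample is flagged when it deviates from the value predicted by
 * the two previous samples by more than `threshold`; it is then
 * replaced by the mean of its two neighbours.
 * Returns the number of replaced samples. */
static int declick_simple(float *buf, int nb_samples, double threshold)
{
    int replaced = 0;

    assert(threshold > 0.0);

    for (int i = 2; i + 1 < nb_samples; i++) {
        /* Fixed predictor: continue the line through the last two samples. */
        double predicted = 2.0 * buf[i - 1] - buf[i - 2];

        if (fabs(buf[i] - predicted) > threshold) {
            buf[i] = (float)(0.5 * (buf[i - 1] + buf[i + 1]));
            replaced++;
        }
    }
    return replaced;
}
```

Because the repair happens in place, the samples after a click are predicted from the corrected values, so a single spike does not trigger further false detections.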

5、adelay

Delay effect: the delayed portion of each channel is filled with silence. The code is located in libavfilter/af_adelay.c and uses a macro to fill each sample format with the matching silence value. The u8 format is filled with 0x80 and all other formats with 0x00. The core code is as follows:

#define DELAY(name, type, fill)                                           \
static void delay_channel_## name ##p(ChanDelay *d, int nb_samples,       \
                                      const uint8_t *ssrc, uint8_t *ddst) \
{                                                                         \
    const type *src = (type *)ssrc;                                       \
    type *dst = (type *)ddst;                                             \
    type *samples = (type *)d->samples;                                   \
                                                                          \
    while (nb_samples) {                                                  \
        if (d->delay_index < d->delay) {                                  \
            const int len = FFMIN(nb_samples, d->delay - d->delay_index); \
                                                                          \
            memcpy(&samples[d->delay_index], src, len * sizeof(type));    \
            memset(dst, fill, len * sizeof(type));                        \
            d->delay_index += len;                                        \
            src += len;                                                   \
            dst += len;                                                   \
            nb_samples -= len;                                            \
        } else {                                                          \
            *dst = samples[d->index];                                     \
            samples[d->index] = *src;                                     \
            nb_samples--;                                                 \
            d->index++;                                                   \
            src++, dst++;                                                 \
            d->index = d->index >= d->delay ? 0 : d->index;               \
        }                                                                 \
    }                                                                     \
}

DELAY(u8,  uint8_t, 0x80)
DELAY(s16, int16_t, 0)
DELAY(s32, int32_t, 0)
DELAY(flt, float,   0)
DELAY(dbl, double,  0)
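
The macro above boils down to a ring buffer. The sketch below shows the same idea for a single float channel; the DelayLine type and its functions are hypothetical, and pre-filling the buffer with zeros (silence for float) is a simplification that merges the two branches of the macro into one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical delay line for one float channel. */
typedef struct DelayLine {
    float *samples; /* ring buffer holding the last `delay` inputs */
    int delay;      /* delay length in samples */
    int index;      /* next slot to read and then overwrite */
} DelayLine;

/* Returns 0 on success, -1 if the delay is too short or allocation fails. */
static int delay_init(DelayLine *d, int delay_ms, int sample_rate)
{
    assert(delay_ms > 0 && sample_rate > 0);

    d->delay = (int)((long long)delay_ms * sample_rate / 1000);
    d->index = 0;
    d->samples = NULL;
    if (d->delay < 1)
        return -1;
    /* calloc zero-fills, so the first `delay` outputs are silence. */
    d->samples = calloc((size_t)d->delay, sizeof(*d->samples));
    return d->samples ? 0 : -1;
}

static void delay_process(DelayLine *d, const float *src, float *dst,
                          int nb_samples)
{
    for (int i = 0; i < nb_samples; i++) {
        float delayed = d->samples[d->index]; /* written `delay` samples ago */

        d->samples[d->index] = src[i];
        dst[i] = delayed;
        d->index = (d->index + 1) % d->delay;
    }
}

static void delay_free(DelayLine *d)
{
    free(d->samples);
    d->samples = NULL;
}
```

Reading the old value before storing the new one means the function also works when src and dst point to the same buffer.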

6、aecho

Echo effect, adds echo to the audio stream. Echoes are reflected sounds, which occur naturally in mountains or rooms. A digital echo simulates this by delaying the original sound and attenuating each reflection. The original sound is also called the dry signal, and the reflection the wet signal. The parameter options are as follows:

in_gain: the input gain of the reflected sound, the default is 0.6
out_gain: the output gain of the reflected sound, the default is 0.3
delays: the time intervals in milliseconds between the original sound and each reflection, separated by '|'; the default is 1000, and each delay must be in the range (0, 90000.0]
decays: the loudness of each reflected sound, separated by '|'; the default is 0.5, and each decay must be in the range (0, 1.0]

For example, to simulate the echo in the mountains, the reference command is as follows:

aecho=0.8:0.9:1000:0.3

The code is located in af_aecho.c, which uses macro definitions to set echoes in different sampling formats:

#define ECHO(name, type, min, max)                                          \
static void echo_samples_## name ##p(AudioEchoContext *ctx,                 \
                                     uint8_t **delayptrs,                   \
                                     uint8_t * const *src, uint8_t **dst,   \
                                     int nb_samples, int channels)          \
{                                                                           \
    const double out_gain = ctx->out_gain;                                  \
    const double in_gain = ctx->in_gain;                                    \
    const int nb_echoes = ctx->nb_echoes;                                   \
    const int max_samples = ctx->max_samples;                               \
    int i, j, chan, av_uninit(index);                                       \
                                                                            \
    av_assert1(channels > 0); /* would corrupt delay_index */               \
                                                                            \
    for (chan = 0; chan < channels; chan++) {                               \
        const type *s = (type *)src[chan];                                  \
        type *d = (type *)dst[chan];                                        \
        type *dbuf = (type *)delayptrs[chan];                               \
                                                                            \
        index = ctx->delay_index;                                           \
        for (i = 0; i < nb_samples; i++, s++, d++) {                        \
            double out, in;                                                 \
                                                                            \
            in = *s;                                                        \
            out = in * in_gain;                                             \
            for (j = 0; j < nb_echoes; j++) {                               \
                int ix = index + max_samples - ctx->samples[j];             \
                ix = MOD(ix, max_samples);                                  \
                out += dbuf[ix] * ctx->decay[j];                            \
            }                                                               \
            out *= out_gain;                                                \
                                                                            \
            *d = av_clipd(out, min, max);                                   \
            dbuf[index] = in;                                               \
                                                                            \
            index = MOD(index + 1, max_samples);                            \
        }                                                                   \
    }                                                                       \
    ctx->delay_index = index;                                               \
}

ECHO(dbl, double,  -1.0,      1.0      )
ECHO(flt, float,   -1.0,      1.0      )
ECHO(s16, int16_t, INT16_MIN, INT16_MAX)
ECHO(s32, int32_t, INT32_MIN, INT32_MAX)
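
For a single reflection, the ECHO macro reduces to a few lines. The sketch below is a hypothetical mono float version (the SimpleEcho type and its functions do not exist in FFmpeg) that sizes the ring buffer to exactly one delay, so the slot about to be overwritten always holds the sample from one delay ago.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical single-echo state for one float channel. */
typedef struct SimpleEcho {
    double in_gain, out_gain, decay;
    int delay_samples; /* delay converted from milliseconds */
    int index;         /* current position in the ring buffer */
    float *dbuf;       /* dry input from the last delay_samples samples */
} SimpleEcho;

static int echo_init(SimpleEcho *e, double in_gain, double out_gain,
                     double delay_ms, double decay, int sample_rate)
{
    assert(delay_ms > 0.0 && sample_rate > 0);

    e->in_gain  = in_gain;
    e->out_gain = out_gain;
    e->decay    = decay;
    e->delay_samples = (int)(delay_ms * sample_rate / 1000.0);
    if (e->delay_samples < 1)
        e->delay_samples = 1;
    e->index = 0;
    e->dbuf = calloc((size_t)e->delay_samples, sizeof(*e->dbuf));
    return e->dbuf ? 0 : -1;
}

static void echo_process(SimpleEcho *e, const float *src, float *dst,
                         int nb_samples)
{
    for (int i = 0; i < nb_samples; i++) {
        double in  = src[i];
        /* dry signal plus one attenuated reflection */
        double out = (in * e->in_gain + e->dbuf[e->index] * e->decay)
                     * e->out_gain;

        /* clip to the float sample range, as the macro does */
        if (out >  1.0) out =  1.0;
        if (out < -1.0) out = -1.0;

        dst[i] = (float)out;
        e->dbuf[e->index] = (float)in; /* store the dry input, not the output */
        e->index = (e->index + 1) % e->delay_samples;
    }
}

static void echo_free(SimpleEcho *e)
{
    free(e->dbuf);
    e->dbuf = NULL;
}
```

As in the macro, the delay buffer stores the dry input rather than the output, so each sound is reflected once per configured delay and there is no feedback loop.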

7、agate

A noise gate is used to reduce the low-level parts of a signal, removing interfering noise between the useful parts. It detects when the signal falls below the threshold and reduces it by the set ratio. The parameter options are as follows:

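level_in: input level before filtering, default is 1, range [0.015625, 64]
mode: operation mode, upward or downward; default is downward
range: level of gain reduction applied while the signal is below the threshold, default is 0.06125, range [0, 1]
threshold: when the signal rises above this level, gain reduction is released. Default is 0.125, range [0, 1]
ratio: ratio by which the signal is reduced, default is 2, range [1, 9000]
attack: the number of milliseconds the signal must stay above the threshold before gain reduction stops, default is 20, range [0.01, 9000]
release: the number of milliseconds the signal must stay below the threshold before gain reduction resumes, default is 250, range [0.01, 9000]
makeup: how much the signal is amplified after processing, default is 1, range [1, 64]
detection: detection method, peak or rms; default is rms
link: whether the average level of all channels or the loudest (maximum) channel drives the reduction; default is average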

8、alimiter

Limiter, used to prevent the input signal from exceeding the set threshold. It uses lookahead to avoid distortion, which adds a small delay to the signal processing. The parameter options are as follows:

level_in: input gain, default is 1
level_out: output gain, default is 1
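limit: the level the signal must not exceed, the default is 1
attack: the time in which the limiter reaches its attenuation level, the default is 5 ms
release: the time in which the limiter returns from limiting to an attenuation of 1.0, the default is 50 ms
asc: when gain reduction is constantly needed, ASC releases to an average reduction level instead of returning to zero reduction within the release time
asc_level: how strongly ASC affects the release time; 0 means almost no change, 1 produces longer release times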
level: automatically level the output signal, enabled by default

The code of the limiter is located in af_alimiter.c, and the core code is as follows:

static int filter_frame(AVFilterLink *inlink, AVFrame *in)
{
    ......
    // Loop over every sample
    for (n = 0; n < in->nb_samples; n++) {
        double peak = 0;
        
        for (c = 0; c < channels; c++) {
            double sample = src[c] * level_in;

            buffer[s->pos + c] = sample;
            peak = FFMAX(peak, fabs(sample));
        }

        if (s->auto_release && peak > limit) {
            s->asc += peak;
            s->asc_c++;
        }

        if (peak > limit) {
            double patt = FFMIN(limit / peak, 1.);
            double rdelta = get_rdelta(s, release, inlink->sample_rate,
                                       peak, limit, patt, 0);
            double delta = (limit / peak - s->att) / buffer_size * channels;
            int found = 0;

            if (delta < s->delta) {
                s->delta = delta;
                nextpos[0] = s->pos;
                nextpos[1] = -1;
                nextdelta[0] = rdelta;
                s->nextlen = 1;
                s->nextiter= 0;
            } else {
                for (i = s->nextiter; i < s->nextiter + s->nextlen; i++) {
                    int j = i % buffer_size;
                    double ppeak, pdelta;

                    ppeak = fabs(buffer[nextpos[j]]) > fabs(buffer[nextpos[j] + 1]) ?
                            fabs(buffer[nextpos[j]]) : fabs(buffer[nextpos[j] + 1]);
                    pdelta = (limit / peak - limit / ppeak) / (((buffer_size - nextpos[j] + s->pos) % buffer_size) / channels);
                    if (pdelta < nextdelta[j]) {
                        nextdelta[j] = pdelta;
                        found = 1;
                        break;
                    }
                }
                if (found) {
                    s->nextlen = i - s->nextiter + 1;
                    nextpos[(s->nextiter + s->nextlen) % buffer_size] = s->pos;
                    nextdelta[(s->nextiter + s->nextlen) % buffer_size] = rdelta;
                    nextpos[(s->nextiter + s->nextlen + 1) % buffer_size] = -1;
                    s->nextlen++;
                }
            }
        }

        buf = &s->buffer[(s->pos + channels) % buffer_size];
        peak = 0;
        for (c = 0; c < channels; c++) {
            double sample = buf[c];

            peak = FFMAX(peak, fabs(sample));
        }

        if (s->pos == s->asc_pos && !s->asc_changed)
            s->asc_pos = -1;

        if (s->auto_release && s->asc_pos == -1 && peak > limit) {
            s->asc -= peak;
            s->asc_c--;
        }

        s->att += s->delta;

        for (c = 0; c < channels; c++)
            dst[c] = buf[c] * s->att;

        if ((s->pos + channels) % buffer_size == nextpos[s->nextiter]) {
            if (s->auto_release) {
                s->delta = get_rdelta(s, release, inlink->sample_rate,
                                      peak, limit, s->att, 1);
                if (s->nextlen > 1) {
                    int pnextpos = nextpos[(s->nextiter + 1) % buffer_size];
                    double ppeak = fabs(buffer[pnextpos]) > fabs(buffer[pnextpos + 1]) ?
                                                            fabs(buffer[pnextpos]) :
                                                            fabs(buffer[pnextpos + 1]);
                    double pdelta = (limit / ppeak - s->att) /
                                    (((buffer_size + pnextpos -
                                    ((s->pos + channels) % buffer_size)) %
                                    buffer_size) / channels);
                    if (pdelta < s->delta)
                        s->delta = pdelta;
                }
            } else {
                s->delta = nextdelta[s->nextiter];
                s->att = limit / peak;
            }

            s->nextlen -= 1;
            nextpos[s->nextiter] = -1;
            s->nextiter = (s->nextiter + 1) % buffer_size;
        }

        if (s->att > 1.) {
            s->att = 1.;
            s->delta = 0.;
            s->nextiter = 0;
            s->nextlen = 0;
            nextpos[0] = -1;
        }

        if (s->att <= 0.) {
            s->att = 0.0000000000001;
            s->delta = (1.0 - s->att) / (inlink->sample_rate * release);
        }

        if (s->att != 1. && (1. - s->att) < 0.0000000000001)
            s->att = 1.;

        if (s->delta != 0. && fabs(s->delta) < 0.00000000000001)
            s->delta = 0.;

        for (c = 0; c < channels; c++)
            dst[c] = av_clipd(dst[c], -limit, limit) * level * level_out;

        s->pos = (s->pos + channels) % buffer_size;
        src += channels;
        dst += channels;
    }

    if (in != out)
        av_frame_free(&in);

    return ff_filter_frame(outlink, out);
}