FFmpeg source code analysis: introduction to video filters (Part 2)

FFmpeg provides audio and video filters in the libavfilter module. All filters are registered in libavfilter/allfilters.c. You can also run the ffmpeg -filters command to list all currently supported filters; in its output, a "V" flag marks a video filter. This article continues the introduction to video filters, covering: drawing text, edge detection, fade, Gaussian blur, horizontal flip (mirroring), horizontal stacking, rotation, transitions, and layer overlay.

For a detailed introduction to video filters, see the official documentation: Video Filters. For the first half of this introduction, see the previous article: Introduction to Video Filters (Part 1).

1、drawtext

drawtext draws a text string onto the video frames. It requires the freetype third-party library (--enable-libfreetype); for details, see the freetype official website. To select fonts by family name (the font option) instead of specifying a font file, you also need the fontconfig third-party library (--enable-libfontconfig); for details, see the fontconfig official website. For text shaping and bidirectional text, you need the libfribidi third-party library (--enable-libfribidi); for details, see fribidi's GitHub site.

The parameter options are as follows:

  • box: whether to use the background color to draw a rectangular box, 1 (on) or 0 (off), the default is off
  • boxborderw: The width of the drawn rectangle border, the default is 0
  • boxcolor: the background color of the box, the default is white
  • line_spacing: line spacing, default is 0
  • basetime: start timing, in microseconds
  • fontcolor: font color, default is black
  • font: font, default is Sans
  • fontfile: font file, the absolute path of the file is required
  • alpha: The alpha value of the text's blending transparent channel, the range is [0.0, 1.0], the default is 1.0
  • fontsize: font size, default is 16
  • shadowcolor: shadow color, default is black
  • shadowx, shadowy: the x, y offset of the text shadow relative to the text
  • timecode: time code, the default format is "hh:mm:ss[:;.]ff"
  • text: string text, must be in UTF-8 encoding format
  • textfile: text file, must be in UTF-8 encoding format
  • main_h, h, H: input height
  • main_w, w, W: input width
  • n: the number of the input frame, starting from 0
  • t: the timestamp of the frame, in seconds; NAN if unknown
  • text_h, th: text height
  • text_w, tw: text width
  • x, y: The xy coordinate point of the text on the video screen, as the starting position of the rendered text

A reference command for drawing text, specifying the text, the xy coordinates, the font size, and the font color:

ffmpeg -i in.mp4 -vf drawtext="text='Hello world':x=10:y=20:fontsize=24:fontcolor=red" watermark.mp4

The effect of adding a text watermark is shown in the following figure:
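Beyond plain text, the options above also cover background boxes, shadows, and timecodes. As a minimal sketch (the font file path here is illustrative), the following draws a running timecode on a semi-transparent black box:

ffmpeg -i in.mp4 -vf drawtext="fontfile=/path/to/font.ttf:timecode='00\:00\:00\:00':rate=25:fontsize=32:fontcolor=white:box=1:boxcolor=black@0.5:x=10:y=10" out.mp4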

2、edgedetect

Edge detection: detects and draws edges using the Canny edge detection algorithm. For details on the principle of the Canny algorithm, see the Wikipedia article on Canny edge detection. The edge detection steps are as follows:

  • (1) Apply a Gaussian filter to smooth the image to remove noise
  • (2) Calculate the gradient and direction of the image
  • (3) Apply non-maximum suppression (gradient magnitude thresholding) to eliminate spurious responses
  • (4) Apply double thresholds to determine potential edges
  • (5) Track edges by hysteresis: suppress weak edges that are not connected to strong edges

The parameter options are as follows:

  • low, high: the Canny detection thresholds, range [0, 1]; the default low value is 20/255 and the default high value is 50/255
  • mode: drawing mode, the default is wires, all modes are as follows:
  •     'wires': draw white/gray wires on a black background
  •     'colormix': mix colors, producing a cartoon-painting effect
  •     'canny': perform Canny detection on each plane
  • planes: select which planes to filter, all planes by default
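A reference command, with thresholds chosen only for illustration:

ffmpeg -i in.mp4 -vf edgedetect=low=0.1:high=0.4 out.mp4

For the cartoon-like color mixing mode:

ffmpeg -i in.mp4 -vf edgedetect=mode=colormix out.mp4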

2.1 Edge Detection Algorithm

The code for edge detection is located in libavfilter/vf_edgedetect.c. The processing steps are Gaussian filtering, the Sobel operator, non-maximum suppression, double thresholding to find potential edges, and edge tracking by hysteresis. The core code is as follows:

static int filter_frame(AVFilterLink *inlink, AVFrame *in)
{
    ......
    // get an output frame buffer
    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
    
    for (p = 0; p < edgedetect->nb_planes; p++) {
        ......
        // Gaussian filtering to reduce image noise
        gaussian_blur(ctx, width, height,
                      tmpbuf,      width,
                      in->data[p], in->linesize[p]);

        // Sobel operator: compute the image gradient magnitude and direction
        sobel(width, height,
              gradients, width,
              directions,width,
              tmpbuf,    width);

        memset(tmpbuf, 0, width * height);
        // non-maximum suppression: eliminate spurious responses
        non_maximum_suppression(width, height,
                                tmpbuf,    width,
                                directions,width,
                                gradients, width);

        // apply low/high double thresholds to determine potential edges
        double_threshold(edgedetect->low_u8, edgedetect->high_u8,
                         width, height,
                         out->data[p], out->linesize[p],
                         tmpbuf,       width);
        // in colormix mode, blend the detected edges with the input frame
        if (edgedetect->mode == MODE_COLORMIX) {
            color_mix(width, height,
                      out->data[p], out->linesize[p],
                      in->data[p], in->linesize[p]);
        }
    }

    if (!direct)
        av_frame_free(&in);
    return ff_filter_frame(outlink, out);
}

2.2 Gaussian filter

Since all edge detection results are susceptible to image noise, it is necessary to remove noise to avoid false detections. To smooth the image, a Gaussian filter kernel is used to convolve the image. The 5x5 Gaussian filter code is as follows:

static void gaussian_blur(AVFilterContext *ctx, int w, int h,
                                uint8_t *dst, int dst_linesize,
                          const uint8_t *src, int src_linesize)
{
    int i, j;
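    // a 5x5 kernel needs a 2-pixel border: the top two rows are copied unchanged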
    memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
    if (h > 1) {
        memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
    }
    for (j = 2; j < h - 2; j++) {
        dst[0] = src[0];
        if (w > 1)
            dst[1] = src[1];
        for (i = 2; i < w - 2; i++) {
            /* Gaussian mask of size 5x5 with sigma = 1.4 */
            dst[i] = ((src[-2*src_linesize + i-2] + src[2*src_linesize + i-2]) * 2
                    + (src[-2*src_linesize + i-1] + src[2*src_linesize + i-1]) * 4
                    + (src[-2*src_linesize + i  ] + src[2*src_linesize + i  ]) * 5
                    + (src[-2*src_linesize + i+1] + src[2*src_linesize + i+1]) * 4
                    + (src[-2*src_linesize + i+2] + src[2*src_linesize + i+2]) * 2

                    + (src[  -src_linesize + i-2] + src[  src_linesize + i-2]) *  4
                    + (src[  -src_linesize + i-1] + src[  src_linesize + i-1]) *  9
                    + (src[  -src_linesize + i  ] + src[  src_linesize + i  ]) * 12
                    + (src[  -src_linesize + i+1] + src[  src_linesize + i+1]) *  9
                    + (src[  -src_linesize + i+2] + src[  src_linesize + i+2]) *  4

                    + src[i-2] *  5
                    + src[i-1] * 12
                    + src[i  ] * 15
                    + src[i+1] * 12
                    + src[i+2] *  5) / 159;
        }
        if (w > 2)
            dst[i] = src[i];
        if (w > 3)
            dst[i + 1] = src[i + 1];

        dst += dst_linesize;
        src += src_linesize;
    }
    if (h > 2) {
        memcpy(dst, src, w); dst += dst_linesize; src += src_linesize;
    }
    if (h > 3)
        memcpy(dst, src, w);
}

2.3 sobel operator

Edges in an image may point in many directions, so the Canny algorithm uses filters to detect horizontal, vertical, and diagonal edges in the blurred image. Here the Sobel operator is used to compute the gradient magnitude and direction of the image. The code is as follows:

static void sobel(int w, int h,
                       uint16_t *dst, int dst_linesize,
                         int8_t *dir, int dir_linesize,
                  const uint8_t *src, int src_linesize)
{
    int i, j;

    for (j = 1; j < h - 1; j++) {
        dst += dst_linesize;
        dir += dir_linesize;
        src += src_linesize;
        for (i = 1; i < w - 1; i++) {
            const int gx =
                -1*src[-src_linesize + i-1] + 1*src[-src_linesize + i+1]
                -2*src[                i-1] + 2*src[                i+1]
                -1*src[ src_linesize + i-1] + 1*src[ src_linesize + i+1];
            const int gy =
                -1*src[-src_linesize + i-1] + 1*src[ src_linesize + i-1]
                -2*src[-src_linesize + i  ] + 2*src[ src_linesize + i  ]
                -1*src[-src_linesize + i+1] + 1*src[ src_linesize + i+1];

            dst[i] = FFABS(gx) + FFABS(gy); // approximate gradient magnitude as |gx| + |gy|
            dir[i] = get_rounded_direction(gx, gy); // quantize the direction to 0°, 45°, 90° or 135°
        }
    }
}
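The get_rounded_direction() helper is not shown above. FFmpeg's real implementation avoids trigonometric calls by comparing gx and gy against fixed-point tangent thresholds; the sketch below only illustrates the same idea, rounding the gradient angle into one of four bins, and is not the actual FFmpeg code:

#include <math.h>
#include <stdint.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Illustrative only: round the gradient direction into one of four bins
 * (horizontal, vertical, and the two diagonals). */
static int8_t rounded_direction_sketch(int gx, int gy)
{
    double angle = atan2((double)gy, (double)gx) * 180.0 / M_PI; /* (-180, 180] */
    if (angle < 0)
        angle += 180.0;            /* direction is modulo 180 degrees */
    if (angle < 22.5 || angle >= 157.5)
        return 0;                  /* horizontal */
    if (angle < 67.5)
        return 1;                  /* rising diagonal */
    if (angle < 112.5)
        return 2;                  /* vertical */
    return 3;                      /* falling diagonal */
}

The quantized direction is what non_maximum_suppression() uses to compare each pixel against its two neighbors along the gradient.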

The edge detection effect is shown in the following figure:


3、fade

Fade: applies a fade-in or fade-out effect to the video. The parameter options are as follows:

  • type, t: effect type, "in" represents fade-in effect, "out" represents fade-out effect, the default is fade-in effect
  • start_frame, s: the frame number at which the effect starts, the default is 0
  • nb_frames, n: The number of frames the effect lasts, the default is 25
  • alpha: if enabled, apply the fade only to the alpha channel; disabled by default
  • start_time, st: the time when the effect starts, starting from 0 by default
  • duration, d: the duration of the effect
  • color, c: fade effect color, default is black

A reference command in units of frames:

fade=t=in:s=0:n=30

A reference command in units of seconds:

fade=t=in:st=0:d=5.0
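As a complete command line (a sketch that assumes a 60-second input), fading in over the first 5 seconds and fading out over the last 5:

ffmpeg -i in.mp4 -vf "fade=t=in:st=0:d=5,fade=t=out:st=55:d=5" out.mp4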

4、gblur

Gaussian blur, which can be used, for example, to blur out (pixelate) regions of the frame. The main idea of the Gaussian blur algorithm is to take a weighted average over each pixel's neighborhood. For details, see the Wikipedia article on Gaussian blur. The parameter options are as follows:

  • sigma: horizontal direction sigma, Gaussian blur standard deviation, the default is 0.5
  • steps: the number of steps for the Gaussian approximation, defaults to 1
  • planes: select which plane to filter, default all planes
  • sigmaV: sigma in the vertical direction, if it is -1, it is the same as the horizontal direction, the default is -1
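A reference command (the sigma and steps values here are just an illustration):

ffmpeg -i in.mp4 -vf gblur=sigma=3:steps=2 out.mp4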

4.1 Gaussian Blur Algorithm

The Gaussian blur code is located in libavfilter/vf_gblur.c, and the key code is as follows:

static void gaussianiir2d(AVFilterContext *ctx, int plane)
{
    GBlurContext *s = ctx->priv;
    const int width = s->planewidth[plane];
    const int height = s->planeheight[plane];
    const int nb_threads = ff_filter_get_nb_threads(ctx);
    ThreadData td;

    if (s->sigma <= 0 || s->steps < 0)
        return;

    td.width = width;
    td.height = height;
    // horizontal filtering
    ctx->internal->execute(ctx, filter_horizontally, &td, NULL, FFMIN(height, nb_threads));
    // vertical filtering
    ctx->internal->execute(ctx, filter_vertically, &td, NULL, FFMIN(width, nb_threads));
    // post-scale processing
    ctx->internal->execute(ctx, filter_postscale, &td, NULL, FFMIN(width * height, nb_threads));
}

4.2 Horizontal filtering 

First, look at the horizontal filter filter_horizontally() function:

static int filter_horizontally(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
{
    ......
    // filter each horizontal slice
    s->horiz_slice(buffer + width * slice_start, width, slice_end - slice_start,
                   steps, nu, boundaryscale);
    emms_c();
    return 0;
}

horiz_slice is a function pointer assigned at initialization; by default it points to the horiz_slice_c() function:

static void horiz_slice_c(float *buffer, int width, int height, int steps,
                          float nu, float bscale)
{
    int step, x, y;
    float *ptr;
    for (y = 0; y < height; y++) {
        for (step = 0; step < steps; step++) {
            ptr = buffer + width * y;
            ptr[0] *= bscale;

            // causal pass: filter left to right
            for (x = 1; x < width; x++)
                ptr[x] += nu * ptr[x - 1];
            ptr[x = width - 1] *= bscale;

            // anti-causal pass: filter right to left
            for (; x > 0; x--)
                ptr[x - 1] += nu * ptr[x];
        }
    }
}
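The recursion above multiplies by the feedback coefficient nu and the boundary factor bscale; together with the postscale factor applied later, these come from the recursive (IIR) approximation of a Gaussian, which FFmpeg's gblur bases on Pascal Getreuer's gaussianiir2d. The sketch below shows how such coefficients are typically derived from sigma and steps; treat it as an illustration of the math, not a verbatim copy of FFmpeg's setup code:

#include <math.h>

/* Derive IIR coefficients for a Gaussian of standard deviation sigma,
 * approximated by `steps` recursive passes (illustrative sketch). */
static void derive_coefficients(float sigma, int steps,
                                float *postscale, float *boundaryscale,
                                float *nu)
{
    double lambda = (sigma * sigma) / (2.0 * steps);
    double dnu = (1.0 + 2.0 * lambda - sqrt(1.0 + 4.0 * lambda)) / (2.0 * lambda);

    *postscale = (float)pow(dnu / lambda, 2.0 * steps); // undo the gain of all passes
    *boundaryscale = (float)(1.0 / (1.0 - dnu));        // boundary condition at row ends
    *nu = (float)dnu;                                   // feedback coefficient
}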

4.3 Vertical Filtering

For vertical filtering, the filter_vertically() function is as follows:

static int filter_vertically(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
{
    ......
    // filter along the vertical direction, 8 columns at a time
    do_vertical_columns(buffer, width, height, slice_start, aligned_end,
                        steps, nu, boundaryscale, 8);

    // filter the remaining unaligned columns one at a time
    do_vertical_columns(buffer, width, height, aligned_end, slice_end,
                        steps, nu, boundaryscale, 1);
    return 0;
}

The do_vertical_columns() function called internally is as follows:

static void do_vertical_columns(float *buffer, int width, int height,
                                int column_begin, int column_end, int steps,
                                float nu, float boundaryscale, int column_step)
{
    const int numpixels = width * height;
    int i, x, k, step;
    float *ptr;
    for (x = column_begin; x < column_end;) {
        for (step = 0; step < steps; step++) {
            ptr = buffer + x;
            for (k = 0; k < column_step; k++) {
                ptr[k] *= boundaryscale;
            }
            // downward pass: filter top to bottom
            for (i = width; i < numpixels; i += width) {
                for (k = 0; k < column_step; k++) {
                    ptr[i + k] += nu * ptr[i - width + k];
                }
            }
            i = numpixels - width;

            for (k = 0; k < column_step; k++)
                ptr[i + k] *= boundaryscale;

            // upward pass: filter bottom to top
            for (; i > 0; i -= width) {
                for (k = 0; k < column_step; k++)
                    ptr[i - width + k] += nu * ptr[i + k];
            }
        }
        x += column_step;
    }
}

4.4 Post-scaling processing

postscale_slice is also a function pointer, pointing to the postscale_c() function, which performs two steps: scaling and clamping:

static void postscale_c(float *buffer, int length,
                        float postscale, float min, float max)
{
    for (int i = 0; i < length; i++) {
        // scale by the postscale factor
        buffer[i] *= postscale;
        // clamp to the [min, max] range
        buffer[i] = av_clipf(buffer[i], min, max);
    }
}

5、hflip

Horizontal flip: the video is mirrored left-to-right. Its counterpart is vflip, which flips vertically.

The command to flip horizontally is as follows:

ffmpeg -i in.mp4 -vf "hflip" out.mp4

Horizontal flip is also called left and right mirroring.
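hflip combines naturally with hstack (introduced in the next section) to produce a side-by-side mirror effect; a minimal sketch:

ffmpeg -i in.mp4 -filter_complex "[0:v]split[a][b];[b]hflip[m];[a][m]hstack" out.mp4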


6、hstack

Horizontal stacking: the input videos are placed side by side in the horizontal direction (all inputs must have the same height). Its counterpart is vstack, which stacks videos vertically, one above the other.

The command for horizontal stacking is as follows:

ffmpeg -i one.mp4 -i two.mp4 -vf "hstack" out.mp4

7、rotate

Rotate: rotates the video by an arbitrary angle expressed in radians, clockwise or counterclockwise. The parameter options are as follows:

  • angle, a: the rotation angle expressed in radians, the default is 0; a negative value means counterclockwise rotation
  • out_w, ow: the width of the output video, the default is the same as the input video, ie "iw"
  • out_h, oh: the height of the output video, the default is the same as the input video, ie "ih"
  • bilinear: enable bilinear interpolation, 0 to disable, 1 to enable; the default is enabled
  • fillcolor, c: fill color, default is black
  • n: the serial number of the input video frame
  • t: The time of the input video frame, in seconds
  • hsub, vsub: subsampling in the horizontal and vertical directions, for example, the pixel format is "yuv422p", then hsub=2, vsub=1
  • in_w, iw, in_h, ih: the width and height of the input video
  • out_w, ow, out_h, oh: the width and height of the output video
  • rotw(a), roth(a): the minimum width and height that fully contain the frame rotated by angle a

Taking a 90° clockwise rotation as an example, the effect before and after the rotation is shown in the following figure:
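A reference command for this rotation, swapping the output width and height so the rotated frame is not cropped:

ffmpeg -i in.mp4 -vf "rotate=PI/2:ow=ih:oh=iw" out.mp4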


8、xfade

Transition animation, applied when transitioning from one video to the next. Note that the frame rate, pixel format, resolution, and time base of all input videos must match.

Supported transitions include fades, directional wipes and slides, circular and rectangular crops, circle open/close, dissolve, blur, zoom, and more; the default is 'fade'. The full list is as follows:

  • 'custom'
  • 'fade'
  • 'wipeleft'
  • 'wiperight'
  • 'wipeup'
  • 'wipedown'
  • 'slideleft'
  • 'slideright'
  • 'slideup'
  • 'slidedown'
  • 'circlecrop'
  • 'rectcrop'
  • 'distance'
  • 'fadeblack'
  • 'fadewhite'
  • 'radial'
  • 'smoothleft'
  • 'smoothright'
  • 'smoothup'
  • 'smoothdown'
  • 'circleopen'
  • 'circleclose'
  • 'vertopen'
  • 'vertclose'
  • 'horzopen'
  • 'horzclose'
  • 'dissolve'
  • 'pixelize'
  • 'diagtl'
  • 'diagtr'
  • 'diagbl'
  • 'diagbr'
  • 'hlslice'
  • 'hrslice'
  • 'vuslice'
  • 'vdslice'
  • 'hblur'
  • 'fadegrays'
  • 'wipetl'
  • 'wipetr'
  • 'wipebl'
  • 'wipebr'
  • 'squeezeh'
  • 'squeezev'
  • 'zoomin'

The parameter options are as follows:

  • transition: which transition to use, chosen from the list above; the default is 'fade'
  • duration: the duration of the transition animation, the range is [0, 60] seconds, the default is 1
  • offset: the start time of the transition relative to the first video, in seconds, the default is 0
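A reference command that crossfades from one.mp4 into two.mp4, starting at the 3-second mark and lasting 2 seconds:

ffmpeg -i one.mp4 -i two.mp4 -filter_complex "xfade=transition=fade:duration=2:offset=3" out.mp4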

9、overlay

Video overlay: superimposes another layer on top of the video, which can be used for text watermarks, image watermarks, GIF watermarks, and so on. The parameter options are as follows:

  • x, y: Set the xy coordinate point of the overlay layer
  • format: the pixel format of the output video, the default is yuv420, the complete list is as follows:
  •     'yuv420'
  •     'yuv420p10'
  •     'yuv422'
  •     'yuv422p10'
  •     'yuv444'
  •     'rgb'
  •     'gbrp' (planar RGB)
  •     'auto' (automatic selection)
  • alpha: set the transparency format, straight or premultiplied, the default is straight
  • main_w, W, main_h, H: the width and height of the input video
  • overlay_w, w, overlay_h, h: the width and height of the overlay layer
  • n: the ordinal number of the main frame, starting from 0
  • pos: the byte position of the frame in the input file
  • t: the timestamp of the frame, in seconds; NAN if unknown

The command to add image watermark is as follows:

ffmpeg -i in.mp4 -i logo.png -filter_complex overlay=10:20 out.mp4

If you want to place the overlay in the top-left, top-right, bottom-left, or bottom-right corner, you can build the expression as follows:

    private static String obtainOverlay(int offsetX, int offsetY, int location) {
        switch (location) {
            case 2: // top-right corner
                return "overlay='(main_w-overlay_w)-" + offsetX + ":" + offsetY + "'";
            case 3: // bottom-left corner
                return "overlay='" + offsetX + ":(main_h-overlay_h)-" + offsetY + "'";
            case 4: // bottom-right corner
                return "overlay='(main_w-overlay_w)-" + offsetX + ":(main_h-overlay_h)-" + offsetY + "'";
            case 1: // top-left corner
            default:
                return "overlay=" + offsetX + ":" + offsetY;
        }
    }
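For example, obtainOverlay(10, 20, 4) returns "overlay='(main_w-overlay_w)-10:(main_h-overlay_h)-20'", which anchors the watermark 10 pixels from the right edge and 20 pixels from the bottom edge.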

The effect of adding an image watermark is as follows (the logo made by "Thor" is used as a tribute to his memory):

The reference command for adding a GIF animation watermark is as follows (-ignore_loop 0 tells the GIF demuxer to honor the file's loop setting, so the animation keeps looping):

ffmpeg -i in.mp4 -ignore_loop 0 -i in.gif -filter_complex overlay=10:20 out.mp4

Readers interested in audio and video development can find more on GitHub: https://github.com/xufuji456/FFmpegAndroid
