Overview: As the mobile Internet develops, video is on the rise, helped along by 5G. Yet many users are put off by the complicated features and steep learning curve of traditional video editors. To address this, Baidu's Baijiahao R&D team built a simple, easy-to-use online tool for editing and publishing video, the Baijiahao Online Video Editor, based on what creators actually need. This article describes the editor's technical principles, architecture and direction of evolution in detail, and offers a glimpse of how technical cooperation and innovation work inside Baidu.

Preface

With the rapid development of the mobile Internet, people are increasingly used to watching video on their phones. As the content production platform for Shoubai (the mobile Baidu app), Baijiahao needs to give authors easy-to-use tools for editing and publishing video. The online video editor was created to meet this need. This article describes in detail the technology behind the Baijiahao video editor.

Glossary

BOS: Baidu Object Storage, which provides stable, secure, efficient and highly scalable storage services

VOD: video-on-demand service; in this article it refers specifically to Baidu VideoWorks (formerly the VOD audio and video on-demand service)

1. What functions must an online video editor implement?

1.1 Basic functions of the editor

We surveyed desktop (locally installed) video editors and listed the functions a video editor must implement:

Material source file management, loading and editing
Multitrack editor
Drag-and-drop operations (adding/removing material, adding/removing effects, quick trimming, switching tracks, etc.)
Audio and video track separation
Material effects (emboss, vintage, etc.), transition animations (fade in and out, spiral, etc.), material animations (single-point zoom, simulated camera shake, etc.)
Subtitle editing and embedding
Video preview
Render and export in multiple formats

1.2 Unique features of the online editor

An online video editor must implement the same functions, but some of them are implemented differently, for example:

Material management: uploading and deleting material source files
Video preview: a simple preview implemented in front-end JavaScript
Export: the online video editor mainly serves Baijiahao publishers, so instead of exporting video files it feeds into the video publishing process

In addition, building on the technology of Baidu and Baijiahao, extra functions such as speech-to-subtitle conversion, subtitle-to-speech synthesis, and conversion of Baijiahao articles (text and images) into video can also be provided.

2. How to implement an online video editor?

2.1 Back-end technology selection

FFmpeg is the most widely used integrated audio/video codec framework in the industry. It is both powerful and efficient, so the back-end service uses FFmpeg as its underlying video encoding and decoding layer.

2.2 Introduction to FFmpeg

FFmpeg is free software that can record, convert and stream audio and video in many formats. It includes libavcodec, an audio/video codec library used by many projects, and libavformat, an audio/video format conversion library.

△Figure 1 ffmpeg

2.2.1 FFmpeg features

Free software with open source code;
Ships with many filters (plug-ins), enough to meet all current business needs;
Supports third-party filters (plug-ins) to meet future business needs;
Supports custom and dynamic compilation to keep memory usage as low as possible;
Supports remote files (http, ftp, etc.) as input, reducing local disk usage;
Supports GPU encoding and decoding, which lowers CPU usage and speeds up encoding and decoding (we did not use GPU clusters in this business);
Has a simple syntax, which makes it easy to wrap or assemble commands programmatically.

2.2.2 Command line usage

△Figure 2 ffmpeg command line usage

Example 1: ffmpeg -i in.wmv -vcodec libxvid out.mp4

Example 2: ffmpeg -framerate 1 -t 1 -loop 1 -i "http://pic.rmb.bdstatic.com/2b18b480a1f2d15e3667e01c45dfc157.jpeg" -vcodec libx264 -pix_fmt yuv420p -y test.mp4

2.2.3 Basic rules of FFmpeg filter

A filter (avfilter) in FFmpeg does what its name suggests: it performs filtering. In a broad sense, any editing operation on decoded multimedia can be called filtering, and the components and plug-ins that perform these operations are filters. Examples include raising or lowering audio pitch and speed, inserting or dropping video frames, and cropping, cutting, merging and overlaying.

△Figure 3 FFmpeg transcoding and Filter process

2.2.4 Basic filter and its schematic diagram

A basic filter is simple to use: add the required filter with -vf between the input file (and its options) and the output file (and its options). For example:

Scale (static)

ffmpeg -i video_1080p.mp4 -vf scale=w=640:h=360 video_360p.mp4

△Figure 4 Schematic diagram of scale

Zoom and pan: zoompan (dynamic)

ffmpeg -framerate 1 -t 1 -loop 1 -i "http://pic.rmb.bdstatic.com/2b18b480a1f2d15e3667e01c45dfc157.jpeg" -vf "zoompan=z='if(eq(on,0),1,if(lt(zoom,1.25),zoom+0.0005,1.25))':d=16.06*25:x='if(lt(zoom,1.25),0,(x-1))':y='if(lt(zoom,1.25),0,(y+1))':s='1024x720'" -y tmp.mp4

△Figure 5 Schematic diagram of zoompan

Blur: boxblur

ffmpeg -i tmp.mp4 -filter_complex "boxblur=luma_radius='min(h,w)/30':luma_power=2" -y boxblur.mp4  # blur the picture

△Figure 6 Schematic diagram of boxblur

Overlay

ffmpeg -i tmp.mp4 -i watermark.png -filter_complex "[1:v]scale=-2:48[logo];[0:v][logo]overlay=48:48" -y watermark.mp4  # logo in the top-left corner

△Figure 7 Schematic diagram of overlay

2.2.5 FFmpeg pipeline syntax

Rules:

Use [name] to name a stream
Use , to separate the filters within one chain
Use ; to separate one chain (stream) from the next
The i-th input file is referenced as [i-1]
The video and audio streams of the first input file are [0:v] and [0:a]
The name of the last output stream can be omitted

For example:

-filter_complex "
[0:v]split[front][back];                          // duplicate the input into two streams, front and back
[back]                                            // background stream
scale=1280:-2,                                    // scale proportionally to the output width, 1280
boxblur=luma_radius='min(h,w)/30':luma_power=2,   // blur
crop=iw:720[background];                          // crop to 1280:720
[front]scale=-2:720[foreground];                  // scale proportionally to the output height, 720
[background][foreground]overlay=(W-w)/2:(H-h)/2   // overlay, centered
"

(The // annotations are for explanation only and must be removed before the command is run.)

△Figure 8 Schematic diagram of pipelined filter flow

Actual result:

△Figure 9 Execution result of pipelined filter flow
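
Because the pipeline syntax is just string assembly, it is easy to generate from code. The following is a minimal Python sketch of the rules above, not the editor's actual implementation (the article says the real service is written in PHP). The `Chain` class and `build_filter_complex` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    """One filter chain: input stream names -> filters -> output stream names."""
    inputs: list[str]
    filters: list[str]
    outputs: list[str] = field(default_factory=list)  # may be empty for the last chain

    def render(self) -> str:
        ins = "".join(f"[{name}]" for name in self.inputs)
        outs = "".join(f"[{name}]" for name in self.outputs)
        # Filters inside one chain are separated by ","
        return f"{ins}{','.join(self.filters)}{outs}"


def build_filter_complex(chains: list[Chain]) -> str:
    # Chains are separated by ";"
    return ";".join(chain.render() for chain in chains)


graph = build_filter_complex([
    Chain(["0:v"], ["split"], ["front", "back"]),
    Chain(["back"],
          ["scale=1280:-2",
           "boxblur=luma_radius='min(h,w)/30':luma_power=2",
           "crop=iw:720"],
          ["background"]),
    Chain(["front"], ["scale=-2:720"], ["foreground"]),
    Chain(["background", "foreground"], ["overlay=(W-w)/2:(H-h)/2"]),
])
print(graph)
```

The resulting string can be passed as the single argument that follows -filter_complex.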

2.3 Front-end technology selection

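The front-end interface is built with the React framework. The quick preview is based on the browser's HTML5 audio/video player: adjustment parameters are passed to the player through HTML tags to achieve simple playback effects such as negative, emboss and black-and-white, and transitions are simulated by overlaying animated images on the video.
Limited by the performance and complexity of the front-end preview approach, the quick preview can show only some of the editing effects.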

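2.4 Front-end and back-end functional boundaries and interaction

2.4.1 Front-end and back-end functional boundaries

Before specific features are developed, the boundary between front end and back end has to be drawn according to the requirements and the characteristics of each technology, for example:
Implemented by the front-end interface

Interaction between the user and the video editor
Simple video preview (because the front-end and back-end technology stacks and the resources they use differ, the preview may not match the final result exactly)
Converting the result of the user's operations in the editing interface into the timeline data structure
...

Implemented by the back-end service

Translating the timeline into FFmpeg commands
Invoking the video publishing process once the video has been produced
...

Implemented jointly by front end and back end

Subtitle <==> audio
Material upload
...

Based on our functional requirements and this division of work, the user interface of the Baijiahao online video editor is divided into roughly three areas:

The function area inside the yellow line
The multitrack editing area inside the green line
The quick preview area inside the red line

△Figure 10 Interface areas of the Baijiahao online video editor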

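2.4.2 Timeline data structure

For the front end and back end to interact, a data structure is needed that is easy for the front-end multitrack editor to load, modify and store, and also easy for the back end to extract structured data from.
We defined a timeline data structure in which each track corresponds one-to-one to a track in the multitrack editor:

{
  "timeline": {
    "video_track": [ // video track
      {
        "start": 0.0, // start time
        "end": 1.5, // end time = start + duration * speed
        "type": "video", // one of: video, image, transition (transition animation), blank (black screen)
        "height": 720,
        "width": 1280,
        "in_effect": "fade_in", // entry effect
        "out_effect": "fade_out", // exit effect
        "style": "negative", // effect: negative, blur, emboss, black-and-white, etc.
        "duration": 1.5, // duration
        "speed": 1, // playback speed
        "animation": "zoompan", // animation applied to the video resource, such as camera shake or pan and zoom
        "sourceUrl": "http://*.baidu.com/c20ad4d76fe97759aa27a0c99bff6710.mp4"
      }
    ],
    "audio_track": [ // audio track
      {
        "start": 0.0, // start time
        "end": 1.5, // end time = start + duration * speed
        "type": "video", // one of: video (the audio track of a video), audio, silence (blank silence)
        "in_effect": "fade_in", // entry effect
        "out_effect": "fade_out", // exit effect
        "style": "jazz", // effect: equalizer presets such as jazz, rock, vocal
        "duration": 1.5, // duration
        "speed": 1, // playback speed
        "sourceUrl": "http://*.baidu.com/c20ad4d76fe97759aa27a0c99bff6710.mp3",
        "auto_subtitle": true, // convert speech to subtitles
      }
    ],
    "subtitle": [ // subtitle track
      {
        "start": 0.0, // start time
        "end": 1.5, // end time = start + duration * speed
        "type": "video", // one of: video (the audio track of a video), audio, silence (blank silence)
        "style": "Arial,23,yellow,white", // effect: font, size, color, outline color
        "duration": 1.5, // duration
        "text": "这是一条字幕",
        "pos_x": 100, // subtitle position
        "pos_y": 200, // subtitle position
        "tts": true, // synthesize speech from the subtitle
      }
    ],
    "watermark": [ // watermark, sticker
      {
        "start": 0.0, // start time
        "end": 1.5, // end time = start + duration * speed
        "style": "transparent", // effects such as transparent, negative
        "style_params": "0.8", // parameters of the effect, such as opacity
        "duration": 1.5, // duration
        "sourceUrl": "http://*.baidu.com/c20ad4d76fe97759aa27a0c99bff6710.png",
        "pos_x": 100, // sticker position
        "pos_y": 200, // sticker position
        "height": 100, // sticker height
        "width": 100, // sticker width
      }
    ]
  },
  "author_info": {}, // author information
  "extra": {}, // other information
}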
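
Before a timeline like this is translated, it helps to check that each track is internally consistent. The Python sketch below is one possible check, not part of the described service. It uses the `end = start + duration * speed` relation from the comments in the structure, and additionally assumes that clips on one track should not overlap; that second rule and the function name `check_track` are assumptions made for illustration.

```python
def check_track(clips: list[dict]) -> list[str]:
    """Return a list of human-readable problems found in one track."""
    problems = []
    prev_end = 0.0
    for i, clip in enumerate(clips):
        # Relation taken from the comments in the timeline structure.
        expected_end = clip["start"] + clip["duration"] * clip.get("speed", 1)
        if abs(clip["end"] - expected_end) > 1e-6:
            problems.append(
                f"clip {i}: end {clip['end']} != start + duration * speed ({expected_end})"
            )
        # Assumption: clips on the same track must not overlap.
        if clip["start"] < prev_end - 1e-6:
            problems.append(f"clip {i}: overlaps the previous clip")
        prev_end = max(prev_end, clip["end"])
    return problems


video_track = [
    {"start": 0.0, "end": 1.5, "duration": 1.5, "speed": 1, "type": "video"},
    {"start": 1.0, "end": 3.0, "duration": 1.5, "speed": 1, "type": "image"},
]
for problem in check_track(video_track):
    print(problem)
```

In this example the second clip is reported twice: its end time does not match its duration, and it starts before the first clip has finished.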

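2.4.3 Asynchronous calls and polling

When users finish editing, they click the "Save" button to submit. The front end then packs every resource in the multitrack editor into a timeline structure and sends it to the back-end service. The back-end service translates the timeline and calls FFmpeg to encode and decode the video.
This stage of back-end work is compute-intensive and usually takes 2 to 5 times the length of the timeline to finish compositing the final video. For this reason, after "Save" is clicked the front end makes an asynchronous call and then polls the status periodically to check whether the back end has finished.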
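
The submit-then-poll pattern can be sketched as follows. The real client is front-end JavaScript; Python is used here only to keep the examples in one language. The status strings ("queued", "running", "done", "failed"), the `wait_for_render` name and the default interval and timeout are assumptions, since the article does not describe the actual status API.

```python
import time
from typing import Callable


def wait_for_render(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until the job finishes, fails, or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("done", "failed"):
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(interval)


# Simulated back end: the job is queued, then running, then done.
_states = iter(["queued", "running", "done"])
result = wait_for_render(lambda: next(_states), interval=0.0, sleep=lambda s: None)
print(result)
```

Because compositing takes a multiple of the timeline length, a real client would probably scale the timeout with the length of the timeline rather than use a fixed value.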

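2.5 Back-end timeline translation process

As mentioned above, the back-end service translates the timeline sent by the front end into FFmpeg commands.
The main flow of this step is shown in the figure below:
△Figure 11 Flowchart of translating the timeline into FFmpeg commands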
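
As a rough illustration of what such a translation involves, the Python sketch below turns only the video track of a timeline into an FFmpeg argument list: each clip becomes an input, is scaled to the output size, and the clips are joined with the concat filter. It is a deliberately reduced sketch that ignores speed, effects, audio and subtitles, and it does not reproduce the flow in the figure. The function name `timeline_to_ffmpeg` and the example URLs are hypothetical.

```python
def timeline_to_ffmpeg(timeline: dict, output: str,
                       width: int = 1280, height: int = 720) -> list[str]:
    """Translate the video track into an ffmpeg argv (video only, no effects)."""
    clips = sorted(timeline["video_track"], key=lambda c: c["start"])
    argv = ["ffmpeg"]
    chains = []
    for i, clip in enumerate(clips):
        if clip["type"] == "image":
            # Loop a still image for the clip's duration.
            argv += ["-loop", "1", "-t", str(clip["duration"])]
        argv += ["-i", clip["sourceUrl"]]
        # Bring every input to the same size so that concat accepts it.
        chains.append(f"[{i}:v]scale={width}:{height},setsar=1[v{i}]")
    labels = "".join(f"[v{i}]" for i in range(len(clips)))
    chains.append(f"{labels}concat=n={len(clips)}:v=1:a=0[vout]")
    argv += ["-filter_complex", ";".join(chains), "-map", "[vout]", "-y", output]
    return argv


timeline = {"video_track": [
    {"start": 0.0, "end": 1.5, "duration": 1.5, "type": "video",
     "sourceUrl": "https://example.com/a.mp4"},
    {"start": 1.5, "end": 4.5, "duration": 3.0, "type": "image",
     "sourceUrl": "https://example.com/b.jpg"},
]}
argv = timeline_to_ffmpeg(timeline, "out.mp4")
print(" ".join(argv))
```

Returning a list of arguments rather than one shell string avoids quoting problems when the command is eventually executed.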

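3. Implementation of the Baijiahao online video editor

3.1 Overall architecture of the Baijiahao video editor

△Figure 12 Overall architecture

3.2 User interface and service interface

The video editor currently offers two ways of use: a graphical interface for end users and a service interface for developers.
The graphical interface is integrated into the Baijiahao content creation console and is now open to some Baijiahao authors, while services exposed through the interface, such as audio transcoding and video merging, are already used in Baijiahao's online services.

3.3 Business layer: timeline translation

In the business layer, a UI-layer module was added to isolate internal from external network requests; it handles video editing requests coming from the graphical interface. The Service module, developed in PHP, is the core module of the editor. Its main job is to normalize the two kinds of request (graphical interface and service interface) into one form, translate the timeline data structure into directly executable FFmpeg commands, and hand them to the offline scheduling module for execution.
During translation, the Service module mainly does the following:

3.3.1 Turning images into video

blur: the aspect ratio and size of incoming videos and images may differ from the final output, for example video shot vertically on a phone or pictures downloaded from the web. The industry used to handle mismatched aspect ratios either by leaving black bars or by cropping part of the picture. With the rise of short video on phones, the now popular approach, shown in Figure 13, is to replace the black bars with a blurred, enlarged copy of the picture as the background.
zoompan: incoming still images are usually animated so that the picture does not look too static and displays better.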
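
The zoompan step mainly needs the clip duration converted into a frame count. The following Python sketch builds a zoompan filter string for a still image; the slow zoom expression follows the pattern used in the FFmpeg zoompan documentation, and the function name and default values are assumptions rather than the editor's real settings.

```python
def still_to_video_filter(duration: float, fps: int = 25,
                          width: int = 1280, height: int = 720,
                          max_zoom: float = 1.25, step: float = 0.0005) -> str:
    """Build a zoompan filter that slowly zooms into a still image."""
    frames = round(duration * fps)  # zoompan's d option is a frame count
    return (
        f"zoompan=z='min(zoom+{step},{max_zoom})'"
        f":d={frames}:s={width}x{height}:fps={fps}"
    )


print(still_to_video_filter(4.0))
```

The returned string can be used directly as a -vf argument, or as one filter inside a longer pipelined chain.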

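3.3.2 Joining videos and transitions

concat: merges the incoming image/video streams and joins them into one longer video track.
overlay: adds a layer of transition animation at the moment where two videos meet, to avoid an abrupt cut.

△Figure 13 Adding a transition animation with overlay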
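
One way to place a transition animation over a junction is to shift the animation's timestamps with setpts and enable the overlay only around the junction time. The Python sketch below computes the junction times from the clip durations and builds such a filter chain. It assumes the animation is centered on the junction and covers the whole frame; the article does not say how the editor positions its transitions, and both function names are hypothetical.

```python
from itertools import accumulate


def junction_times(durations: list[float]) -> list[float]:
    """Times at which one clip ends and the next one begins."""
    return list(accumulate(durations))[:-1]


def transition_overlay(base: str, anim_input: int, junction: float,
                       anim_len: float, out: str) -> str:
    """Overlay input number anim_input on stream `base`, centered on the junction."""
    start = max(junction - anim_len / 2, 0.0)
    end = start + anim_len
    return (
        # Delay the animation so that it starts playing at `start` seconds.
        f"[{anim_input}:v]setpts=PTS+{start}/TB[tr{anim_input}];"
        # Show it only during the transition window; pass the base through afterwards.
        f"[{base}][tr{anim_input}]overlay=0:0:eof_action=pass:"
        f"enable='between(t,{start},{end})'[{out}]"
    )


print(junction_times([1.5, 2.0, 3.0]))
print(transition_overlay("joined", 3, 1.5, 1.0, "v1"))
```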

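3.3.3 Audio

Joins the incoming video soundtracks, voice-overs and TTS readings into one long audio track.
Adds background music (BGM) chosen by the user to give the video more atmosphere.
Handles fade-in and fade-out to avoid abrupt switches.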
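
These three steps map naturally onto the FFmpeg audio filters concat, amix and afade. The Python sketch below assembles one possible audio filter graph. The stream names, the BGM volume of 0.3 and the one-second fade are illustrative assumptions, not values taken from the editor.

```python
def audio_graph(n_segments: int, bgm_input: int, total: float,
                fade: float = 1.0, bgm_volume: float = 0.3) -> str:
    """Join voice segments, mix in background music, then fade in and out."""
    voice_in = "".join(f"[{i}:a]" for i in range(n_segments))
    fade_out_start = max(total - fade, 0.0)
    return ";".join([
        # 1. Join the separate segments into one long audio track.
        f"{voice_in}concat=n={n_segments}:v=0:a=1[voice]",
        # 2. Lower the BGM volume and mix it under the voice track.
        f"[{bgm_input}:a]volume={bgm_volume}[bgm]",
        "[voice][bgm]amix=inputs=2:duration=first[mixed]",
        # 3. Fade in at the start and fade out at the end.
        f"[mixed]afade=t=in:st=0:d={fade},"
        f"afade=t=out:st={fade_out_start}:d={fade}[aout]",
    ])


print(audio_graph(n_segments=2, bgm_input=2, total=10.0))
```

With duration=first, the mix ends when the voice track ends, so a longer piece of background music is cut to the length of the video.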

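3.3.4 Subtitles

Add the ASS styled-subtitle header.
Generate an ASS subtitle file from the text in the timeline.
Burn the ASS subtitle file into the video stream.

△Figure 14 Styled-subtitle header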
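
Generating the ASS file is mostly a matter of formatting timestamps and writing one Dialogue line per subtitle. The Python sketch below shows the idea with a single hard-coded style (Arial, size 23, yellow text, white outline, matching the example style string in the timeline). The header is a generic ASS header written for this example, not the editor's actual header from the figure, and mapping pos_x/pos_y onto the \pos override tag is an assumption.

```python
ASS_HEADER = """[Script Info]
ScriptType: v4.00+
PlayResX: 1280
PlayResY: 720

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,23,&H0000FFFF,&H000000FF,&H00FFFFFF,&H00000000,0,0,0,0,100,100,0,0,1,2,0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""


def ass_time(seconds: float) -> str:
    """Format seconds as the H:MM:SS.cc timestamp used by ASS."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def build_ass(subtitles: list[dict]) -> str:
    """Render the subtitle track of a timeline as an ASS document."""
    lines = [ASS_HEADER.rstrip("\n")]
    for sub in sorted(subtitles, key=lambda s: s["start"]):
        text = sub["text"].replace("\n", r"\N")  # ASS line break
        pos = rf"{{\pos({sub['pos_x']},{sub['pos_y']})}}"  # assumed mapping of pos_x/pos_y
        lines.append(
            f"Dialogue: 0,{ass_time(sub['start'])},{ass_time(sub['end'])},"
            f"Default,,0,0,0,,{pos}{text}"
        )
    return "\n".join(lines) + "\n"


doc = build_ass([{"start": 0.0, "end": 1.5, "text": "Hello", "pos_x": 100, "pos_y": 200}])
print(doc.splitlines()[-1])
```

The generated file can then be burned into the picture with FFmpeg's ass filter (for example -vf "ass=subs.ass"), which requires an FFmpeg build with libass.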

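3.3.5 Assembly

Combine all filter commands as a pipelined filter flow and generate a filter-flow script.
Upload the filter-flow script and the generated ASS subtitles to BOS, so that the later FFmpeg command can read and execute them directly.

3.3.6 Other

Blank video/audio of a specific length must be inserted into empty positions so that the timeline of the output video matches the timeline in the editor interface.
Longer text must be split carefully so that every subtitle segment stays in sync with the TTS reading (this is computed in the UI layer).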

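3.4 Internal services

The business layer makes various queries and calls involving user information, material information, speech synthesis and so on. These functions are all provided by internal services of Baijiahao and Baidu.

3.5 Offline scheduling

Dispatch is a distributed task scheduling system. It executes FFmpeg requests in a balanced way across multiple instances (or containers), uploads the generated resources to BOS/VOD, and calls back the Service-layer module with the result of the task.
FFmpeg is a complete, open-source, free software suite for transcoding audio and video streams. It carries out the final FFmpeg commands and generates the audio and video files.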

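4. Offline scheduling framework: distributed FFmpeg scheduling

4.1 Dispatch architecture diagram

△Figure 15 Dispatch architecture

4.2 How Dispatch works

On startup, each instance registers itself in a Redis hash, with member = ip and value = current queue length:current status:update timestamp;
Any instance that receives a request from the Service-layer module executes it locally if its own queue length is 0; otherwise it forwards the request to the healthy instance with the shortest queue;
Before forwarding a request, the instance fetches all Dispatch data from Redis, parses the ip, queue length, status and update timestamp of every instance, and selects the best instance according to the rules;
When a request is consumed from the queue, FFmpeg is called to fetch the input files, the pipelined filter-flow script and the ASS subtitle file from BOS; it then executes the pipelined filter-flow script, writes the output file to the local disk, and uploads it to BOS/VOD;
According to the request parameters, the Service-layer module's interface is called back to update the task status.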
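
The instance-selection rule can be sketched as follows. In the Python example below a plain dict stands in for the Redis hash (in production the values would come from a command such as HGETALL), the value format follows the "queue length:status:timestamp" layout described above, and the "ok" status string, the 30-second freshness limit and the function names are assumptions.

```python
def parse_entry(value: str) -> tuple[int, str, float]:
    """Parse 'queue_length:status:update_timestamp' from the registry."""
    queue_len, status, updated = value.split(":")
    return int(queue_len), status, float(updated)


def pick_instance(registry: dict[str, str], self_ip: str,
                  now: float, max_age: float = 30.0) -> str:
    """Return the ip of the instance that should execute the request."""
    own_len, _, _ = parse_entry(registry[self_ip])
    if own_len == 0:
        return self_ip  # idle: execute locally
    candidates = []
    for ip, value in registry.items():
        queue_len, status, updated = parse_entry(value)
        # Only consider healthy instances that refreshed their entry recently.
        if status == "ok" and now - updated <= max_age:
            candidates.append((queue_len, ip))
    return min(candidates)[1] if candidates else self_ip


registry = {
    "10.0.0.1": "3:ok:1000",
    "10.0.0.2": "1:ok:1000",
    "10.0.0.3": "0:ok:900",    # stale entry, ignored when forwarding
    "10.0.0.4": "0:down:1000", # unhealthy, ignored when forwarding
}
print(pick_instance(registry, "10.0.0.1", now=1005.0))
```

Checking the update timestamp guards against forwarding work to an instance that has crashed without removing its registry entry.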

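5. Article-to-video project: a technical experiment built on the video editor's back-end service

5.1 Editing video scene by scene

Compared with the video editor, the user interface of the article-to-video project drops the timeline and uses the concept of a "scene" instead. One picture plus one passage of text is a scene, and a video is a sequence of scenes joined together.
△Figure 16 Creating a video scene by scene (design mock-up)

5.2 Converting an article landing-page URL to video

Thanks to the simplicity of the scene concept, a landing-page URL can easily be converted into scenes, so authors of articles and picture galleries can start editing and creating video content with one click.
Figure 17 shows the flow of this creation process.
△Figure 17 URL-to-video flow
Once the scenes have been converted into a timeline, the video editor's interface can be called to generate and publish the video.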
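
The conversion from scenes to a timeline can be sketched in a few lines of Python. The sketch below assumes that a scene is a dict with an "image" URL and a "text" passage, and estimates each scene's duration from the length of its text at a fixed reading speed. That duration rule, the function name and the example URLs are assumptions; in the real project the duration would more plausibly come from the TTS audio.

```python
def scenes_to_timeline(scenes: list[dict], chars_per_second: float = 4.0,
                       min_duration: float = 2.0) -> dict:
    """Lay scenes end to end as image clips with matching TTS subtitles."""
    video, subtitles = [], []
    cursor = 0.0
    for scene in scenes:
        # Assumed rule: duration is proportional to text length.
        duration = max(len(scene["text"]) / chars_per_second, min_duration)
        end = cursor + duration
        video.append({
            "start": cursor, "end": end, "type": "image",
            "duration": duration, "speed": 1,
            "animation": "zoompan", "sourceUrl": scene["image"],
        })
        subtitles.append({
            "start": cursor, "end": end, "duration": duration,
            "text": scene["text"], "tts": True,
        })
        cursor = end
    return {
        "timeline": {"video_track": video, "audio_track": [],
                     "subtitle": subtitles, "watermark": []},
        "author_info": {},
        "extra": {},
    }


result = scenes_to_timeline([
    {"image": "https://example.com/a.jpg", "text": "x" * 20},
    {"image": "https://example.com/b.jpg", "text": "hi"},
])
print([(c["start"], c["end"]) for c in result["timeline"]["video_track"]])
```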

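5.3 Article-to-video demo

Several videos generated during the technical validation of the article-to-video project are attached at the end of the article to show the actual results.

6. Summary and outlook

6.1 Innovating by combination, keeping up with the trend

The technology behind Baijiahao's online video editor can be summed up simply: the back end uses PHP to translate the timeline data generated by front-end JavaScript into FFmpeg commands, and the Dispatch scheduling framework runs FFmpeg to produce the final video. Seen this way, the technology has no high technical barrier and no complex abstract model. Our innovation lies mainly in combining existing technologies into a new solution that fits the trend.
The wave of video brings not only ordinary users' huge demand for video content but also creators' eager expectations for convenient video editing tools. Baijiahao has always taken the creator's point of view and worked to give creators a better video editor. We hope our efforts give creators a sturdy pair of oars for riding this wave.

6.2 Sharing technology, cooperating for mutual benefit

As the Baijiahao online video editor developed, it attracted the attention of the Media Cloud team in Baidu ACG, and the two teams held in-depth technical exchanges on online video editing technology.

Baidu Media Cloud then built its intelligent quick video editing service on this technology. Drawing on Media Cloud's long-term technical experience and its deep work on the underlying video editing technology, the service uses intelligent segmentation plus GPU encoding and decoding to make video compositing several times more efficient, and it adds new features and functions that make online video editing more practical.
Baijiahao is now gradually migrating the underlying services of the video editor and its general video editing capabilities to Media Cloud's intelligent quick video editing service. As the team that originally contributed the online video editor technology, the Baijiahao team is already enjoying the benefits of this cooperation.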

Original link: https://mp.weixin.qq.com/s/wHrQS9DXEKcszpiILt9Gmg


Technological Evolution of Baijiahao Online Video Editor