How do you architect a billion-scale short video system?

A few words up front

In Nien's reader community (50+ groups), he often coaches members on interview-ready architecture answers that help them land high-end offers.

A few days ago, Nien coached a reader with an annual salary of 1 million RMB who had received an interview invitation from ByteDance.

The reader ran into a very high-frequency interview question that is nevertheless hard to answer well, along the lines of:

  • How would you design the system architecture of a short video system?
  • How would you design the system architecture of a short video APP?

Recently, a reader encountered this question again in a NetEase second-round interview.

In fact, Nien has always wanted to sort out a textbook answer.

Here is a new industry case, "Practice of the High-Availability Architecture of ByteDance's Billion-Scale Video Processing System". Nien has reconstructed and organized this solution from an interview perspective and included it in the "Nien Java Interview Collection PDF", V97 edition.

The following content is Nien's secondary analysis, based on his own 3-level architecture notes and the Nien 3-level architecture knowledge system (the 3-level architecture universe).

For the PDFs of "Nien Architecture Notes", "Nien High Concurrency Trilogy", and "Nien Java Interview Collection", please visit the official account [Technical Freedom Circle].

Article directory

Macro business architecture of short video systems (such as TikTok, Instagram Reels, YouTube Shorts)

Streaming media applications, represented by short video on demand, have expanded rapidly in the mobile Internet era.

Short video content has become the new trend, consumed on platforms like TikTok, Instagram, and YouTube. Let's see how to build a system like TikTok.

As Internet content has diversified, short video has quickly displaced traditional text and images, sweeping into people's daily lives and becoming an important channel for information dissemination.

Such an app might look small, but there's a lot going on in the background.

Here are the associated challenges:

  • Since the application is used globally, a huge volume of requests is sent to the servers, increasing server load.
  • Uploading videos to the backend is a heavy task that adds load and can block the server.
  • Videos must play smoothly, without buffering.
  • A recommender system must suggest videos based on user interests.

Let's understand each part one by one. I divide the system into four parts:

  • User-related subsystem
  • Video publishing subsystem
  • Likes and comments subsystem
  • Recommendation subsystem

1) User-related subsystem

This service contains user-related functions such as:

  • Registration: The user will be registered in the application.
  • Login: It will authenticate the credentials and send a response to the application.
  • Logout: The user will be logged out of the application.
  • Follow: If a user wants to follow or unfollow other users, it can be done through this service.

To store user-related data, we will use a SQL database such as MySQL or PostgreSQL, an appropriate choice since user-related data (such as follower relationships) is relational.

To optimize database performance, we will use a master-slave architecture: the master database handles writes and the slave databases handle reads. To learn more, see the article How to optimize database performance and scale it? [3]

Now let's walk through the flow of the user service. The application makes API calls, and the API Gateway manages those APIs.

The gateway routes requests for the user service.

Requests pass through a load balancer, behind which sit multiple user service instances.

Depending on the load, the load balancer decides which instance handles each request.

Once the request is processed, the response goes back through the load balancer to the API Gateway and then to the application.
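The routing described above can be sketched as a minimal round-robin balancer. The instance names are made up, and a real balancer would also weigh load and instance health:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes incoming requests across service instances in turn."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._iter = cycle(self._instances)

    def route(self, request):
        # Pick the next instance and hand it the request.
        instance = next(self._iter)
        return instance, request

balancer = RoundRobinBalancer(["user-svc-1", "user-svc-2", "user-svc-3"])
targets = [balancer.route({"op": "login"})[0] for _ in range(4)]
# Requests cycle through the instances and wrap around.
```

In practice the balancer would use health checks and load metrics rather than a blind rotation, but the request/response path through the gateway is the same.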

2) Video publishing subsystem

This subsystem generally covers video upload, storage, processing, and playback, along with the corresponding process management and content review.

The core operations are as follows:

  • Upload video: upload the video to the background server.
  • Posting: If a user wants to create, edit or delete a post, it can be done through this service.

Technical points of subsystems related to video publishing:

  • How to safely and reliably store PB-scale data and provide fast access to videos;
  • How to support video upload in various scenarios;
  • How to ensure stable, smooth streaming playback;
  • How to meet basic processing requirements such as transcoding and watermarking.

These are the technical problems that must be considered and solved when building a video-on-demand platform.

The most critical piece is the storage of the short videos themselves.

To store data related to posts, we will use a horizontally scalable NoSQL store for the metadata, with the video files themselves in an object store such as MinIO (note that MinIO is object storage rather than a NoSQL database).

For each user, there may be thousands of posts, which will result in a large amount of data.

Scaling a database while keeping performance optimal can be difficult. NoSQL databases support horizontal sharding, which lets us scale out without hurting performance.
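The horizontal sharding mentioned here can be sketched with simple hash-based routing; the shard count and key format below are purely illustrative:

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(user_id: str) -> int:
    """Route all posts of a user to the same shard via a stable hash."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same user always lands on the same shard, so reads and writes
# for one user's posts never fan out across the whole cluster.
s1 = shard_for("user-42")
s2 = shard_for("user-42")
```

A production system would typically use consistent hashing so that adding shards moves only a fraction of the keys.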

Now let's discuss the flow of video serving.

Applications will make API calls and API Gateway will manage those APIs. It will route requests for the video service.

Requests will go through a load balancer, and there will be multiple video service instances under the load balancer.

Depending on the load, it will decide which instance will handle the request. Once the request is processed, the load balancer will send the response back to the API Gateway and then back to the application.

How to make files globally accessible without increasing download time?

Video files are uploaded to object storage such as MinIO.

Now, if we want the file to be accessible anywhere in the world without delay, it is pushed to a Content Delivery Network (CDN), which replicates the media file to cloud storage locations around the world.

Can we optimize further to reduce download time?

There is another challenge to solve: the original video can be large, and sending a large file back to the client lengthens download time and hurts the user experience.

Once the file is uploaded to cloud storage, you can store the file path in the database.

Then send the post/video details to a message queuing system such as Kafka or RocketMQ.

To make the user experience smooth, we need to compress the video and create different resolutions for different devices.

The video processing worker receives the video details from the message queue, pulls the file from cloud storage, and processes it. After processing, the new video files are sent to the CDN.
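The decoupled pipeline above (enqueue video details, worker pulls and produces per-resolution renditions) can be sketched in-process. Here Python's `queue.Queue` stands in for Kafka/RocketMQ, and the target gears are an assumption:

```python
import queue

RESOLUTIONS = ["1080p", "720p", "480p"]  # illustrative target gears

def transcode_worker(q: queue.Queue) -> list:
    """Consume video details and emit one output path per resolution."""
    outputs = []
    while not q.empty():
        video = q.get()
        for res in RESOLUTIONS:
            # In production this step would invoke a real transcoder
            # (e.g. ffmpeg) and push the result to the CDN.
            outputs.append(f"{video['id']}/{res}.mp4")
        q.task_done()
    return outputs

q = queue.Queue()
q.put({"id": "vid-001", "path": "raw/vid-001.mov"})
renditions = transcode_worker(q)
```

The point of the queue is that upload latency is decoupled from processing latency: the API returns as soon as the message is enqueued.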

How to access compressed video files?

Now you might wonder how the application knows the file path of the compressed video discussed above. Since compressed files are stored in folders categorized by resolution, a file can easily be located from its resolution and file name.

The video publishing API returns just the filename; to access the file, the application adds the resolution details to the URL itself, e.g. /media//mediaID/xxxx.

When this URL is accessed, it goes through the API Gateway and the resolution and filename details are extracted from the URL.

It then checks the caching system (Redis); if the file is not there, it fetches it from the CDN and adds it to the cache, so a repeated request for the same file does not have to go to the CDN again.
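The lookup order described here (cache first, then CDN, then populate the cache) is the classic cache-aside pattern. A minimal sketch, with a dict standing in for Redis and a stubbed CDN fetch:

```python
cache = {}          # stands in for Redis
cdn_hits = []       # records which requests actually reached the CDN

def fetch_from_cdn(key: str) -> bytes:
    cdn_hits.append(key)
    return b"<video bytes for %s>" % key.encode()

def get_media(key: str) -> bytes:
    # 1. Check the cache first.
    if key in cache:
        return cache[key]
    # 2. On a miss, fetch from the CDN and populate the cache.
    data = fetch_from_cdn(key)
    cache[key] = data
    return data

a = get_media("720p/vid-001.mp4")   # miss: goes to the CDN
b = get_media("720p/vid-001.mp4")   # hit: served from cache
```

A real deployment would also set a TTL on cache entries so that hot files age out once demand drops.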

3) Likes and comments subsystem

This service covers video likes and comments. As the name suggests, it updates the likes and comments of a particular post. The request flow is the same as for the services discussed above.

4) Recommendation subsystem

This service recommends a series of posts based on user preferences. A lot happens behind the scenes; let's take a look at the background process.

When a post is created, it is sent to the message queue system, and a consumer fetches the data and loads it into **Big Data storage (Hadoop)**.

A separate server is set up for machine learning services using frameworks such as PyTorch and TensorFlow; it pulls data from the big data store and trains models.

A recommendation service will use this AI model to recommend posts for a given user.

Technology selection: comparing common open source storage frameworks

Current storage can be logically divided into three categories, namely block storage, file storage and object storage.

  • Block storage generally refers to common volume or hard disk storage, provided by disk arrays and SANs. The operation object is the disk, addressed by logical block number; data is accessed at the block level, and read/write speed is fast.
  • File storage organizes data into different types of files in the structure required by different applications. Operations such as creating, searching, modifying, and deleting files can be performed to facilitate data sharing.
  • Object storage stores files in the form of objects (an object contains attributes and content). Usually, multiple distributed servers have built-in large-capacity hard disks, and clusters are formed through object storage software to provide external read and write access functions.

The industry's more mainstream open source storage frameworks MinIO, Ceph, and SeaweedFS are compared in terms of open source protocols, scalability, and cost, as shown in the following table:

Object storage combines the fast read/write of block storage, scalable capacity, and the convenient sharing of file storage. Given the data storage and video-on-demand requirements of a short video platform, an object storage framework is the recommended storage layer for the platform.

Further considering the platform's data scale, non-disruptive dynamic storage expansion, online HTTP multimedia playback, and the cost of learning and operations, the comparison above recommends the MinIO open source framework as the basic framework for short video storage and on-demand playback.

Key introduction: MinIO object storage framework

The emergence of object storage is to solve the problem of storing massive big data, such as storing massive videos and pictures, and performing operations such as data archiving, data backup, and big data analysis.

Object storage generally adopts a flat key-object architecture. It is easy to use, and data can be read and written in many ways by calling APIs. Its large capacity, dynamic expansion, and data disaster recovery capabilities are unmatched by traditional file storage and NAS.

MinIO is a lightweight, high-performance open source object storage framework released under the Apache License 2.0, suitable for storing massive unstructured data such as images and videos.

MinIO is implemented in Go. Client SDKs are available for Java, Python, JavaScript, and Go. It is compatible with the Amazon S3 cloud storage interface, which makes it easy to integrate with other applications.

1) Storage mechanism

MinIO uses erasure coding and checksums to protect data against hardware failure and silent data corruption; data remains readable even when up to N/2 of the drives are lost.
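MinIO actually uses Reed-Solomon codes, but the principle of surviving shard loss can be illustrated with the simplest erasure code: one XOR parity shard over two data shards (real deployments use far more shards and tolerate multiple losses):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Split the data into two shards and compute one parity shard.
data1, data2 = b"hell", b"o!!!"
parity = xor_bytes(data1, data2)

# Simulate losing data2: XOR-ing the surviving shard with the parity
# shard reconstructs the lost one exactly.
recovered = xor_bytes(data1, parity)
```

Checksums play the complementary role: they detect silent corruption so the system knows *which* shard to rebuild.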

2) Scalability

Simplicity and scalability are two important design concepts of MinIO clusters.

MinIO supports peer expansion and federated expansion. Peer expansion grows the cluster by adding peer nodes and disks: if the original cluster contains 4 nodes and 4 disks, then expansion must also add 4 nodes and 4 disks (or a multiple thereof). This keeps the same data redundancy SLA and reduces the complexity of expansion.

Federated expansion introduces etcd as a unified namespace and joins multiple MinIO clusters into a federation. New clusters can be added to an existing federation, so this approach can in theory scale without limit, and capacity can be expanded without interrupting the federation's existing service.

3) External service

MinIO is fully compatible with the S3 standard interface, and clients and servers communicate over HTTP/HTTPS. MinIO provides the mc client (MinIO Client) with UNIX-style commands, and also supports multi-language client SDKs. Besides local disks, its storage backend can connect to other storage systems and resources through gateways, as detailed in the table below.

4) Multimedia streaming support

For multimedia files, MinIO supports online streaming playback over HTTP Range requests, including seeking via the progress bar.

As shown in the figure below, when a browser streams a multimedia file stored in MinIO, each drag of the progress bar sends an HTTP request to the MinIO server. The request header contains a Range field specifying the start byte of the requested position and the end byte of the buffered range.

This form enables MinIO to naturally support streaming playback and progress dragging of multimedia files.

Figure MinIO multimedia online playback support
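The Range mechanics can be sketched by serving a byte slice of a stored object; the parser below handles only the single `bytes=start-end` form for brevity:

```python
def serve_range(blob: bytes, range_header: str):
    """Return (status, body, content_range) for a 'bytes=start-end' header."""
    unit, _, spec = range_header.partition("=")
    assert unit == "bytes"
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else len(blob) - 1  # open-ended: to the end
    body = blob[start:end + 1]
    content_range = f"bytes {start}-{end}/{len(blob)}"
    return 206, body, content_range  # 206 Partial Content

video = b"0123456789"  # stands in for a stored mp4 object
status, body, content_range = serve_range(video, "bytes=2-5")
```

Because each seek is just a new Range request, the server needs no per-session state, which is why object storage supports seeking "naturally".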

Simple short video system based on MinIO

Given factors such as dynamically scalable cluster storage, HTTP streaming support, and operating costs, it is recommended to use MinIO object storage as the underlying storage, and to develop and deploy services such as on-demand address mapping and dynamic address proxying on top of it, forming a complete short video storage and on-demand platform.

Its implementation framework is as follows:

Figure based on MinIO short video on demand platform architecture

The on-demand platform can be roughly divided into storage layer, service layer and application layer.

  • The storage layer mainly deploys MinIO object storage system and relational database, MinIO is used to store video objects, and relational database is used to store video metadata;
  • The service layer provides various storage access service interfaces, such as file upload and download, video playback address generation, object address mapping, etc.;
  • The application layer provides application functions for the front end, including functions such as video upload, query, and playback.

The data access process of the on-demand platform based on MinIO object storage is shown in the following figure:

Fig. Data access flow chart of short video on demand platform based on MinIO

1) Video upload and transcoding

The mp4 format is used uniformly for video storage and on-demand playback. To accept uploads in various formats, a transcoding module converts them to mp4 for storage, first caching them on local disk.

2) Live recording

Start the recording during the live broadcast, and save the recorded files to the local disk cache first.

3) Upload files

After transcoding or recording completes, the MinIO file upload interface is called to upload the video file to the MinIO cluster/federation; etcd provides registration and discovery services for the MinIO clusters.

4) On-demand address mapping

The server deploys an on-demand address mapping service that maps between the MinIO playback address and the video ID, so that a change of storage medium does not affect video streaming.

5) Address dynamic proxy service

For system security, we do not want to expose the MinIO storage address or storage details. We add a gateway layer that forwards the media stream or proxies the address, providing a unified external service address. Based on the video ID in an on-demand request, the dynamic address proxy routes to the corresponding playback address, decoupling storage details from the service address.
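A minimal sketch of the mapping-plus-proxy idea: clients only ever see a unified playback path, while the gateway resolves the video ID to the real (changeable) storage address. All addresses below are invented:

```python
# Internal mapping (in the real system this lives in the relational DB).
video_locations = {
    "vid-001": "http://minio-cluster-a.internal:9000/videos/vid-001.mp4",
    "vid-002": "http://minio-cluster-b.internal:9000/videos/vid-002.mp4",
}

def external_url(video_id: str) -> str:
    """The only address ever exposed to clients."""
    return f"/play/{video_id}"

def resolve(video_id: str) -> str:
    """Gateway-side: proxy the unified address to the real storage URL."""
    return video_locations[video_id]

url = external_url("vid-001")
backend = resolve("vid-001")
# Storage can be migrated by updating the mapping; the external URL is stable.
```

This is the decoupling the section describes: a storage migration only touches the mapping table, never the URLs clients hold.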

6) Pull stream to play

The client or browser uses the HTTP protocol to stream the video file and play it.

7) Summary

Choosing the MinIO open source storage framework makes it possible to quickly design and build a video-on-demand platform that supports massive short video upload, storage, and playback.

It provides a technical selection and design reference for emerging short video on-demand platforms and related applications.

The core point of short video architecture: CDN cache

In addition to MinIO storage, short video also places high demands on CDN distribution.

Traditional long videos are prefetched and refreshed, so their files are distributed to CDN nodes in advance.

Short video content, however, is UGC: a video is published and played immediately after upload, so it usually cannot be prefetched and warmed on every CDN node the way long video can. The CDN therefore needs strong real-time distribution capability.

Upload nearby

After the user shoots a video, it needs to be uploaded immediately.

CDN providers generally operate multiple data centers across the country. In terms of basic resource capability, the CDN network must give customers the ability to upload to the nearest node.

How to achieve?

Developers embed an SDK into their app. When the end user uploads a video, the SDK uses HTTP DNS scheduling to pick the nearest or best data center node for the current network, and uploads the file there.
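Nearest-node scheduling can be sketched as picking the candidate data center with the lowest probed latency. The node names and numbers are invented, and a real HTTP DNS scheduler would also weigh load and availability:

```python
def pick_upload_node(latencies_ms: dict) -> str:
    """Choose the data center with the lowest measured round-trip time."""
    return min(latencies_ms, key=latencies_ms.get)

# In practice these numbers come from HTTP DNS scheduling / live probes.
probed = {"dc-north": 48.0, "dc-east": 12.5, "dc-south": 30.1}
best = pick_upload_node(probed)
```

The SDK would then start a resumable multipart upload against the chosen node.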

Practice of billion-level video processing system architecture

The ByteDance Volcano Engine video platform supports the full life cycle management of videos for multiple billion-level applications:

  • Volcano Engine's video-related ToB business,
  • ByteDance's Douyin,
  • Xigua Video, and other products

Full video lifecycle:

  • video production
  • video release
  • video playback, etc.

The overall life cycle of video processing

The overall life cycle of a video can be roughly divided into four stages:

  • On-device production : The creator shoots a video with a phone or other device and can enhance and edit it. Through the upload SDK, the video is uploaded to the cloud.
  • Cloud production : There are two core processes in the cloud: video processing and review, and these two processes are executed in parallel.
  • Cloud delivery : After the above two processes are completed, a video can be shown to everyone, and then enter the stage of cloud delivery.
    At this stage, the on-demand service is responsible for delivering the playback address of the video (including relevant meta information), and then the content of the video is delivered through the CDN.
  • Video playback : In this stage, the playback SDK processes and renders the video on the device.

In the video life cycle, the video processing system is the core link of cloud production.

Let's first take a look at some of the challenges that ByteDance faces in video processing.

  • Large-scale :
    At present, ByteDance processes hundreds of millions of videos every day. Since each video is produced in multiple gears and formats, actual production approaches one billion outputs. This consumes a great deal of computing and storage, and such volume places very high demands on the system's overall stability and performance.
  • Multi-service :
    ByteDance's video services are very diverse, spanning short video, medium video, long video, and services related to on-demand, live streaming, and RTC, across vertical industries such as education and gaming.
  • Complex resources :
    In addition to conventional CPU resources, there are many elastic resources, as well as various types of resources such as CPU/GPU/FPGA, and some other hardware transcoding devices.
  • High-speed business growth and the peak of large-scale events :
    So far, the volume of videos processed has at least doubled every year.
    Each year also brings many large-scale events, which put the system under enormous stress.

Goals of Video Processing Systems

Faced with the above challenges, what goals should the video processing system achieve?

The figure above shows the logical relationships. The ultimate goals of the video processing system can be summed up in three points:

  1. Meet business needs.
  2. Improve user experience, such as image quality and playback fluency.
  3. Cut costs. At ByteDance's scale, the computing, storage, and CDN costs are huge, so reducing them is an important goal.

In order to achieve these goals, it is necessary to do different types of processing on the video, including transcoding, editing, analysis, and some image processing, each of which is a video application.

Each video application breaks down into many processing capabilities. For transcoding, for example, there are new encoders and adaptive transcoding to reduce bitrate, image quality processing, and so on.

All these capabilities are supported at the lowest level by a basic processing system. This system needs to meet the requirements of high availability, high scalability, and efficient operation, maintenance and development efficiency.

To sum up, the entire video processing system is based on the underlying system support, builds various video processing capabilities, and forms a variety of video applications to meet the needs of business scenarios, improve experience, and reduce costs.

Video Processing System Architecture

In order to achieve these goals, the architecture of the video processing system is shown in the figure above, and the outer layer is divided into three planes:

  • User plane : As the name suggests, it is how to call the system from the perspective of the user.
  • Control plane : oriented to developers, operations, and support personnel: how they control the system, and how they manage it and respond when something goes wrong.
  • Data plane : The system generates massive amounts of data every day. On the one hand, these data can be analyzed to guide the optimization of the system. On the other hand, it is also used for metering, billing, monitoring, etc.

The four layers in the middle are:

  • Service layer : mainly deals with authentication, task queue management, upper-layer template management, policy control, etc.
  • Workflow system : mainly for connecting asynchronous and distributed media processing processes in series.
  • Lambda : A highly available functional computing platform. Its biggest role is to manage the underlying massive resources, and to efficiently schedule resources and execute tasks.
  • BMF : a dynamic multimedia processing framework. Its goal is to manage all multimedia processing atomic capabilities as plugins, improving the system's scalability and the efficiency of development, operations, and maintenance.

The following will introduce several core layers in detail.

Service layer and workflow system

System Service Layer Introduction

The service layer has several important components.

  • Service gateway : It can perform traffic scheduling across computer rooms and authentication of some interfaces, including current limiting at the interface layer.
  • Management service : It has two functions: first, it manages all metadata of the video processing system, including task queues, templates, and workflow information; second, it triggers the execution of underlying workflows and manages the lifecycle state of each workflow.
  • Elastic queue : isolates resources between business callers. Its functions include queue resource configuration (task QPS, maximum number of concurrent tasks (MRT)), queue management, and elastic resource management.
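The per-queue limits (task QPS and maximum concurrency) can be sketched with a token bucket driven by an injected clock; the rates below are illustrative:

```python
class TokenBucket:
    """Admits at most `rate` tasks per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, now

    def allow(self, now: float) -> bool:
        # Refill tokens for the elapsed time, then spend one if available.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=2)       # 2 tasks/sec for one queue
burst = [bucket.allow(0.0) for _ in range(3)]  # third request is rejected
later = bucket.allow(1.0)                      # tokens refilled after 1s
```

One bucket per business queue gives the isolation described: a burst from one caller exhausts only its own tokens.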

Introduction to Media Workflow

Below the service layer sits the media workflow engine, which organizes a series of video processing steps as a DAG.

For example, after uploading a video on Xigua Video, it needs to extract its cover, transcode the video without watermark, and transcode in various gears.

These are all processes for processing video, and each process is a fine-grained task.

An effective way is to organize these individual processes to form a workflow.

What problems can workflow solve?

  • First, it encapsulates the calling process of complex businesses . Without a workflow, the business would have to make multiple calls to process a single video.
  • Second, it manages the dependencies between video processing steps . Steps often depend on each other: for image quality enhancement, the original file is enhanced first and then transcoded normally; with segmented transcoding, the video is first split into slices, each slice is transcoded, and the slices are finally stitched together. All of this can be expressed as a workflow.
  • Third, it provides high-availability capabilities such as task timeouts and error retries , reducing the business's usage cost.
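A minimal sketch of such a workflow engine: topological execution of a DAG with per-node retry. The node names and the injected transient failure are contrived:

```python
def run_workflow(nodes, deps, actions, max_retries=2):
    """Execute each node after its dependencies, retrying failed nodes."""
    done, order = set(), []
    while len(done) < len(nodes):
        for node in nodes:
            if node in done or not deps.get(node, set()) <= done:
                continue  # not ready yet: some dependency is unfinished
            for attempt in range(max_retries + 1):
                try:
                    actions[node]()
                    break
                except Exception:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the failure
            done.add(node)
            order.append(node)
    return order

attempts = {"transcode": 0}
def flaky_transcode():
    attempts["transcode"] += 1
    if attempts["transcode"] == 1:          # fail once, succeed on retry
        raise RuntimeError("transient error")

order = run_workflow(
    nodes=["upload", "cover", "transcode"],
    deps={"cover": {"upload"}, "transcode": {"upload"}},
    actions={"upload": lambda: None, "cover": lambda: None,
             "transcode": flaky_transcode},
)
```

The real engine is asynchronous and distributed, but the contract is the same: a node runs only after its dependencies, and transient failures are retried before being reported.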

Let's take a brief look at the internal structure of the workflow.

The workflow mainly includes the following modules:

  • Gate : Handle traffic scheduling, including authentication functions.
  • Engine : Manages the state of all workflows.
  • Scheduler : A workflow contains many nodes, and Scheduler can perform fine-grained task scheduling for each node.
  • VWorker : the glue layer between the upper and lower layers. It converts business-oriented upper-layer templates into the low-level parameters that actually drive the processing tasks.

The green part in the middle of the figure above is the engine of the entire workflow.

The upper layer is the service layer, and the lower layer is the function computing platform to be introduced later.

Task execution

The video processing system is an offline system; each task runs for tens of seconds, minutes, or even longer.

What matters most is that every submitted task is eventually executed and the system ends in a consistent state.

So the system needs an at-least-once guarantee.

The second is the requirement of task idempotency.

Task idempotence has two meanings:

  • First, no matter how many times a task is executed, at any time, the final result is the same, and retries are transparent to the business side.
  • Second, if the same video is submitted for the same processing multiple times within a certain window, a deduplication mechanism ensures it executes only once. This should also be transparent to the caller, and improves system efficiency in some scenarios.
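The deduplication half of idempotency can be sketched with a key derived from the video ID and processing template, replaying the cached result for duplicates; the key scheme and the absence of a time window are simplifications:

```python
import hashlib

class IdempotentSubmitter:
    """Returns the cached result when the same (video, template) re-arrives."""

    def __init__(self, process):
        self._process = process
        self._results = {}
        self.executions = 0

    def submit(self, video_id: str, template: str):
        key = hashlib.sha256(f"{video_id}:{template}".encode()).hexdigest()
        if key not in self._results:          # first time: actually execute
            self.executions += 1
            self._results[key] = self._process(video_id, template)
        return self._results[key]             # duplicates: transparent replay

sub = IdempotentSubmitter(lambda v, t: f"{v}-{t}-output")
r1 = sub.submit("vid-001", "720p")
r2 = sub.submit("vid-001", "720p")            # deduplicated, not re-executed
```

Combined with at-least-once delivery, this is what makes retries safe: a replayed message converges to the same result instead of producing duplicates.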

To guarantee idempotency, we have done extensive work on video meta-information association and video storage; to guarantee at-least-once delivery, we have built timeout detection and retry mechanisms at both the workflow and node levels.

Difficulty in task execution 1: quick response and recovery

Downstream of the video processing system involves computing resources and storage resources.

Once computing or storage resources have problems, it is hard to have a perfect solution that is invisible to upper-level business. What we can do is avoid losses as much as possible and minimize the impact on the business.

There are two more important measures here:

  • Multi-level rate limiting : rate limiting is a common technique, but video processing adds a task screening step that must ensure all important tasks are prioritized within the limited resources.

    For example, assuming that the underlying computing resources suddenly become half of normal, how to reduce the impact on the business?

    First, at the workflow level, workflow tasks that are not latency-sensitive should be delayed; this requires preset policies.

    In addition, within one workflow, different nodes need configured priorities. For example, if a video is transcoded into five gears, perhaps two gears account for most of the consumption: those gears are produced first, and the others are processed later. Overall this requires multi-level rate limiting with configurable limiting policies.

  • Batch re-run : what does this mean? Suppose an engineer released a buggy function yesterday that is only discovered today. We must identify every video affected by this function since its release and quickly re-process them, without affecting currently running business.

    There are two issues involved here.

    The first is how to accurately pick out the entire affected batch of videos between any two points in time.

    The second is fast re-submission that does not affect online business. A separate subsystem is therefore responsible for finding and re-submitting batch tasks.
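The task screening in the multi-level rate limiting described above can be sketched as admitting tasks in priority order until capacity runs out; the gears and capacity below are invented:

```python
def screen_tasks(tasks, capacity):
    """Admit the highest-priority tasks first; delay the rest."""
    ranked = sorted(tasks, key=lambda t: t["priority"])  # 0 = most urgent
    admitted = [t["gear"] for t in ranked[:capacity]]
    delayed = [t["gear"] for t in ranked[capacity:]]
    return admitted, delayed

tasks = [
    {"gear": "1080p", "priority": 2},
    {"gear": "720p",  "priority": 0},   # most-consumed gears go first
    {"gear": "480p",  "priority": 1},
    {"gear": "360p",  "priority": 3},
    {"gear": "240p",  "priority": 4},
]
admitted, delayed = screen_tasks(tasks, capacity=2)  # resources halved
```

In the scenario from the text (capacity suddenly halved), the high-consumption gears are produced immediately while the long tail is deferred rather than dropped.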

Difficulty in task execution 2: system dimension

From the system dimension, we have also done work including redundant backups of middleware and detection of downstream anomalies; when instances are found to be faulty, they are circuit-broken and removed. The system also has a fairly complete traffic switching solution: it has been tested by many large-scale events, with thorough load testing and contingency plans, all of which are very important to the system's high availability.

Function computing platform

The workflow system is described above. Let's introduce its underlying function computing platform.

First introduce the concept of function.

A function corresponds to a node in the media workflow, and it also corresponds to a fine-grained video processing task. More generally, it corresponds to a program that can be executed .

What capabilities does this functional computing platform need to provide?

  • First, and most importantly, it is to provide video processing programs with the ability to scale horizontally on a large scale, so that a video processing program can easily and stably serve online businesses on a large scale.
  • Secondly, this platform needs to manage relatively large resources, which are of various types, and can provide efficient resource scheduling capabilities.
  • The last is the high-availability capability that can handle various abnormal situations and disaster recovery.

The figure above shows the basic architecture of this functional computing platform.

The left part of the figure is a control plane. Developers can develop a function and register it on the function computing platform through the management UI.

The right side of the figure shows the flow of a function call. A call first passes through the platform's gateway, then through cluster-level scheduling, and finally into a single cluster, inside which runs Order, a central scheduling system we developed ourselves.

Order's central scheduler assigns each task to a specific node; that node pulls the function's executable package and then runs the function.
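As a toy illustration of this scheduling step, here is a sketch of an Order-style central scheduler. The `Node` class, the `schedule` function, and the least-loaded placement policy are all assumptions made for illustration; the real system's policies are far richer.

```python
# Hypothetical sketch of a central scheduler: the Server assigns each task
# to the node with the most free slots; the node pulls the function's
# executable package (simulated here by a registry lookup) and runs it.
from typing import Callable, Dict, List

class Node:
    def __init__(self, name: str, slots: int):
        self.name, self.free_slots = name, slots
        self.package_cache: Dict[str, Callable] = {}  # pulled packages

    def run(self, func_name: str, registry: Dict[str, Callable], arg):
        if func_name not in self.package_cache:       # pull package once
            self.package_cache[func_name] = registry[func_name]
        self.free_slots -= 1
        try:
            return self.package_cache[func_name](arg)
        finally:
            self.free_slots += 1                      # release the slot

def schedule(nodes: List[Node], func_name: str,
             registry: Dict[str, Callable], arg):
    node = max(nodes, key=lambda n: n.free_slots)     # least-loaded node
    return node.name, node.run(func_name, registry, arg)
```

Caching the pulled package per node mirrors the idea that a node downloads a function's executable once and reuses it for subsequent tasks.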

High Availability: Multiple Clusters

At the multi-cluster level,

  • First of all, we have implemented multi-cluster disaster recovery and one-click switching of traffic;
  • Secondly, traffic is also adjusted automatically according to preset configurations.
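A minimal sketch of how preset weights can drive the traffic split, with "one-click switching" amounting to marking a cluster unhealthy. The cluster names, weights, and the `route` function are all hypothetical.

```python
# Illustrative weight-based traffic routing across clusters.
def route(weights: dict, healthy: set, key: int) -> str:
    """Deterministically map a request key onto healthy clusters by weight."""
    live = {c: w for c, w in weights.items() if c in healthy and w > 0}
    total = sum(live.values())
    point = key % total                      # position inside the weight span
    for cluster, w in sorted(live.items()):
        if point < w:
            return cluster
        point -= w
    raise RuntimeError("unreachable: point always falls inside total weight")
```

For example, with `weights = {"cluster-a": 70, "cluster-b": 30}`, roughly 70% of keys land on cluster-a; dropping `"cluster-a"` from the `healthy` set instantly diverts all traffic to cluster-b, which is the essence of one-click disaster-recovery switching.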

The above figure is a simple schematic of multiple data centers.

The left and right halves of the figure are each a data center, and each data center contains multiple clusters.

Each data center has a cluster-level scheduler module, and a further module sits across data centers, responsible for synchronizing each data center's resources, including total capacity and current usage.

High Availability: Single Cluster

A single cluster is a central scheduling system: a central scheduler called the Server, and execution units called Clients. The Server is multi-instance and stateless, and can be upgraded smoothly and dynamically.

Between Server and Client there are status detection, fault-node fusing, and task-retry mechanisms.

Under normal circumstances, the Server judges whether a node is alive through heartbeats. Beyond that, it also observes the node's overall state.

For example, if a node's tasks time out frequently or fail at a high rate, the fusing strategy is applied to that node as well.
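A small sketch of this two-level health check: heartbeat liveness plus a failure-rate fuse over a sliding window of recent task outcomes. The thresholds, window size, and class name are invented for illustration only.

```python
from collections import deque
import time

class NodeHealth:
    """Illustrative per-node health tracker: a node is fused either when
    its heartbeat goes stale or when its recent failure rate is too high."""

    def __init__(self, window=100, max_fail_rate=0.5, heartbeat_ttl=15.0):
        self.results = deque(maxlen=window)   # recent True/False outcomes
        self.last_beat = time.monotonic()
        self.max_fail_rate = max_fail_rate
        self.heartbeat_ttl = heartbeat_ttl

    def heartbeat(self):
        self.last_beat = time.monotonic()

    def record(self, ok: bool):
        self.results.append(ok)

    def fused(self) -> bool:
        if time.monotonic() - self.last_beat > self.heartbeat_ttl:
            return True                       # heartbeat stale: node dead
        if len(self.results) >= 10:           # require a minimum sample
            fails = self.results.count(False)
            if fails / len(self.results) > self.max_fail_rate:
                return True                   # failure rate too high: fuse
        return False
```

A fused node stops receiving new tasks, and its in-flight tasks are retried elsewhere, matching the retry mechanism described above.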

Control Plane - Service Governance

As mentioned above, the function computing platform has several layers: the gateway layer at the top, cluster-level scheduling below it, and scheduling inside each machine.

Each layer is a multi-instance service, so every upstream performs anomaly detection on its downstream and removes faulty instances; in other words, every component handles single-point failures. On top of that, there are fusing strategies for middleware.

Dynamic Multimedia Framework BMF

The full name of BMF is ByteDance Media Framework. It is a multimedia processing framework developed by ByteDance.

We developed a video processing framework ourselves because we found that traditional video processing frameworks have several limitations.

  • First, traditional frameworks are generally developed and extended in C/C++, so the barrier to extension is high, and every extension requires recompilation, which is very troublesome in a large system.

When such a framework is developed and maintained by many people, its dependencies keep growing, which greatly reduces development and operations efficiency.

  • Second, traditional video processing such as transcoding follows a relatively fixed pipeline that general-purpose frameworks can support, but for more complex scenarios such as video editing or AI analysis, traditional frameworks lack flexibility.
  • Third, traditional frameworks themselves have performance bottlenecks. Taking FFmpeg as an example, its filter graph executes in a single thread; if a GPU filter is placed in the graph, execution efficiency drops sharply and GPU utilization stays low.

In order to solve the above problems, we developed the BMF multimedia processing framework. Its goals include:

  • Reduce the cost of video application development and standardize it.
  • Support a variety of complex application scenarios with a single framework, which therefore has to be highly flexible.
  • Modularize all atomic video-processing capabilities and manage and reuse them dynamically, solving the problem of large-scale collaborative development while letting these capabilities be reused across different scenarios and businesses.
  • Mask underlying hardware differences. Businesses increasingly use heterogeneous hardware such as GPUs, and we want the framework to support such hardware natively.

The figure above is the overall architecture of the BMF framework.

The top layer is the application layer; each block in it is a video application, such as the aforementioned video transcoding, video editing, or image processing. Below it is the module layer, where each module is a fine-grained atomic capability of video processing, such as encoding/decoding video or performing ROI detection.

The application layer and the module layer are connected in series through the framework layer in the middle.

At the center of the framework layer is a core engine. The engine provides a fairly general, concise set of streaming interfaces that let developers easily build video processing applications; the interface supports multiple languages, including C++, Python, and Go. It also provides a complete SDK for module development, likewise supporting C++, Python, and Go.
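To give a feel for such a streaming interface, here is a tiny graph builder in the same spirit. This is explicitly not the real BMF API; the `Stream`, `graph`, and registry names are invented, and "frames" are stand-in strings rather than actual media data.

```python
# Illustrative streaming-interface sketch: chain named modules (atomic
# capabilities looked up from a registry) into a processing pipeline.
class Stream:
    def __init__(self, frames, registry):
        self.frames, self.registry = frames, registry

    def module(self, name: str) -> "Stream":
        fn = self.registry[name]               # look up an atomic capability
        return Stream([fn(f) for f in self.frames], self.registry)

    def run(self):
        return self.frames

def graph(frames, registry) -> Stream:
    return Stream(frames, registry)
```

The point of the shape is that the application developer only chains module names; what language each module is written in, or how it is implemented, is hidden behind the registry lookup.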

Around the core engine we have built related services and tool sets; the services are mainly used to manage module versions and dependencies.

One of the biggest benefits of this architecture is that it gives developers a much clearer division of labor.

Developers of different modules need only focus on their own modules and can choose the language they are familiar with.

Once a module is developed, it can be registered into the system and used by upper-layer application development to support the business. The business does not need to know how the underlying module is implemented or what language it is written in; as long as the framework's interface is used, the module can be connected and integrated seamlessly.

The figure above further describes the dynamic development model of BMF.

For example, in practice, algorithm developers write a video processing algorithm.

The algorithm is first handed to the algorithm optimization engineers; after optimization, it becomes a model.

The optimization engineers then register the model into the system, and module developers package it into a specific module, which is also registered. That module is a concrete atomic capability.

Finally, a function developer, i.e., a business developer, chains modules together into a concrete video processing application, builds it into a function, registers it on the function management platform, and rolls it out via gray release.

You can see that the division of labor of each team in the whole process is very clear, and the efficiency of independent development and collaboration is very high.

Moreover, the atomic capabilities of all modules in this process can be reused. The process does not involve any compilation dependencies, and everything is done dynamically.

Billion-level video processing macro flow

The figure above shows the complete flow of video transcoding as an example. When a user uploads a video, it first lands in server-side storage, which triggers a transcoding flow, i.e., a workflow task is submitted. The task first passes through the transcoding service and is placed into the elastic queue. Tasks are then dequeued from the elastic queue and enter the workflow engine for execution. The workflow engine disassembles the workflow into fine-grained tasks and sends them to the function computing platform for execution; each function is built using the BMF dynamic development approach introduced earlier. When all fine-grained node tasks have finished and the whole workflow is complete, the transcoding or video processing is done, and the result is returned step by step.
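The disassembly step above can be sketched as running a small dependency DAG, where each node task is handed to the function platform. The task names and the `run_workflow` helper are hypothetical, and the sequential loop stands in for what would really be parallel dispatch.

```python
# Sketch: disassemble a workflow into fine-grained node tasks and run each
# once all of its prerequisites have finished.
def run_workflow(dag: dict, execute) -> dict:
    """dag maps task -> list of prerequisite tasks; execute runs one task
    (in the real system, by dispatching it to the function platform)."""
    done, results = set(), {}
    while len(done) < len(dag):
        ready = [t for t, deps in dag.items()
                 if t not in done and all(d in done for d in deps)]
        if not ready:
            raise ValueError("workflow has a cycle")
        for task in ready:                    # these could run in parallel
            results[task] = execute(task)
            done.add(task)
    return results
```

A transcoding workflow might then look like `{"probe": [], "transcode_480p": ["probe"], "transcode_720p": ["probe"], "upload": ["transcode_480p", "transcode_720p"]}`: probe first, the two renditions in parallel, and upload once both finish.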

Finally, review some key points of this article:

First, the most important requirements of a video processing system are: high availability, i.e., system stability; high scalability, since when many business scenarios must be supported, the system's scalability strongly affects its overall availability; and efficiency of development and operations.

The overall architecture can be summed up as three core parts: the media workflow, the function computing platform, and the dynamic multimedia framework BMF. For high availability, the service layer provides task idempotence, multi-level rate limiting, and batch retransmission; the platform layer provides multi-data-center and multi-cluster traffic switching, redundancy within a single cluster, and upstream/downstream anomaly detection. Finally, although the underlying dynamic multimedia framework does not directly improve availability, it improves the system's scalability and development/operations efficiency, so it also plays a very important role.

In the future, the system will develop in a more intelligent direction. We hope to build an execution platform for distributed scheduling. Users only need to pay attention to the processing flow. The splitting of the process, resource scheduling, and how to execute are all determined by the platform.

References:

https://blog.csdn.net/weixin_37604985/article/details/132179317

https://zhuanlan.zhihu.com/p/381259391

https://blog.csdn.net/csdnnews/article/details/117915142

So, the above is the "textbook" answer.

With the ByteDance solution in hand, let's return to the earlier interview questions:

  • Short video system, how to do system architecture?
  • Short video APP, how to do system architecture?

The above solution is a complete, "textbook" answer.

In the follow-up, Nien will give you more and more exciting answers based on industry cases.

Of course, if you encounter such problems, you can ask Nien for help.

Video preview: Chapter 33, 10Wqps basic user platform architecture and practice

To help everyone land high-end offers and architect offers, the following will be released soon:

  • "Chapter 31 Video: 1000Wqps ID Component Architecture and Practice".
  • "Chapter 33 Video: 10Wqps Basic User Platform Architecture and Practice".

Supporting resume templates are also provided to help you rebuild and sharpen your resume's highlights, and ultimately help you join a major tech company, do architecture work, and earn a high salary.

recommended reading

" Burst, relying on "bragging" to live in JD.com, with a monthly salary of 40K "

" Too fierce, relying on "bragging" to live on SF Express, with a monthly salary of 30K "

" It exploded... Jingdong asked for 40 questions, and 50W+ after passing "

" Questions are numb...Ali asked 27 questions at the same time, and 60W+ after passing "

" Baidu madly asked for 3 hours, Dachang got an offer, the guy is so ruthless!" "

" Are you too ruthless: face an advanced Java, how hard and ruthless it is "

" One hour of byte madness, the guy got the offer, it's too ruthless!" "

" Accept a Didi Offer: From the three experiences of the guy, what do you need to learn? "

"Nin's Architecture Notes", "Nin's High Concurrency Trilogy", "Nin's Java Interview Collection" PDF, please go to the following official account [Technical Freedom Circle] to take it↓↓↓

Origin blog.csdn.net/crazymakercircle/article/details/132379306