Cloud Integration Drives Experience Upgrades and Business Innovation

With the development of audio, video, and AI technology, delivering an ever-better user experience and richer interactive features, on top of meeting users' basic needs, has become key to each platform's core competitiveness. LiveVideoStackCon 2022 Beijing invited Zhang Peilei, head of South China business for Volcano Engine Video Cloud, to explain how the integrated audio and video cloud solution supports user experience upgrades and business innovation, drawing on the group's accumulated audio and video business practice.

By Zhang Peilei

Edited by LiveVideoStack

Hello everyone, I am Zhang Peilei, head of South China business for the commercialization of Volcano Engine Video Cloud.

Last year, Volcano Engine Video Cloud released an integrated audio and video cloud solution that helps audio and video companies build one-stop capabilities matching those used by Douyin.

In traditional playback tuning, when problems such as stalls and black screens occur, line adjustments are usually coarse-grained. For example, adjustments can only target a cloud CDN line, a specific region, or an operator, and precision is hard to control. Through long-term experience within the group, we have built a complete cloud-client coordination system. It consists of a client SDK with event-tracking instrumentation that can tag user behavior, a strategy scheduling center, and a training engine. Changes are finally rolled out through the A/B experiment channel.

-01-

Industry Trends and Challenges

Next, we will further elaborate on the practice and specific advantages of cloud integration in commercial scenarios.

First, why pursue cloud integration?


With the development of mobile Internet transmission, the audio and video industry has progressed from simply achieving smooth playback in the early days to pursuing the ultimate experience, which shows in four areas:

  • Smooth viewing. Short video and live streaming increasingly demand fast first-frame loading and smooth switching within feeds. As we shared in earlier talks on short video practice, users do not perceive a first-frame delay under 100 ms, an inflection point appears at 210 ms, and beyond 300 ms the rate at which users leave rises rapidly.

  • Clarity. Douyin tried super-resolution on a large scale during the World Cup and received good market feedback. Once you have experienced super-resolution, ordinary 1080P looks dull.

  • Real-time interaction. In Douyin's daily live events and interactive live broadcasts, we use low-latency live streaming and RTC on a large scale. Low-latency live streaming keeps viewing timely, and the interactivity of RTC keeps live entertainment fun.

  • Immersion, in both sound and vision. Companies across the industry are launching their own VR/AR solutions, and we are continuously optimizing and upgrading PICO. In RTC scenarios, such as sensing teammates' positions in a game, we use spatial audio for positioning.


Beyond industry trends, each business also goes through different stages of development.

For a business in the start-up period, staffing and paid user acquisition are the key concerns. At this stage developers usually have to patch together multiple modules, such as RTC, beauty filters, and editing. Combining the SDKs when several modules must be integrated at once is very complicated, and the integration cycle is measured in months.

Once the business has stabilized, the main goal becomes optimizing the experience. Douyin itself faced experience challenges such as instant start and first-frame optimization. How to balance experience against business growth is also an important question.

In the mature period, the goal shifts toward reducing costs and increasing efficiency: lowering internal operating costs and making daily operations and maintenance more efficient, while exploring new business directions.

-02-

Cloud-Integrated Business Practice

Given these industry and business trends, how can you build your own business with the help of cloud integration?

f39431b21674bac152d7c34cff691355.png

The figure shows the stages of the entire cloud integration:

For a business in the start-up period, that is, the stage of building from 0 to 1, the solution provides a one-stop, full-link offering that covers content production, services, and consumption.

  • On the production side, we provide short-video shooting and editing (the same template-based editing found on Douyin), real-time audio and video capture and stream pushing for live broadcast, secondary rendering effects for video, and audio feature capabilities. The captured audio and video are then transmitted to the cloud.

  • On the server side, cloud service capabilities fall into three scenarios:

    • In the RTC scenario, stream mixing and recording are built into the room, which greatly reduces the user's cost of use. For some online education customers, the RTC room's stream is pushed to live broadcast, and we provide server-side relay for that scenario. On the link side, we have optimized for weak networks, including a UDP-based packet-loss resilience strategy.

    • In the live streaming scenario, time-shifting, transcoding, recording, and distribution can all be handled in the cloud.

    • In the on-demand scenario, we provide media asset management, processing, distribution, and review.

  • On the consumption side, embedding the SDK brings the player capabilities of the Douyin Group ecosystem: instant start, zero first-frame delay, super-resolution, and H.265.


Beyond the capabilities of individual modules, we provide a rich set of demos built around consumer-facing scenarios, such as interactive live streaming, e-commerce live streaming, and remote conferencing. The aim is quick integration: developers can build their business directly on the demo source code. The APIs in the source code cover almost all common client capabilities. The most common is the link between RTC and beauty filters, where the beauty API is called directly for video capture and secondary rendering, which greatly improves the efficiency of linking capture, rendering, and transmission. Each demo integrates multi-platform capabilities, and the package size can be trimmed as needed, helping businesses launch and validate quickly.


Once the business has launched, optimizing the experience becomes the main goal, and that requires a QoS/QoE system. Our cloud-plus-client system includes a complete data system, so business developers do not have to build data reporting, collection, and management themselves. A/B experiments can be run directly on top of comprehensive QoS metrics. For example, when optimizing an on-demand scenario, the upload protocol is split into A/B sample groups: group A uses QUIC and group B uses TCP. First-frame time and playback duration are observed on the client in real time, and the strategy is adjusted quickly based on the A/B results.
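The QUIC versus TCP comparison above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual experiment framework: the function names, the hash-based bucketing, and the sample measurements are all assumptions made for the example.

```python
import hashlib
from statistics import mean

# Hypothetical mapping for illustration: group A uses QUIC, group B uses TCP.
PROTOCOL_BY_GROUP = {"A": "QUIC", "B": "TCP"}


def assign_group(user_id: str, experiment: str = "protocol_quic_vs_tcp") -> str:
    """Deterministically place a user in group A or B by hashing the ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def mean_first_frame_ms(samples):
    """samples: iterable of (group, first_frame_ms); returns the mean per group."""
    by_group = {}
    for group, first_frame_ms in samples:
        by_group.setdefault(group, []).append(first_frame_ms)
    return {group: mean(values) for group, values in by_group.items()}


def faster_group(summary):
    """Return the group whose mean first-frame time is lowest."""
    return min(summary, key=summary.get)


# Made-up measurements, only to show the shape of the comparison.
samples = [("A", 180), ("A", 220), ("B", 260), ("B", 300)]
summary = mean_first_frame_ms(samples)
print(summary, "->", PROTOCOL_BY_GROUP[faster_group(summary)])
```

Hashing the user ID keeps each user in the same group across sessions. A real experiment would also check sample sizes and statistical significance before changing a strategy.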

Similarly, for live streaming resolution, we look at which resolutions viewers actually concentrate on and adjust the transcoding templates accordingly, for example deciding which resolution suits which device. The many QoS and QoE validations done here are distilled into our cloud service strategies. The end result is that Volcano Engine Video Cloud's playback, transcoding, and storage settlement strategies fit real business scenarios well.
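One way to picture the resolution-driven template adjustment is the sketch below. It assumes a hypothetical rule (keep only resolutions watched in at least 10% of sessions) and hypothetical function names; the talk does not describe the actual rule used.

```python
from collections import Counter


def resolution_share(sessions):
    """sessions: list of resolution labels actually watched, e.g. '720p'."""
    counts = Counter(sessions)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def pick_ladder(sessions, min_share=0.10):
    """Keep resolutions watched in at least `min_share` of sessions, lowest first."""
    shares = resolution_share(sessions)
    kept = [res for res, share in shares.items() if share >= min_share]
    return sorted(kept, key=lambda res: int(res.rstrip("p")))


# Made-up viewing data: 60% at 720p, 35% at 1080p, 5% at 360p.
watched = ["720p"] * 60 + ["1080p"] * 35 + ["360p"] * 5
print(pick_ladder(watched))
```

With this made-up data the 360p rung falls below the threshold, so the transcoding template would drop it and spend no resources producing it.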


After long-term, large-scale A/B testing, we have accumulated rich data on video experience:

  • Live latency vs. playback duration: for every 4 s increase in live streaming latency, end users' playback duration drops by 1 percentage point;

  • Image quality vs. playback duration: after enabling 720P super-resolution, playback duration increases by 2 percentage points;

  • Encoding optimization vs. playback duration: after using the self-developed BVC algorithm across the entire capture, transcoding, and decoding link, playback duration increased by 5 percentage points;

  • Stall rate vs. playback duration: reducing the stall rate brings positive growth in playback duration;

  • First frame vs. playback duration: once the first-frame time of the short video feed exceeds 210 ms, user retention drops rapidly;

  • Cost vs. benefit: video rendering increases costs but also brings business growth.
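As a small worked example of the first figure, the helper below turns "1 percentage point per 4 s of latency" into an estimate for other latency changes. The linear extrapolation and the function name are assumptions for illustration; the data above only states the 4 s figure.

```python
def playback_change_pp(latency_change_s: float, pp_per_4s: float = 1.0) -> float:
    """Estimated change in playback duration, in percentage points.

    Assumes the reported relationship (1 point lost per 4 s of added latency)
    scales linearly, which the source does not confirm.
    """
    return -(latency_change_s / 4.0) * pp_per_4s


# Adding 8 s of latency versus removing 4 s of latency.
print(playback_change_pp(8.0), playback_change_pp(-4.0))
```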


When the business reaches a mature stage, cost reduction and efficiency become the theme. We, like many external customers, often hit the same pain point at this stage: when end users keep reporting stalls and black screens, it is hard to help troubleshoot from the cloud side alone. At most the cloud can provide the client IP and operator information, which is not enough to pinpoint the playback stage where the problem occurs.

After adopting the cloud integration solution, this problem is largely solved. The quality platform is connected to the client and the alerting system, and metrics from the production side, the server side, and the consumption side are tracked at fine granularity. At the granularity of a user's request session, the stage where the problem occurs can be traced and targeted strategy adjustments made. This also eases the tension between quality optimization and operations investment and makes problem localization more efficient.
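Session-level tracing across the three sides can be sketched as below. The record layout, the stage names, the "error code 0 means healthy" convention, and the latency budget are all hypothetical choices for the example, not the quality platform's real schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StageMetric:
    session_id: str
    stage: str          # "production", "server", or "consumer"
    error_code: int     # 0 means no error (assumed convention)
    latency_ms: float


# Stages are checked in the order media flows through them.
STAGE_ORDER = ["production", "server", "consumer"]


def first_failing_stage(
    metrics: Iterable[StageMetric],
    session_id: str,
    latency_budget_ms: float = 500.0,
) -> Optional[str]:
    """Return the earliest stage of a session that errored or ran over budget."""
    by_stage = {m.stage: m for m in metrics if m.session_id == session_id}
    for stage in STAGE_ORDER:
        metric = by_stage.get(stage)
        if metric is None:
            continue
        if metric.error_code != 0 or metric.latency_ms > latency_budget_ms:
            return stage
    return None
```

Because every record carries the session ID, a single complaint can be followed from capture to playback instead of being narrowed down only by client IP or operator.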

Here are a few specific cases:


The first is intelligent attribution during troubleshooting. By comparing trends in client playback failures, attribution analysis determines whether a problem affects a single user or a cluster of users. It is then combined with line error codes from the cloud to predict trends further and to break errors down by operator, region, and so on. Compared with troubleshooting without cloud-client coordination, the time required drops sharply, and problems can be located within minutes.
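The single-user versus clustered decision can be illustrated with the simplified sketch below. The thresholds, field names, and return labels are assumptions for the example; the platform's real attribution logic is not described in the talk.

```python
from collections import Counter


def attribute_failures(failures, cluster_share=0.5, min_users=3):
    """Classify playback failures as single-user, clustered, or scattered.

    failures: list of dicts with 'user', 'operator', and 'region' keys.
    Returns a (label, detail) tuple.
    """
    if not failures:
        return ("none", None)
    users = {f["user"] for f in failures}
    if len(users) == 1:
        return ("single_user", next(iter(users)))
    for dimension in ("operator", "region"):
        value, count = Counter(f[dimension] for f in failures).most_common(1)[0]
        affected = {f["user"] for f in failures if f[dimension] == value}
        # A cluster needs both a large share of failures and several distinct users.
        if count / len(failures) >= cluster_share and len(affected) >= min_users:
            return ("clustered", (dimension, value))
    return ("scattered", None)
```

For example, if four of five failing users share one operator, the result is `("clustered", ("operator", ...))`, which points the investigation at that operator's line rather than at any individual device.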


The second is single-point tracing. With problem localization at the single-user and session level, the problematic link can be traced, and each stage displays detailed error information. Playback details are broken down further: for example, first-screen duration is split into data loading, preprocessing, and player preparation time, alongside the bitrate served across the playback link and the client's download speed. Every event during client playback is recorded, along with its duration.

As a result, end-user complaints are resolved 50% more efficiently.
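The first-screen breakdown described above can be sketched as follows. The event names and the tuple layout are hypothetical; they stand in for whatever the client SDK actually records.

```python
# Hypothetical phase names for the three parts of first-screen time.
FIRST_SCREEN_PHASES = ("load_data", "preprocess", "player_prepare")


def first_screen_breakdown(events):
    """Split first-screen time into phases and find the slowest one.

    events: list of (name, start_ms, end_ms) tuples recorded during playback.
    Events outside the first-screen phases are ignored.
    """
    durations = {
        name: end_ms - start_ms
        for name, start_ms, end_ms in events
        if name in FIRST_SCREEN_PHASES
    }
    slowest = max(durations, key=durations.get) if durations else None
    return {
        "durations_ms": durations,
        "total_ms": sum(durations.values()),
        "slowest": slowest,
    }


# Made-up timings for one session.
events = [
    ("load_data", 0, 120),
    ("preprocess", 120, 150),
    ("player_prepare", 150, 230),
    ("render_loop", 230, 5000),
]
print(first_screen_breakdown(events))
```

Knowing which phase dominates tells the team whether to look at the network, the preprocessing step, or player initialization.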


Once efficiency has improved, the next topic is business innovation. We have integrated a variety of interactive features, including interactive special effects and AI algorithms, together with rich materials and tools, to improve the efficiency and success rate of content creators. The most direct example is Douyin's own business.

It mainly consists of several modules:

Video creation: the well-known Douyin features for shooting and editing from the same template, a variety of special effects, and smart subtitles that automatically generate subtitles from the video's speech and translate them into multiple languages. For BGM licensing, we have purchased a wealth of copyrighted material, which makes it easy for creators to add background music.

Algorithm module: achieving a given rendering effect depends on accumulated algorithms, which perform keypoint recognition on faces, gestures, and body parts, and even recognize emotions and features, to make videos more fun. There are also rendering modules, including avatars, AI games, and so on.


Finally, these capabilities are applied to different innovative scenarios: some customers' trials in medical aesthetics and micro-plastic surgery; AR makeup try-on based on facial keypoints in e-commerce; course-related animation effects in online education that add fun and interactivity; and video beautification in live video, which increases viewing time in 1v1 scenarios.

-03-

Integrated Audio and Video Cloud Solution veVOS

Finally, building on the best practices from the stages above, here is a summary of veVOS, the integrated audio and video cloud solution that matches what Douyin uses.


The overall framework is built on underlying cloud services: the audio and video call transmission network, video post-processing, and weak-network optimization for RTC; transcoding, distribution, and recording for live streaming; and media processing and distribution for on-demand.

The client side ships a rich set of SDKs, including RTC audio and video capture, the player, video production, special effects, and the live streaming SDK. The whole link is covered by quality platform monitoring to safeguard QoS and QoE, and underneath, the policy platform adjusts the strategy for each scenario.

Application scenarios range from online audio and video to social entertainment, media information and online education.


Comparing commercialization results from last year to now, the cloud integration solution has performed very well. The main advantages are:

  • One-stop solution that is easy to adopt. In traditional RTC scenarios, customers take 4 to 7 weeks to go live. The one-stop solution shortens the time from onboarding to launch to 2 weeks, and SDK integration can be completed in 2 days.

  • Continuous QoS and QoE optimization through thorough quality and experience monitoring. After customers integrate the solution, the instant-start rate and user playback duration improve significantly.

  • Rich innovative features. With cloud plus client and multiple SDKs combined, we incubated features such as short dramas and watching Douyin together, built cute-face social scenarios with beauty stickers, and created many new value-added services for customers.

  • Polished by Douyin, a product with billion-level DAU. Many otherwise unknown problems are continuously discovered, and device model compatibility is maximized, leading the industry.


Finally, I hope more business partners will try the cloud integration solution, and that the technological innovation within Douyin Group can help upgrade user experience and drive business innovation.

That concludes this talk. Thank you!



Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/130776012