Diverse Convergence: A Comprehensive Solution for Streaming Media Transmission Network

We are looking for a comprehensive solution to the "network".

The dividends of audio and video digitalization in the consumer market appear to have peaked, while industrial-grade video applications are unlocking business models in ever more scenarios. Meanwhile, audio and video customers have shifted from a single service requirement to running multiple services in parallel.

Can the existing network meet emerging business formats? Is there an optimal trade-off between latency and cost? How can business upgrades and switching avoid being time-consuming and labor-intensive? How can network stability be guaranteed at controllable cost?

Can a multi-converged streaming media transmission network solve all problems?

How will the future-oriented streaming media transmission network unveil its mystery?

This article was planned and conducted as an interview by IMMENSE and LiveVideoStack with Huang Haiyu, head of communication services at Alibaba Cloud Video Cloud.

 

The new network infrastructure is looming

Is cost reduction still the network's biggest pain point? Is "metadata" the new protagonist?

With network infrastructure upgrades, iterations of audio/video transmission technology, and the open-sourcing of WebRTC, audio and video services are booming on the consumer Internet and gradually penetrating the industrial Internet.

However, as the industry's dividend period receded, problems long hidden in the audio and video business gradually surfaced.

On the one hand, "cost reduction" remains a hotly discussed topic. In audio and video applications, network transmission accounts for a high share of IT costs; in a typical live-streaming application, for example, it exceeds 70% of the total cost. Against the backdrop of cutting costs and improving efficiency, reducing network transmission cost is therefore a shared task for industry customers and cloud vendors.

On the other hand, "latency" unlocks more value and headroom. From real-time interaction on the consumer side to real-time remote operation on the industrial side, latency requirements for video streams keep rising. Cloud rendering, cloud gaming, and digital-human scenarios involve complex encoding, decoding, and transmission links, and the biggest bottleneck lies in the latency of the transmission network; yet the network's composition and influencing factors are highly complex, making latency improvement a great challenge.

At the same time, the growth of new trends brings more challenges.

Not long ago at WWDC 2023, Apple unveiled Vision Pro, its first spatial computing device, bringing the fading metaverse back into the public eye.

(Image source: Internet)

Imagination about the future is no longer limited to video rendered on a headset; it also involves interaction and synthesis in the cloud. However, true prosperity of the metaverse requires not only performance upgrades of MR hardware terminals but also the iterative evolution of the streaming media transmission network.

We found that mass-market video today is still mostly produced by traditional shooting. It is predictable that the share of video produced by rendering and synthesis will rise significantly in the future, a trend that will inevitably bring massive computing and transmission demands and pose a great test for computing and transmission costs.

At the same time, this means the network will need to carry an even more immeasurable volume of data: not only conventional audio and video, but also data of more dimensions, such as control signaling for remote-operation scenarios and cloud games, and the "metadata" that drives the generation of rendered video and can express more complex stereoscopic scene information.

Seen this way, a powerful network that carries diverse content and provides high-performance cloud-side computing capabilities is required; as new infrastructure, it can support future video formats.

Is "Unified" the key to it all?

Lower cost, lower latency, combined computing power, and transmission of more-dimensional content are undoubtedly the key trends for the transmission network, but what move can address them all? Perhaps "Uni".

Uni comes from Unified, which means " unified ".

On the network, we are exploring how to implement better "Uni" technology, deliver real "Uni" capabilities, and create the business value that "Uni" brings.

Based on a wide range of heterogeneous nodes, Alibaba Cloud Video Cloud has built MediaUni, a fully distributed, ultra-low-latency, multi-service, multi-converged streaming media transmission network.

It is built on GRTN, our global real-time transmission network, deepening the network's design with the idea of "grand unification" and delivering a new upgrade of the network foundation.

MediaUni opens up underlying resources and unifies the technical architecture, using a single streaming media transmission network to carry content of many forms for audio/video applications and to meet the needs of multiple converged services at lower cost and lower latency.

 

Latency can be "free"

Can services with any latency requirement run on one network?

Thanks to continuous breakthroughs in foundational capabilities and key technologies, audio and video services have deepened from traditional VOD and live streaming to real-time audio and video, powering digital upgrades across industries.

Among these challenges, "latency" bears the brunt, becoming one of the hardest problems to overcome.

With one network, MediaUni can support services across the full latency spectrum:

From ordinary live streaming (HLS/FLV), to ultra-low-latency RTS live streaming based on WebRTC (about 1 s latency), to real-time audio/video communication (such as interactive live streaming and remote proctoring), and even services with extremely demanding latency requirements such as cloud rendering and real-time remote control. All of these can truly run on one network.

How does latency "jump the gun"?

Fundamentally, network latency stems from two sources: physical propagation delay and the unreliability of IP networks.

To combat physical delay, MediaUni relies on 3200+ edge nodes deployed worldwide to serve users nearby, shortening the "last mile" and the data transmission path so that changes in transmission-network quality can be perceived faster.

By deploying rendering services on nodes close to users, Alibaba Cloud Video Cloud supported Taobao's Double 11 live broadcast of the fully realistic virtual interactive space "Future City", a virtual live stream with over 10,000 channels concurrently online, and its real-time cloud rendering technology created the first metaverse temple fair, delivering an ultimate ultra-low-latency experience.

Taobao's 3D virtual e-commerce space "Future City"

To combat the unreliability of IP networks, MediaUni has designed a real-time perception system that achieves second-level awareness of node load, link network conditions, and business-critical information, and intelligently adjusts scheduling and routing strategies based on the perceived data, allocating physical resources better and selecting physical links with higher quality of service.

At the same time, through continuous iteration of QoS technology, with ongoing optimization of congestion control, FEC, multi-path transmission, and more, it combats packet loss, delay, and reordering in the network to achieve lower network latency.

The scientific community currently puts the limit of human reaction speed at about 100 milliseconds, with the average person reacting in 0.2 to 0.3 seconds. In a 100-meter race, reacting within 0.1 s of the gun counts as a false start, "jumping the gun". The cloud rendering scenario supported by MediaUni has pushed end-to-end interaction latency below 60 ms, which could be called "jumping the gun" on audio/video latency.

Latency vs. cost: can the network handle both?

As is well known, once a network is optimized past a certain point, latency and transmission cost become a pair of contradictions.

For example, within the allowed bandwidth, the protocol stack can combat packet loss by retransmitting or adding FEC at any cost, effectively reducing transmission delay, but at the price of higher transmission cost.
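To make this trade-off concrete, here is a minimal Python sketch (illustrative only, not MediaUni's implementation) that, assuming an ideal erasure code and independent random packet loss, shows how added FEC parity raises bandwidth overhead while lowering the probability that a block cannot be recovered and must wait for retransmission:

```python
from math import comb

def residual_loss(p, k, m):
    """Probability that a block of k data packets cannot be recovered:
    more than m of the k+m packets are lost (ideal erasure-code assumption)."""
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1, n + 1))

def overhead(k, m):
    """Extra bandwidth spent on parity, relative to the payload."""
    return m / k

# At 5% random loss, trading bandwidth for recovery without retransmission:
for m in (0, 1, 2, 4):
    print(f"parity={m}: overhead={overhead(20, m):.0%}, "
          f"unrecoverable blocks={residual_loss(0.05, 20, m):.4f}")
```

At 5% loss, a 20% overhead (4 parity packets per 20 data packets) already makes unrecoverable blocks rare: exactly the bandwidth-for-latency trade described above.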

As the industry pursues "fast and faster", is there a way to get the best of both worlds between low latency and low cost?

In this regard, MediaUni's essence is to quantify each latency-reducing measure against the transmission cost it adds, and then provide the combination with the highest ROI for each business scenario, maximizing the transmission value of every bit.

➤ For ordinary entertainment live streaming, where interaction happens through bullet comments, FLV live streaming with about 5 s of latency is sufficient;

➤ For the live broadcast of major events such as the World Cup, low-latency live streaming with about 1 s of delay can be chosen;

➤ For e-commerce live streaming, A/B tests found that interactive live streaming with sub-second latency can improve GMV to a certain extent.
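As a toy illustration of this highest-ROI selection (the tier names, latency figures, and relative costs below are invented for the example, not actual MediaUni products or prices), one can pick the cheapest delivery tier that still meets a scenario's latency budget:

```python
# Hypothetical tiers: (name, typical latency in seconds, relative cost per GB).
TIERS = [
    ("FLV/HLS live", 5.0, 1.0),
    ("RTS low-latency live", 1.0, 1.6),
    ("Interactive RTC", 0.4, 2.5),
]

def pick_tier(max_latency_s):
    """Return the cheapest tier meeting the latency budget, or None."""
    candidates = [t for t in TIERS if t[1] <= max_latency_s]
    return min(candidates, key=lambda t: t[2]) if candidates else None

print(pick_tier(5.0)[0])   # entertainment live with bullet comments
print(pick_tier(1.0)[0])   # sports events
print(pick_tier(0.5)[0])   # e-commerce interactive live
```

The point of the sketch is the shape of the decision, not the numbers: each scenario buys only as much latency reduction as its business value justifies.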

It follows that the ability to run refined network operations for different business scenarios and freely choose a service latency at controllable cost is the real "latency freedom".

 

Diverse convergence, dividends released

Is business multiplexing the greatest form of technology inclusiveness?

Relying on strong underlying infrastructure resources and long-accumulated audio/video technology capabilities, cloud vendors hold a scale advantage in network services over other players on this track.

In addition, by supporting multiple services with one network, "service multiplexing" itself continuously releases technology dividends.

This "dividend" shows in three ways:

First, mixed operation of services improves resource reuse.

Different services peak at different times, so the reuse of computing and network resources becomes higher. For example, conferencing and remote-monitoring services mostly run during daytime working hours, complementing "night economy" businesses such as Internet entertainment and forming good off-peak operation.
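A back-of-the-envelope Python sketch shows why off-peak multiplexing pays (the hourly demand curves are invented for illustration): one shared network provisioned for the combined peak needs less capacity than two networks each provisioned for its own peak.

```python
# Hourly bandwidth demand (arbitrary units) for a daytime service
# and a nighttime service over a 24-hour day.
conferencing = [8 if 9 <= h < 18 else 1 for h in range(24)]        # daytime peak
entertainment = [9 if h >= 20 or h < 1 else 2 for h in range(24)]  # night peak

separate = max(conferencing) + max(entertainment)                  # two dedicated networks
shared = max(a + b for a, b in zip(conferencing, entertainment))   # one shared network

print(f"capacity provisioned separately: {separate}")
print(f"capacity on one shared network:  {shared}")
```

Because the two peaks never coincide, the shared network's combined peak stays well below the sum of the individual peaks.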

Second, technology reuse lowers the marginal cost of R&D.

In streaming media transmission, whether carrying audio/video or message signaling, and whether serving live streaming or real-time communication, one must solve routing across a large number of nodes, fast global information perception, and protocol-stack optimization against weak networks.

By supporting multiple services on one network, these foundational technologies can be reused, so better technical indicators are achieved with the same R&D investment.

Third, cloud products become more convenient and efficient to use.

Because multiple services are supported, users can more easily upgrade a service or assemble new scenario-based solutions.

For example, through the Alibaba Cloud console, a user needs only "one-click upgrade" to switch ordinary live streaming with about 5 s of latency to ultra-low-latency RTS live streaming with only about 1 s of latency, or to interactive live streaming with latency below 400 ms.

From resource utilization and R&D cost to product usability, a multi-converged network releases the dividend to the fullest.

With so many services on one network, will they fight?

In governing the diverse services under "Uni", MediaUni inevitably faces many technical challenges.

The biggest of these comes from the engineering capability required once multiple services share one network.

When one network supports multiple services, it must both prevent services from interfering with each other and support rapid iteration of service features.

MediaUni addresses this with a clean modular design that isolates services and reduces their mutual influence. It has also built programmability: simple business needs can be met through runtime programming, enabling fast iteration of service features.

Furthermore, another technical challenge of multi-service multiplexing lies in resource sharing: different services have different resource bottlenecks. Live streaming, for instance, is bottlenecked by bandwidth, while the complex QoS policies of real-time audio/video communication may exhaust CPU. A more intelligent scheduling system is therefore needed to orchestrate the different services.
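A minimal sketch of what bottleneck-aware scheduling means (node names, metrics, and numbers are hypothetical): place each service on the node with the most headroom in that service's bottleneck resource, rather than on a single generic "least-loaded" node.

```python
# Each node reports the fraction of each resource still free;
# each service declares which resource is its bottleneck.
nodes = {
    "edge-a": {"bandwidth": 0.7, "cpu": 0.2},
    "edge-b": {"bandwidth": 0.2, "cpu": 0.8},
}

def place(service_bottleneck):
    """Choose the node with the most headroom on the service's bottleneck."""
    return max(nodes, key=lambda name: nodes[name][service_bottleneck])

print(place("bandwidth"))  # live streaming goes to the bandwidth-rich node
print(place("cpu"))        # RTC with heavy QoS goes to the CPU-rich node
```

With a single "least-loaded" score, both services could pile onto the same node; splitting the decision by bottleneck lets the two workloads complement each other instead.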

 

N possibilities for the future

A network that transmits the "five human senses"

Over the past decades, through the efforts of generations of engineers, human vision and hearing have been presented digitally ever better, achieving today's low-latency, highly reliable audio/video experience. Yet beyond sight and hearing, human senses also include smell, taste, touch, and more.

It is foreseeable that immersive XR, as a future-oriented form of interaction, will require complete simulation and real-time interaction of sensory information such as smell, taste, and touch, expanding the user experience and human-computer interaction and creating an immersive environment in which users can feel a realistic, empathetic experience.

The future-oriented streaming media transmission network will transmit data of more dimensions efficiently.

In the future, the network will support interactive communication across multiple senses (such as taste, smell, touch, and even emotion), and the digitization and interactive collaboration of multi-dimensional human perception will run over the same network.

Like the vibration of a gamepad, this will spur the birth of a true metaverse that replicates the real world.

A three-pronged approach, laid out in advance

To support future multi-sensory audio/video applications, the streaming media transmission network will evolve in depth along three key dimensions: millisecond-level latency, tight integration with computing, and metadata transmission.

➢ High-quality millisecond-level delay

Across the full audio/video link, network transmission delay is the hardest part to optimize, and the one with the most room for improvement.

Through extensive node coverage and QoS optimization with strong awareness of media characteristics, MediaUni has achieved end-to-end latency below 60 ms in cloud rendering scenarios, and it continues to explore lower-latency transmission, seeking the balance between extreme latency and quality in the 20-100 ms range.

➢ A computing network that scales freely

The network is naturally close to users. We hope to connect distributed resources through the network, effectively promoting the on-demand "flow" of computing power and compensating for the terminals' limited computing power.

Leveraging globally distributed wide-area computing capabilities, MediaUni is realizing unified scheduling of computing and transmission. It has deployed some real-time media processing services on the transmission network and supports starting processing tasks in real time, reducing user latency while effectively optimizing media transmission costs.

➢ Metadata transfer

Metadata is increasingly becoming part of audio/video products, and custom audio/video features combined with metadata can better meet scenario-specific needs. Especially on the road to an online world that transmits the "five human senses", the digitization and refinement of multi-dimensional senses depends on metadata.

Beyond audio and video, MediaUni also transmits content of more dimensions, such as message signaling, which can extend to richer IM, multi-scenario remote control, metaverse, and other services. In the future, as the multi-sensory network channel truly opens up, MediaUni will unleash even greater energy.

Facing the future, how will MediaUni achieve new upgrades under "diverse convergence"?

Stay tuned on July 28

LiveVideoStackCon 2023 Shanghai

Alibaba Cloud Video Cloud Session

A senior technical expert from Alibaba Cloud Intelligence will deliver the talk

"MediaUni: Design and Practice of Future-Oriented Streaming Media Transmission Network"

Let's walk into the network world of "diverse integration" together!


Origin my.oschina.net/u/4713941/blog/10087943