Live Streaming + X: New Trends in the Live Streaming Industry


Editor's note: People constantly pursue richer sensations and experiences, and audio and video technology is developing rapidly. Industries of all kinds now need audio and video services to an unprecedented degree. Live streaming is by now a familiar term, yet the business, its ecosystem, and the key supporting technologies keep evolving and iterating with great vitality. LiveVideoStackCon 2023 Shanghai invited Lu Zhenyu of Huawei Cloud to share how the live streaming industry can make "an old tree sprout new shoots".

Author/Lu Zhenyu

Editor/LiveVideoStack

Hello everyone, I am Lu Zhenyu from Huawei Cloud. To engage as much of the audience as possible while still leaving everyone with something useful, I chose the theme "Live + X", which mainly covers new trends and new ways of using live streaming in the industry.

01

Live Streaming Industry Trends

3521afd3fa6e6cc275fc6ee9b04e8cdf.jpeg

Our team keeps reviewing the past in order to predict the future, or at least to get as firm a grip on it as we can. The core points are as follows:

1. Multimedia technology, and live streaming technology in particular, is not just the stack people know today: capture, encoding, transmission, playback, and lightweight client-side rendering. Live streaming, or streaming media, is a comprehensive and constantly evolving system that combines many technologies:

① Evolution on the capture side: from the paintings of ancient people, to sharper photographs, to the virtual production techniques behind ever sharper and more dazzling blockbuster films.

② Evolution of video distribution and consumption: at first, television transmitted video over wireless signals; then the Internet appeared, and with it live video. What everyone currently recognizes as "streaming media live broadcast" is the technology stack made up of the Internet, camera capture, and the PC.

③ Evolution of end devices: from cinema screens and TVs, to PCs and mobile phones, to various immersive devices. As far as I know, China Mobile is also promoting the naked-eye 3D industry: a screen that looks no different from a phone or tablet can convert content into naked-eye 3D at any time, and the effect is stunning.

From generation, through transmission and consumption, to terminals, these technologies keep evolving and converging. Our team predicts revolutionary changes in how video is generated, and the next two decades are an opportunity for every practitioner to seize. In the past, video was either shot or produced through 3D modeling and rendering; in the future, more content will be generated with AI. Transmission and distribution will also change substantially: they will transcend constraints of time and space, and viewing will no longer be one-way. As display devices become immersive, people will enter the space as digital humans and interact more with the video.

That is our first key point about the live streaming and multimedia industry as a whole. Nobody should stay stuck in the present because of the difficulties the industry faces today; we should look for the opportunities in the future.

2. The driving force behind continued technical progress is consumer experience, and consumers' pursuit of experience is endless. Put plainly, what we often do today is like the peasant who imagines that the emperor eats ten steamed buns at a meal and hoes his field with a golden hoe: we picture the future in terms of what we already know. But once technology makes a breakthrough, people's needs are upgraded accordingly. We are perfectly happy watching 4K blockbusters at home, but what experiences will today's children pursue when they reach our age? We need to understand more of what children imagine and help realize those dreams.

The convergence of technologies and the continuous upgrading of experience drive the industry forward, and they are the starting point for all of HUAWEI CLOUD's business and strategy design.

65bf923d3fc09420c1412515a3c63f58.jpeg

Next, let us narrow the scope. Seen from 2023, live streaming keeps breaking through the limits of time and space, and its range of applications and scenarios keeps growing. There are two main trends:

1. Live streaming in China, especially in the mainland, has matured into a system that is gradually becoming globalized infrastructure and capability.

This system is very successful in China, so in many places overseas it amounts to a "dimensionality reduction attack", that is, an overwhelming advantage. We encountered an interesting case. A customer was very interested in HUAWEI CLOUD's low-latency live streaming and thought it suited his business, so we ran a test with his service, but the results were unsatisfactory. When we investigated, we found that the streamer was in Ukraine, the audience was in Singapore and other Southeast Asian countries, and the customer required 500 ms latency. Viewed with domestic habits of thought, our so-called "low-latency live streaming" was fake: it took more than ten seconds to push a stream from Ukraine to Singapore. This prompted us to do a great deal of optimization to turn it into a live streaming service that spans the globe, which is what breaking through the limits of time and space means. In the end, of course, latency was brought down to 500 ms. This is what I mean by continuously extending the reach of the service in time and space.

In addition, more and more live streaming scenarios are gaining popularity. For example, live streaming is the most effective conversion channel for the recruitment industry. There are also cross-border live commerce and the recently popular bullet-comment (danmaku) games, all of which enrich what live streaming can do. Live streaming is no longer only infrastructure for the entertainment industry; it keeps gaining ground and helps other industries improve productivity.

8ba94b7646b9d8363569f59b85489871.jpeg

2. Going beyond "reality": from live streaming of filmed content to generative live streaming.

On the left is traditional video, mainly shot content from PGC/UGC. The content has a single viewpoint and distribution is 1:N, so the technology under discussion is mostly CDN cost reduction, with at most a little storage consumed.

Now there is a great deal of generative content, which falls into two main technical approaches:

1. Generative real-time rendering, produced by a computer graphics rendering engine. Every object and person in a scene is modeled with triangles, and turning that into high-quality content acceptable to the human eye takes a great deal of GPU compute. The interaction between viewer and content is 1:1.

2. AIGC-generated content, whose most typical application is digital human live streaming. It consumes a great deal of AI inference compute, and from the viewer's perspective it is also 1:1.

Will AI replace real-time rendering? I don't think so. I think CG and CV will be deeply integrated, complementing and advancing each other.

The reasons are as follows. First, if AI inference could generate video at 60 fps, the result would effectively be a live stream; however, 1080p at 60 fps is currently not achievable in complex cases such as 3D scenes. Second, graphics-based rendering is irreplaceable in some scenarios. Industrial scenarios, for example, require people or models accurate to 1 cm, which must be handled by graphics rendering and cannot be handled by AI methods.
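To make the 1080p/60 fps requirement concrete, the short sketch below works out the per-frame time budget and pixel throughput it implies. The function names are illustrative only, and the calculation says nothing about how fast any particular model or GPU actually runs.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at the given frame rate, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Number of pixels that must be generated every second."""
    return width * height * fps


# 1080p at 60 fps: each frame must be ready in roughly 16.67 ms,
# and 124,416,000 pixels have to be produced every second.
budget = frame_budget_ms(60)
throughput = pixels_per_second(1920, 1080, 60)
print(f"{budget:.2f} ms per frame, {throughput:,.0f} pixels per second")
```

Any generative pipeline whose per-frame inference time exceeds this budget cannot keep up with a live stream at that resolution and frame rate without dropping frames or adding parallel hardware.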

CG and CV will therefore remain deeply integrated for a long time to come. Only by doing both well can we produce good generative content.

Another major challenge is whether content has to be distributed 1:1. How much bandwidth and compute does that take? How can it be rolled out at scale? This calls for techniques that combine storage and computation, among others, so that future distribution becomes a combination of fully point-to-point RTC and cache-accelerated CDN. None of this is a forecast: these are not futures but spot goods, and it is Huawei Cloud's ongoing practice.

2fbf7bd57da55e73b32c10daa60eeb2c.jpeg

Live + X faces many challenges as it evolves and as its reach in time and space keeps expanding. How can we serve global users well? How can we meet their requirements for latency, security, compliance, and experience? How can we truly stay far ahead, keep upgrading the industry, and let generative live streaming deliver value quickly and at scale? How can we balance exponentially growing compute consumption, point-to-point bandwidth usage, and latency? How can we lower the barrier of complex technology?

02

Breaking Through Time and Space Constraints

Next, we will introduce the related practices of HUAWEI CLOUD.

c81fe815061003c90ead00b23f5294ea.jpeg

There are great opportunities to break through the constraints of "time and space".

In fact, I don't quite agree with the term "going overseas". Whether or not Chinese companies go overseas, the overseas business is already there, so I pay more attention to local business. The revenue potential of live streaming is very large; here I mean live streaming services in the narrow sense. In terms of business maturity, Asia-Pacific is one step ahead, because the system we are familiar with is already fairly complete in Southeast Asia, followed by North America and Europe. South America is currently growing very fast, and Kuaishou has launched successfully there.

These changes are not new overseas. YY went public in 2012 and was a pioneer in China. When I was working for Huawei in Southeast Asia in 2013, a colleague ran a live show in Indonesia, and I did not understand it at the time, because live streaming in Southeast Asia was then purely an entertainment business. Now live streaming is a productivity tool in these regions. In Asia-Pacific, live cross-border e-commerce, live lectures, and global competitions are widely accepted. Whether in North America, Asia-Pacific, or Europe, there are local game streaming platforms that rival Huya and Douyu, as well as generative scenarios such as virtual concerts and virtual communities.

9c7f88271d510617aa213da1717736f7.jpeg

Taking Southeast Asia as an example, live streaming e-commerce has reached US$19 billion this year, and the closed-loop commercial infrastructure that lets live streaming empower industries as a production tool, including payments and basic networks, is fully mature. We took the total live streaming bandwidth of all vendors in China and in Indonesia and divided each by population. The result is that live streaming in Indonesia today is at roughly the level China reached in 2018. In China in 2018, live streaming capacity was already available on demand: as long as users needed it, CDN suppliers would expand. The network infrastructure is in place as well, since Huawei has built 5G base stations and fiber-to-the-home for these countries. With that, the commercial loop is sufficiently closed that live streaming is no longer used only for watching TV or show content but becomes a productivity tool, with plenty of room to grow.

It is not only about Chinese capital going overseas; Huawei Cloud pays more attention to finding and serving local customers in a down-to-earth way.

7366f2c51eebb281943835420d797510.jpeg

The Middle East is a booming market with a strong preference for influencer live streams, which may stem from an imbalance between supply and demand. The Chinese approach to running MCNs is thriving in the Middle East, which is also a big opportunity. The Middle East previously had some problems with 5G, but its infrastructure has improved rapidly over the past two years.

8bcec9dbde943e40f5ee80df03a6037b.jpeg

With so many opportunities and so much room for technology, our practice focuses on how to help customers run high-quality business in a globalized context. The point is not to go to the Middle East and serve Middle Eastern customers, but to serve customers worldwide as they provide services to their own businesses around the world. The habitual assumption is that users are in China, stream push is in China, and stream pull is also in China, but in real business we found otherwise. The streamer pushing the stream, the origin, the CDN, and the viewers who finally watch are not in the same country, which is a big challenge.

To overcome the above challenges, we need to do the following:

1. Solve the problem of local coverage: for a cloud vendor, having a large pool of resources is a natural advantage. Over the past two years, HUAWEI CLOUD has launched new Regions in many places around the world. I have been to some of the sites, such as Indonesia, followed by Turkey, Saudi Arabia, and South Africa. There are currently 29 Regions worldwide, all built to the high 3-AZ standard. Large-scale CloudOcean/CloudSea solutions help Regions provide massive compute, and together with connected resources in surrounding areas the total comes to 83 AZs and more than 2,800 CDN nodes.

2ee57b1b3456ebba55239623a3791c40.jpeg

2. Solve the problem of cross-regional interconnection: with sufficient infrastructure in place, how do we serve customers better and handle the challenges and scenarios described above? Take the Ukrainian streamer: his needs prompted us to act. ADN was originally an overlay network on top of the CDN, and it had to be deployed overseas to solve this problem. This is a little abstract. The original CDN is a tree structure. On top of it we overlay a Layer 3/Layer 4 acceleration network that is mesh-like: nodes probe one another, and routes are chosen by multi-factor optimization. This lets us provide a low-cost, high-quality Layer 3 overlay network.
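As a rough illustration of multi-factor route selection over a mesh overlay, the sketch below scores each probed link by latency, loss, and price and runs a shortest-path search over the combined cost. Everything here is an assumption for illustration: the node names, the probe values, the weights, and the use of Dijkstra's algorithm are not taken from the talk and do not describe ADN's actual implementation.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    rtt_ms: float  # round-trip time measured by node probing (hypothetical values)
    loss: float    # packet loss ratio between 0.0 and 1.0
    price: float   # relative bandwidth price of the link


# Hypothetical weights that fold the three factors into one cost.
W_RTT, W_LOSS, W_PRICE = 1.0, 1000.0, 5.0


def link_cost(link: Link) -> float:
    """Combine latency, loss, and price into a single comparable cost."""
    return W_RTT * link.rtt_ms + W_LOSS * link.loss + W_PRICE * link.price


def best_route(mesh, src, dst):
    """Dijkstra over the mesh; returns (total_cost, list_of_nodes)."""
    heap = [(0.0, src, [src])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, link in mesh.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(heap, (cost + link_cost(link), nxt, path + [nxt]))
    return float("inf"), []


# Illustrative directed mesh: a direct long-haul link and two relay options.
MESH = {
    "kyiv": {
        "singapore": Link(rtt_ms=250, loss=0.05, price=1),
        "frankfurt": Link(rtt_ms=30, loss=0.001, price=1),
        "dubai": Link(rtt_ms=70, loss=0.001, price=2),
    },
    "frankfurt": {"singapore": Link(rtt_ms=160, loss=0.002, price=2)},
    "dubai": {"singapore": Link(rtt_ms=90, loss=0.001, price=2)},
}

cost, path = best_route(MESH, "kyiv", "singapore")
print(" -> ".join(path))  # kyiv -> dubai -> singapore
```

With these made-up numbers the relay through the intermediate node beats the direct link because the direct link's loss dominates its cost. In a tree-shaped CDN that relay choice would not exist; the mesh is what makes it available.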

The original CDN carried only north-south traffic to solve the 1:N problem. Now our resources can also solve the east-west problem. If the business has no link in the middle, as with cloud gaming, cloud phones, or similar scenarios, then as long as consumption is relatively high and the average bandwidth price is above 10 yuan, this solution can be used. More important still is optimizing the quality of north-south traffic in a global context. The right side shows a concrete example of how, with a large number of nodes worldwide, various factors are combined to optimize the route.

Past habit was to consider a single domestic network, but global business stitches together dozens or even hundreds of networks. Each network has nodes from different operators with differing quality, and each customer's business is different, or one customer has several businesses. Route selection and quality-tuning parameters were originally designed for a single network in China, where one set of parameters per customer and business was enough. The global context pushed us to isolate parameters by node, operator, country, and customer business, and to tune each parameter automatically based on big data, so that for each tenant we can select a network plane with a routing strategy that meets the SLA requirements of that tenant's business.
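One simple way to picture this kind of parameter isolation is a lookup keyed by tenant, business, country, and operator, where the most specific entry wins and broader entries act as fallbacks. The sketch below is only that: the key layout, field names, and values are hypothetical, and the automatic data-driven tuning described above is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteParams:
    max_rtt_ms: int        # latency ceiling used when choosing a route
    max_loss: float        # acceptable packet loss ratio
    probe_interval_s: int  # how often nodes probe each other


# Global default, used when nothing more specific is configured.
DEFAULT = RouteParams(max_rtt_ms=800, max_loss=0.02, probe_interval_s=10)

# Key: (tenant, business, country, operator); None acts as a wildcard.
# All entries are made-up examples.
PARAMS = {
    ("tenant-a", "low-latency-live", "SG", None): RouteParams(300, 0.005, 2),
    ("tenant-a", "low-latency-live", None, None): RouteParams(400, 0.01, 5),
    ("tenant-a", None, None, None): RouteParams(600, 0.02, 10),
}


def resolve(tenant, business, country, operator):
    """Return the most specific parameter set, dropping fields from the right."""
    key = [tenant, business, country, operator]
    for keep in range(4, 0, -1):
        candidate = tuple(key[:keep] + [None] * (4 - keep))
        if candidate in PARAMS:
            return PARAMS[candidate]
    return DEFAULT


# The same tenant gets different parameters depending on business and country.
print(resolve("tenant-a", "low-latency-live", "SG", "operator-1"))
print(resolve("tenant-a", "low-latency-live", "ID", "operator-2"))
print(resolve("tenant-b", "vod", "TR", "operator-3"))
```

Keeping each scope's parameters separate means that tuning for one tenant's low-latency business in one country cannot silently change the behavior of another tenant or another region.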

c79dc72a70d64641c62ba68b6820b09e.jpeg

3. Solve the problem of low-latency distribution: everyone is familiar with low-latency technology. In essence, low-latency live streaming is early Google technology that was productized faster and earlier in China. Many overseas CDN startups have of course chosen this track as well. Although the Chinese products are very popular overseas, limited usage scenarios are a problem. The biggest limitation comes from the ecosystem, specifically the inherent weakness in CA/DRM support. We therefore call on the whole industry to work together on a DRM solution for these domestic fast-live and low-latency systems, so that the full set of Chinese standards can realize its advantages.

Integrating with overseas users can actually be very painful. You may remember MSS, Microsoft Smooth Streaming, a very old protocol. Overseas, for ecosystem reasons, some Samsung smart TVs require us to support MSS. Fortunately Huawei had implemented it before and could port it from a historical code base. Studying the ecosystem around low-latency live streaming is very valuable, because low latency is the direction live streaming is heading: future live content, interaction, and display all require it. Can we learn from Microsoft, Google, and Apple and make our collective voice heard in this ecosystem?

b9180b691610be1c1cb070c626690a6c.jpeg

4. Solve the problem of global operations and maintenance efficiency: once there is a customer network, the most important thing for global O&M is visualization. The scheduling and multi-customer matching described above both depend on it, chiefly traffic-level and application-level visualization, quality visualization, and bandwidth visualization.

This involves collecting a large amount of data on the ADN overlay network.

6d01508658e9b15615316d5c08e3d05f.jpeg

This is a low-latency live streaming case: co-watching between users in Turkey and users in India, who play mini-games while watching the live stream. The mini-game requires viewers to respond within a short time after seeing the stream, so only 500 ms is left for everything from capture to playback. The customer's HLS, DASH, and CMAF solutions could not achieve this. In the end, only Huawei's low-latency live streaming, also called fast live streaming, met the business need.
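A back-of-the-envelope estimate helps show why segment-based delivery struggles with a 500 ms budget. The sketch below assumes that a player buffers a few segments or chunks before it starts playing, so the latency floor is roughly the segment duration times the number buffered. The durations and buffer depths are illustrative assumptions, not measurements from this case.

```python
def segmented_latency_s(segment_s: float, buffered_segments: int,
                        overhead_s: float = 0.0) -> float:
    """Rough latency floor for segment-based delivery.

    segment_s: duration of one segment or chunk in seconds
    buffered_segments: how many segments the player holds before playback
    overhead_s: extra encode/network time, left at 0 for a best case
    """
    return segment_s * buffered_segments + overhead_s


def meets_budget(latency_s: float, budget_ms: float = 500.0) -> bool:
    """Check an estimated latency against the end-to-end budget."""
    return latency_s * 1000.0 <= budget_ms


# Illustrative best-case numbers only (assumptions, not measurements).
classic = segmented_latency_s(segment_s=2.0, buffered_segments=3)
chunked = segmented_latency_s(segment_s=0.2, buffered_segments=3)

print(meets_budget(classic))  # False: seconds of buffered media exceed 500 ms
print(meets_budget(chunked))  # False: still above 500 ms before any network delay
```

Even before network transit between countries is added, the buffering model alone uses up the budget, which is consistent with the customer's segment-based attempts falling short.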

03

Going Beyond "Reality"

c5dcc7208bb3917f21a6d94139162e2a.jpeg

Not long ago we took part in a closed-door live streaming meeting at a live streaming base in Hangzhou. The trend for the next 20 years is AIGC. Many users have said explicitly that they will go all in on AIGC, yet for many people AIGC still feels distant: is it still just ChatGPT, or large models that exist mainly in announcements? We visited a digital human live streaming base. It is not large, but on April 15 it dismissed all of its human streamers and replaced them with digital humans. We went to learn how digital human live streaming is used, what it means for Huawei, and where our opportunities lie.

After those exchanges, we believe AIGC is a great opportunity with many application scenarios. A few examples: many TV stations at home and abroad are struggling. Phoenix Satellite TV, for instance, has a small technical team and is very interested in using digital human technology for content broadcasting. Other examples are digital human live streaming and the "digital immortality" shops appearing in some shopping malls, which digitize elderly relatives and children for emotional companionship. These scenarios are still rudimentary, but together they point to the future trend. The metaverse has more scenarios still, such as online games and virtual idols. This is no longer just a change within the entertainment industry or streaming media technology; it has become a productivity factor across industries.

Let me share with you two successful business cases:

1. Cross-border live commerce in Southeast Asia: according to incomplete statistics, more than 10,000 live commerce streams run every day, a large share of them hosted by digital humans. The biggest driver is overwhelmingly low cost. To sell a pair of shoes across Southeast Asia, for example, you face 20 languages in 20 countries. Hiring 20 streamers is clearly unrealistic, so instead you buy several digital streamers that broadcast 24 hours a day, whether in Indonesian, Malay, or other languages. I may be the only Bahasa Indonesia speaker in this session, but that does not matter: my digital twin can speak 20 languages. Digital humans have removed the language barrier and filled the shortage of human streamers.

2. A case in the social field: the woman in the picture was originally an influencer with 1.84 million followers and an annual income of US$1 million. She launched a virtual girlfriend service, and many people subscribed. Her annual income has now reached US$60 million, comparable to Taylor Swift's. If domestic partners want to run a similar business, we can fully support it.

3dbc7b516a614814a954e1b33520d077.jpeg

3. A 3D space case: this one is relatively simple. Digital humans watch a live stream inside a metaverse space. The picture shows Huawei's own live stream.

619d36af7f7a3b190e46ddf305df5bc4.jpeg

901745fc92227f6f6be1ad375cdf9e7c.jpeg

With so many scenarios and innovations, and with some pioneers already reaping the rewards, how do we act as a partner for business innovation in this industry? The core issues are compute cost, transmission cost, and how to lower the barrier to using the technology.

HUAWEI CLOUD's answer is the MetaStudio digital human live streaming brand. It has three major capabilities: model creation, live streaming scripts, and "one card, ten streams", which address the entry barrier, effective operation, and rapid cost reduction respectively. Finally, the digital human is driven by rendering (this is mainly where "one card, ten streams" applies).

bec638cf1f497e7540fb65005dade495.jpeg

There are three ways to create a model:

1. Digital human from text: text-to-image, then image-to-human; in essence, text-to-digital-human. Enter a description of the appearance and a digital human is generated in about 10 seconds, ready to be driven afterwards.

2. Digital human from a photo: one photo can generate a digital human in about one minute.

3. Digital human from video: used mostly in digital human live streaming. It requires five minutes of footage, and training takes 3-6 hours to produce the digital human's appearance.

In addition to an appearance, a digital human needs to speak. For this we provide three voice services:

1. Preset voices: free, robot-like voices.

2. Voice cloning: the result sounds like a typical news broadcast. It needs 3-5 minutes of audio; users can simply speak while recording the video and submit both together. The timbre is the user's own, but there is no emotion and the delivery is flat, in the style of a news broadcast. Fidelity reaches about 80%, which is enough for some scenarios.

3. High-fidelity voice cloning: scenarios such as live commerce need a voice closer to a real person, which calls for high-fidelity cloning. It needs 2-5 hours of studio-quality audio for training and can learn the user's speaking habits, including verbal tics, pauses, and emotional shifts that the user may not even notice. Fidelity reaches 95%, and the cost is also higher.

4a12b8f106d39c65885e10e7a1b3e29b.png

9e5fd280f22033c80fa785c2a0820aa5.png

The figure demonstrates a two-person digital human model. Training the model takes 3-6 hours. The user only needs to choose a background and enter the script, including responses to bullet comments, gifts, and tips, and can then start the digital human live stream.

To be clear, although we provide this platform, Huawei is not in a position to iterate on a large number of businesses. Our positioning is not to create and operate many digital humans ourselves but to help partners create cost-effective, high-quality digital humans and provide services. Every capability has an API behind it, and we expect partners and customers to consume the capabilities through APIs. Partners are welcome to try the platform, but we want to emphasize that Huawei does not aim to become a SaaS platform that dominates digital human live streaming or other application scenarios.

d0ef934a486e9bac2900a628bbd7203c.jpeg

We also provide a large model for digital human live streaming scripts. There are many models, but objectively speaking, no large model can be fully relied on yet; it is unrealistic to rely solely on digital humans for live streaming and expect ChatGPT alone to make a stream popular. As a result, a new profession has emerged: the digital human live streaming operator. Many practitioners are deaf or mute, and this has quickly created employment for them. Despite their physical limitations they have a strong desire to express themselves, and their advantage is that they are highly focused. As digital human live streaming operators, with the help of the Pangu model or ChatGPT, they can keep optimizing and iterating on scripts and operations and find joy in communicating with others.

56a1cff1c784fef6ae224bbf8f50bd53.jpeg

This is a demonstration of the large model for live streaming scripts. Enter the prompt fields for a product pitch, such as product type, original price, current price, and "link No. 1", and it generates multiple results.

0c9397b54e087fbcf2a33f52c761f2d4.jpeg

HUAWEI CLOUD provides a one-stop service for generative content, and we have self-developed, domestically produced compute. Huawei Ascend chips offer solutions for generative live streaming and future inference scenarios, and we will continue to optimize deeply for these scenarios, providing capabilities that cannot be achieved on the Nvidia series, such as the "one card, ten streams" needed for digital human generation.

HUAWEI CLOUD's advantage is helping users quickly achieve generative live streaming through Ascend compute, the model layer on top of it, digital human algorithms, and a complete solution covering rendering, streaming, and the live streaming platform.

I hope this talk has conveyed HUAWEI CLOUD's understanding of the industry and the opportunities it sees. That is all for today; thank you all!


7b7fa5dbf427e9f0f3049a8f57bc8293.jpeg

LiveVideoStackCon is a stage for every multimedia engineer. If you lead a team or company, have years of hands-on experience in a field or technology, and enjoy technical exchange, you are welcome to apply to be a LiveVideoStackCon producer or speaker.

Scan the QR code below to view the speaker application requirements, speaker benefits, and other information. Submit the form on the page to complete your application. The conference organizing committee will review your information as soon as possible and contact qualified candidates.

3f12aebc1c217fb3f2b76fdedc3b2b91.jpeg



Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/132073916