WebRTC | Audio and video live streaming client framework

        End-to-end real-time communication can be broken down into the following technical challenges: client technology, server technology, global device and network adaptation, and monitoring and visualization of communication quality.

1. Live audio and video

        Audio and video live streaming falls into two technical routes: real-time interactive live streaming, represented by audio and video conferencing, and streaming-media distribution, represented by entertainment live streaming.

        Interactive live streaming mainly solves the problem of remote audio and video communication between people, so its strength is low latency, generally under 500 ms. Entertainment live streaming mainly solves the problem of distributing audio and video at large scale, so its strength is massive distribution, but its real-time performance is worse, with a typical delay of more than 3 s.

1. Common live streaming technologies

[Table: common live streaming technologies]

        Of the technologies in the table above, only WebRTC is used for real-time interactive live streaming; the others are used for entertainment live streaming.

        HLS is based on HTTP: it first slices the media stream (file) into segments and then delivers those segments over HTTP, and the receiving end must buffer the segments it has received before playback can proceed smoothly. (In fact, RTMP was at first the only option for entertainment live streaming, but Apple later announced that it would not support RTMP and launched its own solution, HLS, which eventually led to RTMP's decline.)
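        As a concrete illustration, the sketch below plays an HLS stream in the browser with the open-source hls.js player; the playlist URL is hypothetical. It shows the essential point: the client just fetches an index (.m3u8) and its media segments over plain HTTP and buffers them before playback.

```typescript
// Minimal sketch: HLS playback in the browser via hls.js (playlist URL is hypothetical).
import Hls from "hls.js";

const video = document.querySelector("video") as HTMLVideoElement;
const playlistUrl = "https://example.com/live/stream.m3u8"; // index that lists the media segments

if (Hls.isSupported()) {
  const hls = new Hls();       // MSE-based player
  hls.loadSource(playlistUrl); // fetch the .m3u8 index, then the segments, all over HTTP
  hls.attachMedia(video);      // feed the buffered segments into the <video> element
} else if (video.canPlayType("application/vnd.apple.mpegurl")) {
  video.src = playlistUrl;     // Safari can play HLS natively
}
video.play().catch(console.error);
```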

        Migrating from RTMP to HLS is costly, so the HTTP-FLV scheme was proposed: the transmitted content keeps the RTMP-style FLV packaging, but the underlying transport protocol is replaced by HTTP. This scheme offers better real-time performance than HLS while avoiding the cost of a full upgrade, so it was welcomed by all parties. However, HTTP-FLV scales relatively poorly, so it is only a transitional solution.
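        On the playback side, HTTP-FLV looks very similar; the sketch below uses the open-source flv.js player (the stream URL is hypothetical):

```typescript
// Minimal sketch: playing an HTTP-FLV live stream in the browser via flv.js.
import flvjs from "flv.js";

const video = document.querySelector("video") as HTMLVideoElement;

if (flvjs.isSupported()) {
  const player = flvjs.createPlayer({
    type: "flv",
    isLive: true,
    url: "https://example.com/live/stream.flv", // FLV content delivered over HTTP
  });
  player.attachMediaElement(video); // demuxed frames are fed to the <video> element via MSE
  player.load();
  player.play();
}
```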

        Although HLS works well and has a large user base, it is Apple's solution, and other companies have their own similar schemes, so each live streaming vendor has to maintain multiple code paths, which is time-consuming and laborious. MPEG therefore introduced DASH, which is similar to HLS and also transmits data in segments. DASH eventually became an international standard, so vendors can implement segmented delivery with a single code path.

2. Current status of audio and video live streaming

        WebRTC's vision is to make end-to-end real-time audio and video interaction between browsers quick and easy to build. Combining real-time interactive live streaming with entertainment live streaming technology has become the mainstream approach on the live streaming server side.

        There are two important trends in audio and video live streaming technology: one is the convergence of real-time interactive live streaming and entertainment live streaming; the other is that WebRTC has become the standard for live streaming technology, and everyone is actively embracing it.

2. Self-developed live streaming client architecture

1. Five basic modules

        A simple live streaming client should include at least five parts: an audio and video capture module, an audio and video encoding module, a network transmission module, an audio and video decoding module, and an audio and video rendering module.

        Refining this further, the audio capture module is separate from the video capture module, and the audio codec module is separate from the video codec module. That is, audio flows through one processing pipeline and video through another, and the two do not intersect. In audio and video processing, each individual channel of audio or video is generally called a track.
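        The same track concept appears in the browser's media API. A minimal sketch using the standard getUserMedia call shows audio and video arriving as separate tracks, each of which can feed its own pipeline:

```typescript
// Minimal sketch: capture audio and video, then hand each track to its own pipeline.
async function captureTracks(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });

  const [audioTrack] = stream.getAudioTracks(); // input to the audio pipeline
  const [videoTrack] = stream.getVideoTracks(); // input to the video pipeline

  console.log("audio track:", audioTrack.label);
  console.log("video track:", videoTrack.label, videoTrack.getSettings());
}

captureTracks().catch(console.error);
```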

2. Cross-platform support

        Beyond the five general modules above, cross-platform issues also need to be considered: only when audio and video interworking works on every platform can the client be called a qualified audio and video live streaming client. Take audio capture as an example: different platforms use different system APIs to capture audio data. Windows uses CoreAudio; the Mac API is also called CoreAudio, but the specific function names differ; Android uses AudioRecord; iOS uses AudioUnit; and Linux uses PulseAudio.
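        A common way to cope with this, sketched below purely for illustration (all names are hypothetical, and the real capture code would live in native code rather than TypeScript), is to hide each platform's capture API behind one common interface so the rest of the pipeline never touches platform-specific calls:

```typescript
// Illustrative sketch only: one capture interface, one implementation per platform.
interface AudioCapturer {
  start(onFrame: (pcm: Int16Array) => void): void;
  stop(): void;
}

class CoreAudioCapturer implements AudioCapturer {   // Windows / macOS (different functions underneath)
  start(onFrame: (pcm: Int16Array) => void): void { /* wrap the platform's CoreAudio calls */ }
  stop(): void {}
}

class AudioRecordCapturer implements AudioCapturer { // Android
  start(onFrame: (pcm: Int16Array) => void): void { /* wrap AudioRecord */ }
  stop(): void {}
}

// The rest of the client depends only on AudioCapturer, never on a platform API.
function createCapturer(platform: "windows" | "mac" | "android"): AudioCapturer {
  return platform === "android" ? new AudioRecordCapturer() : new CoreAudioCapturer();
}
```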

3. Plug-in management of codecs

        For an audio and video live streaming client, we hope not only that it can process audio and video data, but also that it can share the screen, play multimedia files, share a whiteboard, and so on. Even for audio and video processing alone, we want it to support multiple codec formats:

  • For audio, in addition to Opus and AAC, it should also support G.711/G.722, iLBC, Speex, etc.;
  • For video, in addition to H264, it should also support H265, VP8, VP9, AV1, etc.

        G.711/G.722 is mainly used in telephone systems, so if the live streaming client needs to interoperate with a telephone system, it must support these formats; Opus is mainly used for real-time calls; AAC is mainly used for music applications such as online piano teaching. Implementing plug-in management makes it easy for the client to support as many codecs as possible.
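        In the browser, WebRTC exposes a related, though far more limited, mechanism: you can query which codecs the engine supports and reorder the preference list used during negotiation. A minimal sketch, assuming a video transceiver on a fresh RTCPeerConnection:

```typescript
// Minimal sketch: list supported video codecs and prefer VP9 when it is available.
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver("video");

const capabilities = RTCRtpSender.getCapabilities("video");
if (capabilities) {
  console.log(capabilities.codecs.map((c) => c.mimeType)); // e.g. video/VP8, video/VP9, video/H264
  const preferred = [
    ...capabilities.codecs.filter((c) => c.mimeType === "video/VP9"),
    ...capabilities.codecs.filter((c) => c.mimeType !== "video/VP9"),
  ];
  transceiver.setCodecPreferences(preferred); // the offer/answer exchange will now favor VP9
}
```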

4. Other issues that need attention

  • Audio/video out-of-sync problems

        After audio and video data travel across the network, jitter, delay, and similar problems can push them out of sync. To address this, an audio and video synchronization module can be added to the live streaming client to keep the two aligned.
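        A heavily simplified sketch of the idea (WebRTC does this internally and far more carefully; the function and threshold below are hypothetical): compare each video frame's timestamp with the current audio playout position and render, hold, or drop the frame accordingly.

```typescript
// Illustrative only: schedule a video frame against the audio playout position (both in ms).
const MAX_DRIFT_MS = 45; // roughly where lip-sync errors start to become noticeable

function scheduleVideoFrame(videoTimestampMs: number, audioPlayoutMs: number): "render" | "wait" | "drop" {
  const drift = videoTimestampMs - audioPlayoutMs;
  if (drift > MAX_DRIFT_MS) return "wait";  // video is ahead of audio: hold the frame back
  if (drift < -MAX_DRIFT_MS) return "drop"; // video is behind audio: skip the frame to catch up
  return "render";                          // within tolerance: render now
}
```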

  • The 3A problems

        3A refers to: Acoustic Echo Cancellation (AEC), i.e., echo cancellation; Automatic Gain Control (AGC), i.e., automatic gain; and Active Noise Control (ANC, also known as noise suppression), i.e., noise reduction.
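        In the browser, these three switches are exposed directly as getUserMedia audio constraints; a minimal sketch:

```typescript
// Minimal sketch: request a microphone track with the 3A processing enabled.
async function captureProcessedAudio(): Promise<MediaStreamTrack> {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true, // AEC
      autoGainControl: true,  // AGC
      noiseSuppression: true, // noise reduction
    },
  });
  const [track] = stream.getAudioTracks();
  console.log(track.getSettings()); // shows which of the three the browser actually applied
  return track;
}
```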

  • Real-time performance of audio and video

        TCP guarantees delivery quality by sacrificing real-time performance. Therefore, to keep latency low, real-time live streaming should normally prefer UDP.
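        WebRTC itself follows this principle: media is carried over RTP/SRTP on top of UDP whenever ICE can establish a UDP path. The sketch below (the STUN server address is hypothetical) logs the transport protocol of the ICE candidates a peer connection gathers:

```typescript
// Minimal sketch: observe which transport protocols the gathered ICE candidates use.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.example.com:3478" }], // hypothetical STUN server
});

pc.addEventListener("icecandidate", (event) => {
  if (event.candidate) {
    console.log(event.candidate.type, event.candidate.protocol); // protocol is usually "udp"
  }
});

pc.createDataChannel("probe");                                   // give ICE something to negotiate
pc.createOffer().then((offer) => pc.setLocalDescription(offer)); // starts candidate gathering
```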

3. WebRTC client architecture

        As the WebRTC architecture diagram shows, WebRTC can be roughly divided into four layers: the interface layer, the session layer, the core engine layer, and the device layer.

  • The interface layer consists of two parts: the Web API and the Native API. In other words, you can build an audio and video live streaming client in the browser or with Native code (C++, Android, Objective-C, etc.).
  • The main role of the session layer is to control business logic, such as media negotiation and collecting ICE candidates (see the sketch after this list).
  • The core engine layer contains the most content. Broadly, it includes the audio engine, the video engine, and the network transport layer. The audio engine includes NetEQ, the audio codecs (such as Opus and iLBC), the 3A algorithms, etc.; the video engine includes the JitterBuffer, the video codecs (VP8, VP9, H264), etc.; the network transport layer includes SRTP, network I/O multiplexing, P2P, etc.
  • The device layer mainly deals with the hardware. It covers audio capture and playback, video capture, and the network layer on each terminal device.
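        The session-layer responsibilities mentioned above map directly onto the Web API. Below is a minimal loopback sketch of the offer/answer negotiation and candidate exchange between two in-page peer connections (in a real client, the SDP and candidates travel through a signaling server):

```typescript
// Minimal sketch: media negotiation and candidate exchange between two local peers.
async function connectLoopback(track: MediaStreamTrack, stream: MediaStream): Promise<void> {
  const caller = new RTCPeerConnection();
  const callee = new RTCPeerConnection();

  // Candidate collection: trickle each side's ICE candidates to the other.
  caller.onicecandidate = (e) => { if (e.candidate) callee.addIceCandidate(e.candidate).catch(console.error); };
  callee.onicecandidate = (e) => { if (e.candidate) caller.addIceCandidate(e.candidate).catch(console.error); };

  caller.addTrack(track, stream);
  callee.ontrack = (e) => console.log("remote track arrived:", e.track.kind);

  // Media negotiation: the offer/answer exchange driven by the session layer.
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  await callee.setRemoteDescription(offer);
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  await caller.setRemoteDescription(answer);
}
```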

4. Comparison between self-developed system and WebRTC

 


Origin blog.csdn.net/weixin_39766005/article/details/132149265