WebRTC → In-depth analysis of live broadcast solutions and principles in various industries

1. Prerequisite Knowledge

  • streaming

    • Streaming media refers to media played over the Internet by means of streaming transmission;

    • Streaming transmission divides multimedia files such as video and audio into compressed packets and transmits them continuously, in real time, from the server to the client. The client decompresses the data as it arrives and plays the media immediately, instead of waiting for the whole file to download (see the sketch after this list)

    • Transcoding converts the live stream into different protocol formats to support different client devices

    • Distributing to a CDN network solves the problems of live-server congestion and slow response
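To make "transmitted continuously and in real time" concrete, here is a minimal sketch of the receiving side using the standard browser fetch API and its ReadableStream interface; the URL is a hypothetical placeholder.

```typescript
// Minimal sketch: consume a media stream chunk by chunk as it arrives.
// The URL is a placeholder; any chunked/streamed HTTP resource behaves the same.
async function consumeStream(url: string): Promise<void> {
  const response = await fetch(url);
  if (!response.body) throw new Error("streaming not supported");

  const reader = response.body.getReader();
  let received = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;              // server closed the stream
    received += value.byteLength; // each chunk is usable before the file ends
    // A real player would feed this chunk into a decoder or Media Source buffer.
    console.log(`received ${received} bytes so far`);
  }
}

consumeStream("https://example.com/live/stream.flv"); // hypothetical URL
```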

The technologies involved in live broadcasting include audio/video processing, graphics processing, video/audio compression, CDN distribution, instant messaging, signaling services, and other related technologies;

  • A Complete Live APP Implementation Process

    • Capture → filter processing → encoding → push streaming → CDN distribution → pull streaming → decoding → playback and interaction

  • The principle of the complete live APP

    • Live broadcast principle

      • The video recorded by the anchor is pushed to the server, and the server distributes it to the audience

    • live session

      • Push end: audio and video capture, beautification, 3A audio processing (echo cancellation, noise suppression, automatic gain control), encoding, stream pushing, etc.

      • Server: transcoding, recording, screenshots, content moderation (e.g., porn detection), etc.

      • Player: stream pulling, decoding, rendering, etc.

      • Interactive system: chat room, gift system, barrage (danmaku) system, likes, etc.

  • The functions that a complete live broadcast APP should have

    • chat

      • Private chat, public chat rooms, "light up" likes, push notifications, blacklists, bans, disabling comments in the live room, etc.

    • Gift

      • Gifts, red envelopes, leaderboards, recharge, in-app purchases, cash withdrawals, etc.

    • Live list

      • Followed, popular, latest, and categorized lists of live users; live-room reservations, etc.

    • live broadcast

      • Recording, stream pushing, decoding, playback, beautification, heartbeat, background switching, desktop sharing, managing the people in the live room, etc.

    • room logic

      • Create room, join room, exit room, close room, switch room, room managers, room users, etc. (a hypothetical interface for this logic is sketched after this list)

    • user logic

      • Ordinary login, third-party login, editing personal information, personal lists, follows, etc.

    • watch live

      • Chat, barrage, gift, live room status, user status, etc.

    • statistics

      • Business statistics, third-party statistics, etc.

    • super admin

      • Ban, hide, review, etc.
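To make the feature lists above concrete, here is a hypothetical TypeScript sketch of what a room-logic surface might look like; every interface and method name is invented for illustration and is not taken from any real live-streaming SDK.

```typescript
// Hypothetical room-logic API surface modeling the feature list above.
// All names are invented for illustration, not from a real SDK.
interface Room {
  id: string;
  ownerId: string;
  managerIds: string[]; // room managers
  userIds: string[];    // room users
}

interface RoomService {
  createRoom(ownerId: string, title: string): Promise<Room>;
  joinRoom(roomId: string, userId: string): Promise<void>;
  exitRoom(roomId: string, userId: string): Promise<void>;
  closeRoom(roomId: string): Promise<void>;
  switchRoom(fromRoomId: string, toRoomId: string, userId: string): Promise<void>;
  addManager(roomId: string, userId: string): Promise<void>;
  banUser(roomId: string, userId: string): Promise<void>; // super-admin "ban"
}
```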

2. Analysis of live broadcast principle

2.1 Audio and video processing related

  • After video capture and processing, push the stream to the streaming media server

    • First, obtain the raw audio and video data from the capture device

    • Apply additional processing: mixing and noise reduction for audio; watermarks, filters, and timestamps for video

    • Encode the processed audio and video data according to the relevant specifications

    • Encapsulate (mux) the encoded data so that the different media tracks can be played in sync, and add extras such as indexes

    • Push the encapsulated content to the streaming media server via a streaming protocol

  • Streaming media server

    • Receives the pushed streaming content, then caches, schedules, and transmits it, distributing it to users via streaming protocols

  • viewing end

    • The viewer only needs a player that supports the corresponding streaming transmission protocol

    • What matters is which protocol the server uses to return the live stream; the corresponding player is then selected to play it (a selection sketch follows below)
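As a sketch of that selection logic, the snippet below picks a player based on the stream format the server returns. It assumes the open-source hls.js and flv.js libraries; exact options vary by version.

```typescript
// Sketch: pick a player by the protocol/container the server returns.
// Assumes the open-source hls.js and flv.js libraries (APIs may vary by version).
import Hls from "hls.js";
import flvjs from "flv.js";

function play(video: HTMLVideoElement, url: string): void {
  if (url.endsWith(".m3u8") && Hls.isSupported()) {
    const hls = new Hls();
    hls.loadSource(url);      // M3U8 playlist → TS segments
    hls.attachMedia(video);
  } else if (url.endsWith(".flv") && flvjs.isSupported()) {
    const player = flvjs.createPlayer({ type: "flv", url, isLive: true });
    player.attachMediaElement(video);
    player.load();
    player.play();
  } else {
    video.src = url;          // fall back to native playback (e.g., HLS in Safari)
    video.play();
  }
}
```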

  • Common Audio Coding Methods

    • WAV (lossless)

      • WAV prepends 44 bytes to the PCM data, describing the sample rate, channel count, data format, and other properties of the PCM payload (a header-construction sketch appears after this codec list)

      • Advantages: very good sound quality; supported by a large amount of software

      • Disadvantages: no compression, so the files are relatively large

      • Applicable scenarios: intermediate files in multimedia development; storing music and sound-effect material

    • MP3 (lossy)

      • Good compression ratio; with LAME encoding the sound is very close to the source WAV file

      • Features: good sound quality at 128 kbit/s, relatively high compression rate, broad software and hardware support, good compatibility

      • Disadvantages:

        • The technology is relatively dated; at the same bit rate the sound quality is worse than AAC or OGG

        • Supports only two channels

    • AAC (Advanced Audio Coding; lossy)

      • AAC is a new-generation audio compression technology. Through additional coding tools (such as PS and SBR), three main coding formats have been derived: LC-AAC, HE-AAC, and HE-AAC v2. Compared with MP3, it achieves a higher compression ratio

        • HE-XXX technology: HE stands for High Efficiency. HE-AAC combines core AAC with SBR (Spectral Band Replication). The idea is to provide full-bandwidth coding at low bit rates without generating redundant signals: the core codec encodes only the low-frequency band, and SBR reconstructs the high frequencies from it, extending the audio bandwidth, whereas traditional low-bitrate coding tends to produce unpleasant noise artifacts

      • Features:

        • Higher compression ratio: better sound quality from smaller files

        • Enhanced multi-channel support: up to 48 full-range channels

        • Higher resolution: sampling frequencies up to 96 kHz

        • Higher decoding efficiency: decoding and playback consume fewer resources

    • Ogg (lossy)

      • At low and medium bit rates, Ogg offers good sound quality and is completely free. Its algorithms are very good, achieving better quality at smaller bit rates: 128 kbit/s Ogg can sound better than MP3 at 192 kbit/s or even higher

      • Features: achieves higher sound quality at smaller bit rates, and performs well at high, medium, and low bit rates

      • Disadvantages: poor compatibility; poor support for streaming

    • FLAC (lossless)

      • Uses a more efficient compression algorithm; as lossless compression, it does not destroy any of the original audio information during compression
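As mentioned under WAV above, the format is just PCM with a 44-byte descriptive header. Below is a minimal TypeScript sketch of constructing that header, with the field layout following the standard RIFF/WAVE structure.

```typescript
// Sketch: build the 44-byte WAV header described above for raw PCM data.
function wavHeader(sampleRate: number, channels: number,
                   bitsPerSample: number, dataSize: number): ArrayBuffer {
  const buf = new ArrayBuffer(44);
  const v = new DataView(buf);
  const writeTag = (offset: number, tag: string) => {
    for (let i = 0; i < tag.length; i++) v.setUint8(offset + i, tag.charCodeAt(i));
  };
  const byteRate = (sampleRate * channels * bitsPerSample) / 8;
  const blockAlign = (channels * bitsPerSample) / 8;

  writeTag(0, "RIFF");
  v.setUint32(4, 36 + dataSize, true);   // remaining file size, little-endian
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  v.setUint32(16, 16, true);             // fmt chunk size for plain PCM
  v.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  v.setUint16(22, channels, true);
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, byteRate, true);
  v.setUint16(32, blockAlign, true);
  v.setUint16(34, bitsPerSample, true);
  writeTag(36, "data");
  v.setUint32(40, dataSize, true);       // size of the PCM payload that follows
  return buf;
}

// e.g. CD-quality stereo: wavHeader(44100, 2, 16, pcmBytes.length)
```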

2.2 Cloud Live Streaming Service

Cloud live-broadcast services handle technical difficulties such as transcoding and content distribution in the cloud, exposing only the corresponding interfaces.

  • Live video service (ApsaraVideo Live)

    • An audio/video live-broadcast platform built on a leading content access and distribution network and large-scale distributed real-time transcoding technology, providing convenient access and high-definition, smooth, low-latency, high-concurrency live audio/video services

    • Alibaba Cloud live broadcast process diagram


3. Futu Securities live broadcast principle and practice

3.1 Live streaming process

  • live video

    • Overall process: capture → pre-processing → encoding → transmission → decoding → rendering

    • capture:

      • Generally done by the client (iOS, Android, PC, or tools such as OBS). iOS is relatively straightforward; Android requires per-model adaptation; PC must adapt to a wide range of machines and cameras

    • pre-processing

      • Common processing includes

        • Audio processing: sound mixing, noise reduction and sound effects processing

        • Video processing: beauty, watermark, various filter effects and dynamic textures, etc.

      • This stage mainly serves live-broadcast beautification. The beautification algorithm requires GPU programming and there is no ready-made open-source project to drop in; the difficulty lies in balancing GPU usage against the beautification effect, not in achieving the effect itself

    • coding

      • Hardware encoding is used; software encoding at 720p is simply not viable, as it monopolizes the CPU and causes the device to heat up, among other problems

      • Encoding must strike the best balance among resolution, bit rate, frame rate, GOP (Group of Pictures) size, and other parameters

      • expand

        • A GOP (Group of Pictures) is a group of consecutive pictures in the sequence. The first picture of a GOP must be an I frame, which guarantees that the GOP can be decoded independently, without referencing pictures outside it

        • MPEG encoding divides pictures (i.e., frames) into three types: I, P, and B. An I frame is intra-coded, a P frame is forward-predicted, and a B frame is bidirectionally interpolated. Simply put, the I frame is a keyframe that can be understood as a complete picture, while P and B frames record changes relative to it. Without the I frame, P and B frames cannot be decoded; this is why MPEG-family formats are hard to edit with frame accuracy and why clips must be trimmed at keyframe boundaries (a keyframe-detection sketch follows)
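To illustrate the special role of I frames, the sketch below scans an H.264 Annex-B byte stream for keyframes. It assumes raw Annex-B input: NAL units are delimited by 00 00 01 or 00 00 00 01 start codes, the low five bits of the following byte are the NAL unit type, and type 5 marks an IDR (key) frame.

```typescript
// Sketch: find IDR (key) frames in an H.264 Annex-B byte stream.
// nal_unit_type lives in the low 5 bits of the byte after the start code; 5 = IDR.
function findKeyframeOffsets(stream: Uint8Array): number[] {
  const offsets: number[] = [];
  for (let i = 0; i + 4 < stream.length; i++) {
    const isShort = stream[i] === 0 && stream[i + 1] === 0 && stream[i + 2] === 1;
    const isLong = !isShort && stream[i] === 0 && stream[i + 1] === 0 &&
                   stream[i + 2] === 0 && stream[i + 3] === 1;
    if (!isShort && !isLong) continue;
    const nalHeader = stream[i + (isShort ? 3 : 4)];
    if ((nalHeader & 0x1f) === 5) offsets.push(i); // IDR: decodable on its own
  }
  return offsets;
}
```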

    • transmission

      • Usually handled by a CDN service provider

    • decoding

      • Decoding reverses the earlier encoding step; on the Web, the format that needs to be decoded is HLS

    • rendering

      • Rendering is done by the player; Tencent Cloud's player is currently used

    • Tencent Cloud live broadcast solution diagram

4. Analysis of Douyu live broadcast principle (mainly the H5 side)

4.1 Live broadcast technical solution

There are no direct .flv network requests in the Douyu live room, only a large number of .xs requests. What is special is that the Content-Type of the .xs responses is video/x-flv, so in principle Douyu still uses the HTTP-FLV scheme;

  • Analysis of why the .xs approach is used

    • It comes down to how the stream is pulled

      • Streams are not pulled purely over HTTP; CDN and P2P pulling are used at the same time

      • .xs is not a complete FLV stream, but an FLV sub-stream

      • Specific request logic

        • When requesting for the first time, a complete FLV stream will be requested

        • When the P2P connection succeeds, switch to the sub-stream (disconnect the complete-FLV-stream request, then pull the sub-stream)

        • The reason: establishing a P2P connection is slow, so if P2P were used to request data from the start, video start-up would be very slow

  • Analysis of specific steps

    • The first request obtains the complete FLV stream

    • After establishing a P2P connection, disconnect the FLV stream request and switch to requesting a sub-stream

      • P2P streaming also has disadvantages: live latency is higher, making it unsuitable for low-latency scenarios, and it consumes the user's device and bandwidth, because besides pulling the stream from other users, the current user must also upload its own video data to them

    • At the same time, a WebSocket connection is created over which the server pushes the list of other users watching the current stream, so the player can pull the stream directly from the users who share it

      • Douyu's P2P is based on WebRTC's DataChannel: many WebRTC connections are created to receive video data shared by other users, and the currently downloaded video is shared with other users in turn (a receive-side sketch follows this list)

      • A later optimization is to merge multiple WebRTC connections into one, which simplifies the related logic

    • Whether it is HTTP or HTTP-P2P, its ultimate goal is to obtain FLV video data
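Below is a minimal receive-side sketch of the DataChannel idea described above. This is not Douyu's actual code: the signaling exchange (offer/answer, ICE candidates) and the player buffer are application-specific, so they appear here only as hypothetical stand-in helpers.

```typescript
// Sketch: receiving shared video chunks over a WebRTC DataChannel,
// as in the P2P scheme described above. Signaling is stubbed out; a real
// system exchanges offers/answers and ICE candidates via its own server.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

pc.ondatachannel = (event) => {
  const channel = event.channel;
  channel.binaryType = "arraybuffer";
  channel.onmessage = (msg) => {
    const chunk = new Uint8Array(msg.data as ArrayBuffer);
    // Feed the FLV chunk into the player (e.g., a Media Source Extensions buffer).
    appendToPlayerBuffer(chunk);
  };
};

pc.onicecandidate = (e) => {
  if (e.candidate) sendToSignalingServer(e.candidate); // app-specific transport
};

// Hypothetical helpers standing in for the player and signaling layers:
declare function appendToPlayerBuffer(chunk: Uint8Array): void;
declare function sendToSignalingServer(payload: unknown): void;
```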

On the PC side, the Response Headers show Content-Type: video/x-flv or Content-Type: video/x-p2p

5. Expansion

5.1 Analysis of media flow

5.1.1 Related concepts

  • streaming media development

    • The network layer (sockets, etc.) is responsible for transmission

    • The protocol layer (RTMP or HLS) is responsible for packaging network packets (slicing)

    • The encapsulation layer (FLV, TS) is responsible for containerizing the codec data

    • The coding layer (H.264 and AAC) is responsible for compressing the images and audio

  • GOP (Group of Pictures) picture group

    • A picture group is a group of continuous pictures, each picture is a frame, and a GOP is a collection of many frames

    • Live data is really a sequence of pictures, including I frames, P frames, and B frames. When a user starts watching, the player looks for an I frame first

    • The player asks the server for the latest I frame and starts playback from it. The GOP cache therefore adds end-to-end delay, because playback must begin from the latest I frame

    • The longer the GOP cache, the better the picture quality (at the cost of latency)

  • bit rate

    • The amount of data produced per second after image compression

  • frame rate

    • The number of pictures displayed per second; it affects the smoothness of the picture: the higher the frame rate, the smoother the picture

  • Video Encapsulation Format

    • A container for storing video information. Streaming containers include TS and FLV; indexed containers include MP4, MOV, and AVI

5.1.2 Related technical means

  • Format of audio and video collection data

    • Captured audio data is generally in PCM format

    • Captured video data is generally in YUV or RGB format. The raw captured data is very large and must be compressed to make transmission efficient (see the arithmetic sketch below)
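A back-of-the-envelope calculation shows why compression is mandatory. The numbers below follow directly from the YUV420 layout (1.5 bytes per pixel); the encoded-bitrate figure at the end is a typical ballpark, not a measured value.

```typescript
// Back-of-the-envelope: raw 1080p30 YUV420 bandwidth, to show why encoding is mandatory.
// YUV420 stores 1.5 bytes per pixel (full-res Y plus quarter-res U and V planes).
const width = 1920, height = 1080, fps = 30;
const bytesPerFrame = width * height * 1.5;           // 3,110,400 bytes per frame
const bytesPerSecond = bytesPerFrame * fps;           // 93,312,000 bytes ≈ 93 MB/s
const megabitsPerSecond = (bytesPerSecond * 8) / 1e6; // ≈ 746 Mbit/s

console.log(`raw: ${megabitsPerSecond.toFixed(0)} Mbit/s; ` +
            `a typical H.264 1080p30 live stream is ~2-4 Mbit/s after encoding`);
```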

  • video processing

    • video image processing

      • Video is ultimately rendered to the screen frame by frame by the GPU, so various kinds of processing can be applied to it through OpenGL ES to achieve different effects

      • Most beautification and special effects in video today are implemented with the GPUImage framework

      • OpenGL (Open Graphics Library) is a low-level graphics library that defines a cross-language, cross-platform programming interface

      • OpenGL ES (OpenGL for Embedded Systems) is a subset of the OpenGL 3D graphics API designed for embedded devices such as phones, tablets, and game consoles

    • Video Coding Framework

      • FFmpeg: a cross-platform open-source framework offering rich functionality such as video encoding, decoding, transcoding, streaming, and playback, with support for a wide range of formats and playback protocols

      • x264: encodes raw YUV video data into the H.264 format

      • VideoToolbox: Apple's own hardware video decoding and encoding API, available only from iOS 8 onward

      • AudioToolbox: Apple's own hardware audio decoding and encoding API

        • Hard decoding: decode on the GPU, reducing CPU load

          • Advantages: smooth playback, low power consumption, fast decoding

          • Disadvantages: poor compatibility

        • Soft decoding: decode on the CPU

          • Advantages: good compatibility

          • Disadvantages: increases the CPU load and power consumption; playback is not as smooth as hardware decoding, and decoding is relatively slow

    • Video Coding Technology

      • Main function: compress video pixel data into a video bitstream, reducing the amount of video data

      • Video compression coding standards: technologies for compressing and decompressing video, such as MPEG and H.264

        • MPEG: a video compression method that uses inter-frame compression, storing only the differences between consecutive frames to achieve a larger compression ratio; its strength is the relatively high clarity of individual pictures

        • H.264/AVC: a video compression standard that uses predictive coding in the same spirit as MPEG's P and B frames. It can generate video streams adapted to network conditions as needed, offering a higher compression ratio and excellent image quality; its strength is clarity in continuous motion, but it places higher demands on the system

    • Video Encapsulation Format

      • TS: a streaming container format. It can be played without loading an index first, which greatly reduces the delay of initial loading; for long videos, the index of a non-TS container can become quite large and hurt the user experience (a probe sketch follows this group)

        • Two TS segments can be spliced seamlessly, and the player can play them back continuously

      • FLV: a streaming container format; because the files it produces are extremely small and load extremely fast, it has become one of the mainstream streaming video formats
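Since every TS packet is 188 bytes long and begins with the sync byte 0x47, a buffer can be probed cheaply to see whether it looks like a TS stream. A small validation sketch:

```typescript
// Sketch: verify that a buffer looks like an MPEG-TS stream.
// Every TS packet is 188 bytes and starts with the sync byte 0x47.
const TS_PACKET_SIZE = 188;
const TS_SYNC_BYTE = 0x47;

function looksLikeTs(data: Uint8Array, packetsToCheck = 5): boolean {
  if (data.length < TS_PACKET_SIZE * packetsToCheck) return false;
  for (let i = 0; i < packetsToCheck; i++) {
    if (data[i * TS_PACKET_SIZE] !== TS_SYNC_BYTE) return false;
  }
  return true;
}
```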

  • Live streaming related

    • Data Transfer Framework

      • librtmp: used to transmit data in RTMP protocol format

        • RTMP (Real-Time Messaging Protocol): a protocol developed by Adobe for the Flash player, built on top of TCP (or tunneled over HTTP using polling)

    • Streaming process

      • Establish a TCP connection (the initial handshake is sketched after this list)

      • Establish RTMP link and send various control commands

      • Get raw video and audio data

      • Compress and encode the raw audio and video data (H.264 for video, AAC for audio)

      • Pack the encoded audio and video data

      • Send packaged audio and video data
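A sketch of the first two steps, assuming Node.js's built-in net module and a hypothetical ingest host. It opens the TCP connection and sends the C0/C1 handshake bytes defined by the RTMP specification; the control commands that follow (connect, createStream, publish) are omitted.

```typescript
// Sketch: TCP connection plus the RTMP C0/C1 handshake (per the RTMP spec).
// Uses Node.js's built-in net module; the host is a hypothetical placeholder.
import * as net from "net";

const socket = net.connect(1935, "live.example.com", () => {
  // C0: one byte, the RTMP version (3).
  const c0 = Buffer.from([0x03]);
  // C1: 1536 bytes = 4-byte timestamp + 4 zero bytes + 1528 random bytes.
  const c1 = Buffer.alloc(1536);
  c1.writeUInt32BE(Math.floor(Date.now() / 1000), 0);
  for (let i = 8; i < 1536; i++) c1[i] = Math.floor(Math.random() * 256);
  socket.write(Buffer.concat([c0, c1]));
});

socket.on("data", (s0s1s2) => {
  // After S0/S1/S2 arrive, the client echoes S1 back as C2, then begins
  // sending control commands (connect, createStream, publish, ...).
  console.log(`handshake reply: ${s0s1s2.length} bytes`);
});
```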

  • Data pull related

    • Live protocol selection

      • Where low latency or interactivity is required, RTMP or RTSP can be used

      • For playback (VOD) or cross-platform requirements, HLS is recommended

    • Live protocol comparison

      • HLS is implemented on top of HTTP. The transmitted content has two parts: the M3U8 description file and the TS media files. It supports both live and on-demand streaming and was originally aimed mainly at iOS

        • HLS is a technique that achieves live streaming through an on-demand-style mechanism (short segments fetched over HTTP)

        • HLS supports adaptive bitrate: the client automatically selects among streams of different bit rates according to network conditions, using a high bit rate when conditions permit (a playlist-selection sketch follows)

        • HLS's main drawback is its relatively large delay; RTMP's advantage is low delay. In addition, HLS's small segments produce a large number of files, and storing and managing them can waste considerable resources
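Adaptive bitrate works because the master M3U8 playlist lists variant streams tagged with BANDWIDTH attributes. Below is a minimal selection sketch, a simplified model of what players such as hls.js do internally.

```typescript
// Sketch: parse an HLS master playlist and pick the highest-bandwidth variant
// that fits the measured network speed - the essence of adaptive bitrate.
interface Variant { bandwidth: number; uri: string; }

function pickVariant(masterM3u8: string, availableBps: number): Variant | undefined {
  const lines = masterM3u8.split("\n").map((l) => l.trim());
  const variants: Variant[] = [];
  for (let i = 0; i < lines.length - 1; i++) {
    const m = lines[i].match(/^#EXT-X-STREAM-INF:.*BANDWIDTH=(\d+)/);
    if (m) variants.push({ bandwidth: Number(m[1]), uri: lines[i + 1] });
  }
  const sorted = [...variants].sort((a, b) => b.bandwidth - a.bandwidth);
  // Best variant that fits the available bandwidth, else the lowest one.
  return sorted.find((v) => v.bandwidth <= availableBps) ?? sorted[sorted.length - 1];
}
```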

  • Streaming server related

    • Data distribution method

      • A CDN (Content Delivery Network) is a network of proxy servers that acts as an intermediary, distributing a site's content to the network "edge" closest to users and improving the response speed of user access

      • How a CDN works (taking a request for streaming data as an example; sketched in code after this list)

        • Upload streaming data to server

        • Server stores streaming data

        • The client plays the stream by requesting the encoded streaming data from the CDN

        • The CDN server responds to the request; when the current node does not have the requested streaming data, it forwards the request toward the origin

          • If the requested streaming data exists on the node, the CDN sends it to the client

          • If the requested streaming data does not exist or has expired, the CDN node requests it from the upstream origin; the origin responds to the CDN's request and distributes the streaming data to the corresponding CDN node

        • Once the CDN node has the streaming data, it sends it to the client
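The node logic just described boils down to "serve from cache, otherwise fetch from the origin and cache". A hypothetical in-memory sketch (real CDNs add tiered caching, TTL negotiation, and consistency handling):

```typescript
// Sketch of the CDN edge-node logic described above: serve from the local
// cache when possible, otherwise fetch from the upstream origin and cache it.
// The cache and origin here are simple in-memory stand-ins.
const edgeCache = new Map<string, { data: Uint8Array; expiresAt: number }>();

async function handleEdgeRequest(
  key: string,
  fetchFromOrigin: (key: string) => Promise<Uint8Array>,
  ttlMs = 10_000,
): Promise<Uint8Array> {
  const cached = edgeCache.get(key);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.data;                    // cache hit: serve directly
  }
  const data = await fetchFromOrigin(key); // miss or expired: go to the origin
  edgeCache.set(key, { data, expiresAt: Date.now() + ttlMs });
  return data;
}
```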

6. Recommended literature

Live Video | Basic Principles of Live Broadcasting and the Practice of Web Live Broadcasting

Original link: WebRTC → In-depth analysis of live broadcast solutions and principles in various industries - Nuggets
