WebRTC video data flow analysis

Hello everyone, I am very glad to have this opportunity to share with you. My name is Xu Jianlin, and my English name is Piasy. I currently work at PowerInfo on high-stability, low-latency real-time audio and video transmission technology. I have five years of work experience: in the first two years I mainly did Android app development for live video, and for the past three years I have been doing RTC SDK development at PowerInfo. I am an open source enthusiast and have published many open source projects on GitHub; you are very welcome to follow them and exchange ideas with me.

The topic of this talk is the analysis of the WebRTC video data flow. The main content is divided into the following parts:

  • Introduction to the WebRTC Codebase
  • Analysis Methods
  • Video Pipeline Overview
  • Hands-On: Client-Side Video Recording

Introduction to the WebRTC Codebase

1.1 A brief introduction to WebRTC

How do you explain what WebRTC is in just one or two sentences?

WebRTC is an Internet standard for real-time communication (RTC) on the web. We also use the name to refer to the open source project that implements it, which is maintained by Google and is currently the most complete and most widely used RTC framework.

1.2 WebRTC releases

The figure above shows the release schedule published on the Chromium project website. In the table, Milestone 81 is a milestone number, abbreviated as M81. The other columns, such as Chromium, Skia, and WebRTC, show the corresponding version branches. For example, Skia has an m81 branch with the same name. WebRTC also used to have m75 and m76 branches named this way, but the branch naming scheme was changed later.

In "WebRTC Native Development Practice", it is mentioned that the code analysis of WebRTC in the book is based on a certain submission version, such as "#30432 submission". As shown in the figure is the Git submission record diagram of WebRTC, the position pointed by the blue arrow: Cr-Commit-Position: refs/head/master@{#30432}, the meaning of "#30432" refers to the project from the beginning to the present The first few submissions. The code base of WebRTC has a characteristic, its main branch is a straight line, and there are no other branches (of course, when a new version is released, a release branch will be opened, and some submissions that need to be brought up may be synchronized, but the submissions on this branch are not will be merged back), which makes the version history of WebRTC very clear, and it will be very simple and convenient for developers to query the submission record or change history.

The picture above is from a project I worked on before, captured in Sourcetree. You can see that its branches are very complicated; this is in fact the result of following the GitFlow branching model. GitFlow is criticized by many people precisely because the branch structure of the commit history becomes very complicated and the history is hard to trace.

1.3 WebRTC code directory

The picture shows the WebRTC code directory, viewed on macOS.

The first is "api", which is mainly the public API of C++ code. Developers will use its predefined interface program when developing in C++, such as pPear_cConnection class. At the same time, if you use the Java or ObjectiveC interface on Android or iOS, it is actually bouninding the C++ interface.

"call, pc, media": In my understanding, these three directories are the implementation codes of the main process and business logic of WebRTC.

"audio, common_aduio, video, common_video": These four directories are mainly audio and video related codes.

"modules": Many companies may not directly use the entire WebRTC code base, but only use some of the commonly used modules, most of which are included in "modules", such as echo suppression, noise suppression and other processing, video encoding, Jitterbuffer, etc. .

"p2p": codes related to p2p connections.

"sdk": codes related to Android and iOS platforms, such as video capture, preview, rendering, codec and other codes that need to call system interfaces, bouninding of C++ interfaces.

"rtc_base": Some common basic codes in the Chromium project, such as thread and lock related codes.

"third_party": Contains many other Google open source projects and non-Google open source projects. Those used by WebRTC are placed in third_party, such as FFmpeg, libvpx, etc.

"system_wrappers": A directory containing another system-related code, such as the sleep function. The SDK mainly involves codes related to Androic and iOS platforms, while system_wrappers includes codes related to more platforms such as windows.

"stats, logging": status statistics, log printing related codes.

"examples": Contains demos of various platforms, such as Android, iOS, Windows, Linux, MacOS, etc.

As far as my own study goes, I have not touched the remaining directories, but they should not be related to the main flow.

To close the first part of this talk, let us look at "Hello WebRTC", the title of the first part of the book "WebRTC Native Development Practice".

Chapter 1, setting up the development environment: the book has very detailed step-by-step tutorials. As long as you solve the problem of reaching Google's servers (a proxy may be needed), you should basically not run into other problems if you follow the tutorial.

Chapter 2, running the official demos: mainly the various demos in the examples directory just mentioned.

Chapter 3, basic process analysis: the basic process there is somewhat different from what we cover in this talk. It is more about how to use the WebRTC interfaces to implement a simple one-to-one audio and video call, that is, how to build that feature as a demo.

Analysis Methods: How to Get Started with Large Projects

For an individual, how do you get up to speed on a large project as quickly as possible? For example WebRTC, or other open source projects such as FFmpeg and GStreamer. The same applies when you join a new company and take over or participate in a larger project: it may not be as huge as WebRTC, but it still poses a real challenge. Here I will share some of my experience, which I hope will be of some help.

The first step is to "run it". Only by running the project's demo can we get an intuitive feel for the project, understand its features, and use the places where those features are implemented as starting points for thinking about how they are implemented.

The second step is to "start with the external API and follow the clues". For example, the picture below is the code of iOS. First, find the external API, such as RTCCameraVideoCapture in the code, which is used to realize camera capture, and then you can see how the class calls the interface and processes data.

The third step is "Based on basic knowledge (audio collection and broadcasting system interface), search and locate key functions/classes", the second step, for example, under Android or iOS, we first find the external interface that needs to be called to realize the corresponding function, and we can use these key The interface is searched in the code to discover key functions and classes. But not all logic will have external web interfaces. For example, audio-related implementations in WebRTC do not need to call any interfaces. The figure below is an example of iOS. The most critical function for audio playback is AudioOutputUnitStart, which starts an Audio Unit. After searching, we can find the voice_processing_audio_unit.m file, which contains a Start function, and we can further observe the function and the interfaces of the header file, such as initializing start, stop, etc., and the audio can be expanded or read the source code from here.

The fourth step is "Static reading source code + single-step debugging". Static reading of the source code mainly uses the code jump of the IDE, but the way gn generates the exXcode project file is somewhat special, and many code jumps will fail or jump to the wrong place. So more often I still use the global search code, although the efficiency is slightly lower, but there is no other more suitable way at present. Single-step debugging, in some places in the code, we want to know how to jump to the next step, but the code cannot jump directly, and the search results do not know what the function is and cannot be judged accurately. At this time, we can pass Add breakpoints to verify.

The figure shows a function of a class related to video encoding. After adding a breakpoint, we can observe, step by step, how video data travels from the system callback to the capture class RTCCameraVideoCapturer and then on to the encoder class; the path becomes very clear.

In software development there is no silver bullet. These are seemingly simple methods that are nevertheless very effective in practice; once you master them, they will help you get started on new projects.

Video Pipeline Overview

The video data flow of WebRTC is basically the same on each platform.

The video data is first collected by the VideoCapturer, then handed to the VideoSource, and then distributed by the VideoBroadcaster to the consumers, such as the Encoder and the Preview. The Preview is responsible for local preview, and the Encoder is responsible for encoding and sending. After data is received from the network layer, it is first decoded by the VideoDecoder, then likewise passed to a VideoBroadcaster and distributed to the consumers of the data. VideoSource is not shown in the figure above, but it is also a key class: it sits between VideoTrack and VideoBroadcaster and is essentially a wrapper around the VideoBroadcaster interface.

VideoTrack is a relatively important concept in WebRTC: conceptually, media such as audio and video is a Track, and we usually add a Track or receive a Track from the remote end. In addition, the iOS flow differs somewhat from the figure above. Its video preview does not receive each frame from the VideoBroadcaster and then render it; instead, the system provides an interface that associates the capture and preview system classes and renders automatically. That said, just like the RemoteRenderer class, we could also take each frame and render it ourselves, by adding a RemoteRenderer-style sink to the VideoBroadcaster on the capture side.

On non-iOS platforms, local preview and remote video rendering are actually implemented by the same class.
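To make the fan-out model concrete, here is a minimal sketch of how any consumer plugs into this flow through WebRTC's public C++ sink interface. FrameLogger and AttachLogger are names made up for this example, and exact headers and signatures may differ slightly between WebRTC revisions.

```cpp
// Minimal sketch: any consumer of video frames (preview, encoder, recorder)
// implements rtc::VideoSinkInterface and registers itself on the track; the
// VideoBroadcaster behind the track then fans frames out to every sink.
#include <cstdio>

#include "api/media_stream_interface.h"
#include "api/video/video_frame.h"
#include "api/video/video_sink_interface.h"

class FrameLogger : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
 public:
  void OnFrame(const webrtc::VideoFrame& frame) override {
    // Called on every frame delivered by the VideoBroadcaster.
    printf("frame %dx%d, ts=%u\n", frame.width(), frame.height(),
           frame.timestamp());
  }
};

void AttachLogger(webrtc::VideoTrackInterface* track, FrameLogger* logger) {
  // VideoSinkWants lets a sink request resolution/frame-rate adaptation.
  track->AddOrUpdateSink(logger, rtc::VideoSinkWants());
}
```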

Complete video data flow (call stack)

The figure lists in detail the steps involved in capturing, processing, and transmitting video data. In simple terms, the flow goes from the top down to the network layer at the bottom, and then from the bottom back up to rendering. The video data flow is basically the same on all platforms; the only differences are in the implementations of capture, codec, and rendering, and the rest of the flow is essentially identical.

Capture:

First, RTCCameraVideoCapturer receives the actual video data from the system callback and hands it to the VideoSource, which passes the data to the C++ layer through _nativeVideoSource; finally the frame is handed to AdaptedVideoTrackSource for operations such as rotation and cropping.
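For illustration, the sketch below shows the typical shape of this hand-off in C++: a source derived from rtc::AdaptedVideoTrackSource asks AdaptFrame how to crop and scale, then forwards the frame with OnFrame, which feeds the internal VideoBroadcaster. MyCaptureSource and DeliverCapturedFrame are invented names for this example, not WebRTC classes.

```cpp
// Sketch of how a platform capturer typically hands frames to the C++ layer.
// Details vary by WebRTC revision.
#include "absl/types/optional.h"
#include "api/video/video_frame.h"
#include "media/base/adapted_video_track_source.h"

class MyCaptureSource : public rtc::AdaptedVideoTrackSource {
 public:
  // Called from the platform capture callback with a full-resolution frame.
  void DeliverCapturedFrame(const webrtc::VideoFrame& frame) {
    int adapted_width, adapted_height;
    int crop_width, crop_height, crop_x, crop_y;
    if (!AdaptFrame(frame.width(), frame.height(), frame.timestamp_us(),
                    &adapted_width, &adapted_height, &crop_width, &crop_height,
                    &crop_x, &crop_y)) {
      return;  // Frame dropped by the adapter (e.g. frame-rate limiting).
    }
    // Real code would crop/scale the buffer here when the sizes differ.
    OnFrame(frame);  // Pushes the frame into the internal VideoBroadcaster.
  }

  // Housekeeping overrides required by the base class.
  webrtc::MediaSourceInterface::SourceState state() const override {
    return webrtc::MediaSourceInterface::kLive;
  }
  bool remote() const override { return false; }
  bool is_screencast() const override { return false; }
  absl::optional<bool> needs_denoising() const override {
    return absl::nullopt;
  }
};
```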

Encoding:

After the video data passes through the AdaptedVideoTrackSource layer, it can be distributed via broadcaster_. On Android or Linux there may be multiple branches, one for preview and one for encoding; here we follow the encoding branch. A sink is simply a consumer of the data. Encoding is realized through VideoStreamEncoder, but only conceptually; the actual encoding still calls platform-specific classes, so control eventually returns to the Objective-C layer, reaches RTCVideoEncoderH264 through a few calls, and then invokes the VideoToolbox interface to perform H.264 hardware encoding. When encoding finishes, the system issues a callback, and the encoded data is returned to the C++ layer in VideoStreamEncoder's OnEncodedImage callback, indicating that one frame of video has been encoded.
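The hand-off back to the C++ layer goes through an encoded-image callback; the sketch below shows roughly what that interface looks like. The exact signature has shifted a little across WebRTC versions, so treat this as an approximation based on the M81-era code, and MyEncodedImageObserver is a name invented for this example.

```cpp
// Rough shape of the encoded-frame hand-off: the platform encoder (e.g. the
// VideoToolbox-backed RTCVideoEncoderH264) reports each finished frame to a
// webrtc::EncodedImageCallback, and VideoStreamEncoder::OnEncodedImage is such
// a callback.
#include "api/video_codecs/video_encoder.h"

class MyEncodedImageObserver : public webrtc::EncodedImageCallback {
 public:
  Result OnEncodedImage(
      const webrtc::EncodedImage& encoded_image,
      const webrtc::CodecSpecificInfo* codec_specific_info,
      const webrtc::RTPFragmentationHeader* fragmentation) override {
    // encoded_image holds one complete encoded frame (e.g. an H.264 access
    // unit) plus metadata such as frame type and capture timestamp.
    return Result(Result::OK, encoded_image.Timestamp());
  }
};
```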

Sending:

VideoSendStream represents the video stream to be sent; RTP packetization is handled through rtp_video_sender_, followed by RTP packing and network transmission. Assuming the data transmitted over the network has reached RtpVideoStreamReceiver, we can see that the sender and receiver on the left and right sides have a certain symmetry in the naming of classes and functions. RtpVideoStreamReceiver receives the RTP packets and handles depacketization, network reordering, retransmission of lost packets, and so on; once it has a complete, decodable frame, it invokes the decode callback and hands the frame to VideoReceiveStream for decoding, which calls the Decode function of video_receiver_.

Decoding:

The Decode function of video_receiver_ is again only a conceptual decode: it goes through VCMGenericDecoder and is eventually dispatched to the platform-specific Objective-C hardware decoder, RTCVideoDecoderH264, which also calls VideoToolbox to decode and returns the result to the C++ layer through the DecodedFrameCallback.

The decoded data finally reaches VideoStreamDecoder and is handed to incoming_video_stream_, where there is a queue of video frames. Decoding differs from encoding here: on the encoding side, a captured frame is sent as soon as it is encoded, but on the decoding side a frame is not rendered immediately; a certain amount of buffering is required to avoid stuttering caused by jitter. Therefore, after a frame is decoded it is first put into the queue, and the rendering module controls the pace and pulls data when it needs it.
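The sketch below is a purely conceptual illustration of such a render-paced queue, not WebRTC's actual incoming_video_stream_ implementation: frames are queued with a target render time, and the render loop only takes a frame once that time has arrived.

```cpp
// Conceptual sketch only: decoded frames wait in a queue with a target render
// time, and the renderer pulls a frame only when it is due, smoothing jitter.
#include <chrono>
#include <deque>
#include <mutex>
#include <optional>

template <typename Frame>
class RenderPacedQueue {
 public:
  void Push(Frame frame, std::chrono::steady_clock::time_point render_time) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back({std::move(frame), render_time});
  }

  // Called by the render loop; returns a frame only when it is due.
  std::optional<Frame> PopIfDue() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty() ||
        std::chrono::steady_clock::now() < queue_.front().render_time) {
      return std::nullopt;
    }
    Frame frame = std::move(queue_.front().frame);
    queue_.pop_front();
    return frame;
  }

 private:
  struct Entry {
    Frame frame;
    std::chrono::steady_clock::time_point render_time;
  };
  std::mutex mutex_;
  std::deque<Entry> queue_;
};
```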

Rendering:

Once the rendering module obtains a video frame, it passes the data to the sink through the Broadcaster. On iOS the sink renders the data through RTCMTLVideoView, which calls the iOS Metal interface to render the video.

In fact, the figure is just a summary of the call stack of the video pipeline. There is a chapter in the book that analyzes and explains the video data flow with more sample code.

Hands-On: Client-Side Video Recording

First of all, the requirements must be clarified:
1. Recording is needed for both sending and receiving: the data we send must be written to a file, and the content we receive must also be written to a file.
2. We do not want to do any extra encoding, because the video being sent or received has already been encoded, and encoding it again would waste resources.
3. With no extra encoding, we only need to call FFmpeg to write the already-encoded data into a file.
4. So the remaining question is: where should we get the data from?

To answer the question of where to get the data, you first need a certain understanding of the video data flow, which is what the third part above introduced. As shown in the red box above, VideoSendStreamImpl::OnEncodedImage already has the encoded video data, still in the form of complete frames that have not yet been split into RTP packets. The receiving end is more complicated: during network transmission there are problems such as out-of-order arrival and packet loss, which make the raw network data unusable directly. Therefore we need a point where those problems have already been handled, that is, the point just before decoding, in the HandleEncodedFrame function of VideoReceiveStream.

Once we have found the data access points, we need to modify the code, add APIs, and implement the related functionality. For example, on Android and iOS we want to expose a Java or Objective-C interface for the app layer to call. To modify the iOS interface, we need to modify the code in the sdk directory.

For example, as shown in the figure, we modify the RTCPeerConnection file, which defines the main class of WebRTC, add Start/StopRecorder interfaces, and use a dir parameter to indicate which direction to record (send or receive).

The SDK is only a binding of the C++ interfaces, so we also need to modify the C++ interface in the api directory, that is, modify peer_connection_interface.h to add an interface to the C++ PeerConnection class.
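As a sketch of what this change might look like, the block below shows hypothetical additions to peer_connection_interface.h; the names RecorderDirection, StartRecorder, and StopRecorder are invented for this walkthrough and are not part of upstream WebRTC.

```cpp
// Hypothetical additions for the recording feature described here; these
// declarations do not exist in upstream WebRTC.
#include <string>

namespace webrtc {

enum class RecorderDirection {
  kSend,     // record the locally encoded (outgoing) video
  kReceive,  // record the remotely received (incoming) video
};

// Added inside the existing PeerConnectionInterface declaration in
// api/peer_connection_interface.h:
//
//   virtual bool StartRecorder(RecorderDirection direction,
//                              const std::string& output_path) = 0;
//   virtual bool StopRecorder(RecorderDirection direction) = 0;

}  // namespace webrtc
```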

The api directory only declares the interface; we also need to modify its implementation class, which mainly lives in the pc directory. But there is something special here: a very important part of the business process and implementation logic also lives in Call.

As shown in the figure, there is a concrete subclass that implements api/peer_connection_interface. We add the interface to it, but we do not implement the recording logic inside this PeerConnection class; instead, the changes are made in Call.

The VideoSendStream and VideoReceiveStream we introduced earlier, as well as the audio-related classes not covered this time, are all managed by the Call object. In my understanding, in WebRTC's earlier conceptual model the main class was actually Call, and PeerConnection is an interface defined during the later standardization process. Therefore, the actual video recording functionality is encapsulated in a Recorder class, the Recorder is managed inside Call, and we modify the PeerConnection implementation in pc as well as the Call header file.
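A hypothetical shape for such a Recorder is sketched below: Call owns it, both streams feed it complete encoded frames, and its implementation muxes them into a file (the change described here uses FFmpeg for the muxing). The class name and methods are illustrative only, not code that exists in upstream WebRTC.

```cpp
// Hypothetical Recorder owned by the Call object: both VideoSendStream and
// VideoReceiveStream hand it complete encoded frames, and the implementation
// writes them into a container file (e.g. via FFmpeg's muxing API).
#include <string>

#include "api/video/encoded_image.h"

namespace webrtc {

class VideoRecorder {
 public:
  // Opens the output file and writes the container header.
  virtual bool Start(const std::string& output_path) = 0;

  // Called with one complete encoded frame, either from the send path
  // (VideoSendStreamImpl::OnEncodedImage) or from the receive path's
  // pre-decode hook; the implementation muxes it into the file.
  virtual void OnEncodedFrame(const EncodedImage& frame) = 0;

  // Writes the trailer and closes the file.
  virtual void Stop() = 0;

  virtual ~VideoRecorder() = default;
};

}  // namespace webrtc
```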

After completing the above, the next step is to intercept the data. Intercepting the data comes down to function calls on VideoSendStream and VideoReceiveStream. The Recorder object lives in Call, and the two Stream objects also live in Call, so we only need to set the Recorder on the Streams, injecting it into them.

Going from Call to VideoSendStream, the files shown in the figure above need to be modified. The call directory defines the Stream interface, and the video directory contains the subclass that implements it, video_send_stream_impl; in its OnEncodedImage we hand the complete frame to the recorder, which then calls the FFmpeg interfaces declared in its header.
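A hedged sketch of that send-side tap is shown below; the recorder_ member and its injection are hypothetical additions for this walkthrough, and the exact OnEncodedImage signature differs between WebRTC revisions.

```cpp
// Sketch of the send-side tap inside video/video_send_stream_impl.cc: the
// existing OnEncodedImage callback is extended to forward each complete frame
// to the injected recorder before continuing with the normal send path.
EncodedImageCallback::Result VideoSendStreamImpl::OnEncodedImage(
    const EncodedImage& encoded_image,
    const CodecSpecificInfo* codec_specific_info,
    const RTPFragmentationHeader* fragmentation) {
  if (recorder_) {
    recorder_->OnEncodedFrame(encoded_image);  // Tap: one complete frame.
  }
  // ... original logic continues: hand the frame to rtp_video_sender_ for
  // RTP packetization and sending ...
  return rtp_video_sender_->OnEncodedImage(encoded_image, codec_specific_info,
                                           fragmentation);
}
```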

The receiving end is similar to the sending end; ReceiveStream and SendStream are quite symmetric in functionality. There is likewise an interface definition under the call directory and an implementation under the video directory.

The complete code for the recording feature is available as a single commit on GitHub, which you can use as a reference.
