Key technical points in developing a real-time live video streaming platform for instant messaging

Nowadays companies large and small, and even individual developers, want to build their own live streaming websites or apps. This article will help you clarify the technical points you need to pay attention to when developing a live video streaming platform.

 

Do you think you can run a live broadcast just by calling a few Chrome APIs?

WebRTC is not a plug-in: it is built into Chrome and exposed as native JavaScript APIs, so no browser plug-in is involved. After capturing the video source, you should not push the frames over a WebSocket; instead, use WebRTC's peer-connection APIs to send the video and audio directly (the same set of APIs carries both).

The correct way is:

    1. You need a client that implements the WebRTC protocols, such as the Chrome browser.
    2. Set up a server similar to an MCU (Multipoint Control Unit).


The specific implementation steps are as follows:

    Step 1: Use a WebRTC-capable client such as Chrome to capture video and audio through the WebRTC media APIs, then use WebRTC's peer-connection APIs to send the audio and video data to the MCU server (a minimal sketch follows this list).
    Step 2: The MCU server processes the audio and video as needed, for example compressing the video and mixing the audio.
    Step 3: Viewers who want to watch the broadcast connect to the MCU server from their own Chrome browsers and receive the audio and video streams forwarded by the server.
    Step 4: Check browser compatibility. IE uses a different protocol than Chrome, so the two cannot interoperate, and the situation with Firefox and Opera is not ideal either.
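
A minimal sketch of the capture-and-send side of step 1, assuming the MCU accepts a standard WebRTC SDP offer/answer exchange over a signaling channel you provide yourself; the sendOffer and waitForAnswer callbacks below are placeholders, not part of any real API:

    // Publish-side sketch (TypeScript, standard browser WebRTC APIs).
    // The signaling transport to the MCU is an assumption and is passed in
    // as two callbacks; a real system would use WebSocket or HTTP signaling.
    async function publishToMcu(
      sendOffer: (offer: RTCSessionDescriptionInit) => Promise<void>,
      waitForAnswer: () => Promise<RTCSessionDescriptionInit>,
    ): Promise<RTCPeerConnection> {
      // 1. Capture camera and microphone with the WebRTC media APIs.
      const stream = await navigator.mediaDevices.getUserMedia({
        video: { width: 1280, height: 720 },
        audio: true,
      });

      // 2. One peer connection carries both the video and the audio track.
      const pc = new RTCPeerConnection();
      for (const track of stream.getTracks()) {
        pc.addTrack(track, stream);
      }

      // 3. Standard SDP offer/answer exchange with the MCU.
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
      await sendOffer(offer);
      const answer = await waitForAnswer();
      await pc.setRemoteDescription(answer);
      return pc;
    }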


Finally: if you go down this road, what result do you think you will get? In one test, 1 person broadcast and 39 people watched, with the MCU running on a small i3 + 4 GB CentOS 6.4 machine; it ran continuously for 48 hours without problems at roughly 60% CPU usage. Compared with today's live-streaming systems that cost hundreds of thousands, doesn't that look weak? For instant messaging chat app development, you can consult Weikeyun (WeChat: weikeyun24).

 

So don't put blind faith in WebRTC. WebRTC is only suitable for small audio/video conferences (up to about 8 participants); it is not suited to live broadcasting.

A confident front-end developer will think: "I know HTML5; one person could build this in about 7 working days." To that there is only one reply: young man, stay humble.

In fact, you need to know:

    camera capture;
    audio and video encoding/decoding;
    streaming media protocols;
    pushing the audio/video stream to a streaming media server;
    streaming media network distribution;
    the player on the user side;
    audio and video synchronization;
    languages: C, C++, HTML, PHP, MySQL...
    development environments: embedded, Linux, Windows, Web...


Seeing this, do you still think this is a task that can be accomplished by one person?

If you are talented enough to solve all of the above technical problems alone, fine, you still have to solve the transmission problem. Whether transmission goes well, that is, whether the video lags or stutters, depends on network conditions, and the public Internet is a notoriously messy environment. Playing games at home doesn't stutter, yet watching video does; chatting on QQ and playing Dou Dizhu is no problem, yet the video keeps freezing. What do you do?

There are three solutions:

    use a CDN to accelerate delivery;
    spend money to build your own servers;
    or use someone else's cloud service.


The video signal starts at the venue and travels to viewers scattered across the country, passing through data centers and caching/acceleration nodes at every level along the way. The sum of the time the signal spends in every link of that path is the delay you see.

With CDN acceleration you can keep latency reasonably low. By current industry standards the video delay is between 3 and 6 seconds; in other words, during a "live" broadcast what you see is the picture from a few seconds ago.
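
As a rough, illustrative budget (every number below is an assumption, not a measurement), the end-to-end delay is simply the sum of the per-link delays described above:

    // Illustrative latency budget (TypeScript); the numbers are assumptions used
    // only to show how per-link delays add up to the few seconds viewers see.
    const delaysMs = {
      captureAndPreprocess: 100,
      encoderBuffering: 300,
      uplinkToIngest: 200,
      cdnRelayAndCache: 1000,
      playerBuffer: 2000,      // players buffer to ride out network jitter
      decodeAndRender: 100,
    };
    const totalMs = Object.values(delaysMs).reduce((a, b) => a + b, 0);
    console.log(`end-to-end delay ≈ ${(totalMs / 1000).toFixed(1)} s`); // ≈ 3.7 s, within the 3-6 s range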

If you build your own servers but don't have enough data centers, you will still need a CDN to accelerate cross-network and cross-province transmission. To reduce latency as much as possible you would have to deploy data centers in every province and city across the country to handle cross-network and cross-province transmission yourself. Compared with using a CDN, this approach is very expensive.

If you use a cloud service, someone else runs the servers for you and you simply consume the service. To avoid any appearance of advertising, I will only say that there are many providers of real-time live-streaming cloud services; please research the specifics yourself.

Of course, whichever approach you take, weigh the pros and cons carefully; the solution that fits your situation is the best solution.

Next, let's walk through the stages involved in live video broadcasting and how to deal with each of them.

Live video can be divided into:

    Capture;
    Preprocessing;
    Encoding;
    Transmission;
    Decoding;
    Rendering.


The above steps are described in detail below:

- Capture: iOS is relatively simple, while Android requires per-model adaptation work (Agora.io is currently compatible with 4000+ Android models). PC is the most troublesome, with all kinds of odd camera drivers that make problems especially hard to track down; it is advisable to abandon PC and support mobile broadcasters only, which is what several newer live-streaming platforms do.

- Pre-processing: Beautification filters are now standard in live streaming; 80% of broadcasters are unwatchable without them. Beautification algorithms require people who understand image processing, and there is no good open-source implementation, so you end up studying papers. Once the algorithm is designed it still has to be optimized, whether on the CPU or the GPU, and that optimization itself requires specialist knowledge. The GPU performs well but also draws power: push it too hard and the phone overheats, and an overheated phone drops camera frames, especially on the iPhone 6, whose CPU sits very close to the front camera. So a great deal of development and debugging goes into algorithm design, optimization, and balancing quality against heat and power, and all of it depends on experience.
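
To make the cost concrete, here is a deliberately naive CPU "filter" sketch: it only brightens pixels, which is nothing like a real beautification algorithm, but it shows where the per-frame, per-pixel work lives and why doing it 30 times a second on a phone CPU generates heat:

    // Naive per-pixel processing on the CPU (TypeScript, Canvas 2D).
    function brightenFrame(video: HTMLVideoElement, canvas: HTMLCanvasElement): void {
      const ctx = canvas.getContext("2d");
      if (!ctx) return;
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
      const px = frame.data;                        // RGBA bytes
      for (let i = 0; i < px.length; i += 4) {
        px[i]     = Math.min(255, px[i] + 20);      // R
        px[i + 1] = Math.min(255, px[i + 1] + 20);  // G
        px[i + 2] = Math.min(255, px[i + 2] + 20);  // B
      }
      ctx.putImageData(frame, 0, 0);
      // A production pipeline would instead run skin-smoothing/whitening shaders
      // on the GPU and balance image quality against heat and battery drain.
    }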

- Encoding: If you want 720p you must use hardware encoding; software-encoding 720p on a phone is hopeless. But hardware encoders are inflexible, and compatibility is a problem, as anyone who has dealt with Android devices and their chipsets knows. How do you adapt to a complicated network and an equally varied mix of uplink and downlink devices? Some will ask: if the requirements are modest, can't we just use software encoding at a low resolution like 360p? Even at low resolution, software encoding heats up the CPU. Sustained heat shows up directly as battery drain (and asking a mobile broadcaster to stay plugged into a charger is unreasonable), it can throttle the CPU frequency, and in the worst case it damages the camera. And that is only performance. Unlike pre-processing, which mainly affects image quality and power consumption, video encoding is also tied to bandwidth cost and to coping with the network. Only after weighing performance, power consumption, cost, and network conditions can you pick the bitrate, frame rate, and resolution of your encoder, and decide between software and hardware encoding.
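
Below is one illustrative way to express that trade-off in code (the profile numbers are assumptions, not a standard), using the browser's MediaRecorder as a stand-in encoder: you can set target bitrates, while the hardware-vs-software decision is ultimately made by the platform. Resolution and frame rate would be applied earlier, as getUserMedia capture constraints.

    // Illustrative encoder profiles and a browser-side recording example (TypeScript).
    interface EncodeProfile { width: number; height: number; fps: number; videoKbps: number; }

    function pickProfile(hasHardwareEncoder: boolean): EncodeProfile {
      // Software-encoding 720p on a phone is hopeless, so without a hardware
      // encoder stay at 360p with a lower frame rate and bitrate.
      return hasHardwareEncoder
        ? { width: 1280, height: 720, fps: 30, videoKbps: 1800 }
        : { width: 640,  height: 360, fps: 15, videoKbps: 600 };
    }

    function startRecording(stream: MediaStream, profile: EncodeProfile): MediaRecorder {
      const recorder = new MediaRecorder(stream, {
        mimeType: "video/webm;codecs=vp8",            // pick a type the browser supports
        videoBitsPerSecond: profile.videoKbps * 1000,
        audioBitsPerSecond: 128_000,
      });
      recorder.ondataavailable = (e) => { /* push e.data chunks to your uplink here */ };
      recorder.start(1000);                            // emit a chunk every second
      return recorder;
    }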

- Transmission: Doing this yourself is unrealistic; leave it to a third-party service provider.

- Decoding: If you use hardware decoding, you must handle fault tolerance and per-device adaptation; a sudden decoder crash that reboots the phone is not acceptable. Android hardware decoding is a minefield of its own, and once the network gets involved, the phone's hardware decoder may not support the incoming stream at all. Fall back to software decoding, and the power-consumption and heat problems come right back.
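
In the browser, the hardware-first, software-fallback idea can be sketched with the WebCodecs API (available in recent Chromium-based browsers); the codec string and dimensions below are illustrative:

    // Prefer the hardware decoder, but check support first and keep an error
    // handler so a decode failure does not take the whole app down (TypeScript).
    async function createDecoder(onFrame: (f: VideoFrame) => void): Promise<VideoDecoder> {
      const base: VideoDecoderConfig = {
        codec: "avc1.42E01E",          // H.264 Baseline, illustrative
        codedWidth: 1280,
        codedHeight: 720,
      };

      const hw = { ...base, hardwareAcceleration: "prefer-hardware" as const };
      const support = await VideoDecoder.isConfigSupported(hw);

      const decoder = new VideoDecoder({
        output: onFrame,
        error: (e) => console.error("decoder error, reinitialize or fall back:", e),
      });
      decoder.configure(
        support.supported ? hw : { ...base, hardwareAcceleration: "prefer-software" },
      );
      return decoder;
    }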

- Rendering: Why has the phone clearly decoded plenty of frames, yet nothing is being drawn on screen? Why are the picture and the sound out of sync?
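
A common answer to the sync question is to treat audio as the master clock and decide per video frame whether to render, drop, or wait; a minimal sketch (the 30 ms and 100 ms thresholds are ballpark values, not a standard):

    // Audio-master A/V sync decision (TypeScript sketch).
    type SyncAction = "render" | "drop" | "wait";

    function syncDecision(videoPtsMs: number, audioClockMs: number): SyncAction {
      const diff = videoPtsMs - audioClockMs;   // > 0: frame is early, < 0: frame is late
      if (diff < -100) return "drop";           // hopelessly late: skip the frame
      if (diff > 30) return "wait";             // early: hold it until the audio clock catches up
      return "render";                          // close enough: show it now
    }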

Okay, did you think that was everything?

Then there is audio. What do you do when another app preempts the microphone? Why does the recording thread keep misbehaving? Audio pre-processing is even more involved: when do you enable the "3A" engines, that is, acoustic echo cancellation (AEC), noise suppression (ANS), and automatic gain control (AGC)? Why does AAC sound better than Opus in some scenarios? What are AAC, HE-AAC, and HE-AAC v2, and how do you choose among them? Should you add reverb? Which playback and recording modes do you pick? And if you want interactive echo cancellation, you have to adapt it to a long list of device models.
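
On the browser side the 3A switches map onto real MediaTrackConstraints names (echoCancellation, noiseSuppression, autoGainControl); whether the underlying implementation is good enough for interactive use still varies by device, and native mobile apps would use the platform or vendor SDK equivalents instead:

    // Requesting microphone capture with 3A processing enabled (TypeScript).
    async function captureMicWith3A(): Promise<MediaStream> {
      return navigator.mediaDevices.getUserMedia({
        audio: {
          echoCancellation: true,   // AEC
          noiseSuppression: true,   // ANS
          autoGainControl: true,    // AGC
        },
        video: false,
      });
    }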

All of the above is only the media module. On top of it you still need signaling control, login, authentication, permission management, state management, and the various application services: message push, chat, the gift system, the payment system, the operations support system, the statistics system, and so on.

And the backend also needs databases, caches, distributed file storage, message queues, operations and maintenance systems, and more.


Source: blog.csdn.net/weikeyuncn/article/details/128237137