Several suggestions for mobile instant-messaging audio/video development

With the rapid development of the mobile Internet and the steadily improving performance of smart terminals, real-time audio and video communication between devices has become an important direction for mobile development. In essence, real-time audio/video communication = audio/video processing + network transmission, covering capture, encoding, network transmission, decoding, playback, and related stages. None of these stages is trivial; if any is handled poorly, you will hit pitfall after pitfall during actual development. This article offers brief reference suggestions for several typical problems.
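As a mental model, the pipeline above can be sketched as a loop of stages. This is a toy illustration only; every function name here is a hypothetical placeholder, not a real library API.

```python
# Hypothetical sketch of the real-time A/V pipeline described above.
# Each stage is a placeholder callable, not a real codec/transport API.

def run_pipeline(frames, encode, send, receive, decode, play):
    """Capture -> encode -> transmit -> decode -> play, one frame at a time."""
    for raw_frame in frames:            # capture
        packet = encode(raw_frame)      # encode (e.g. H.264)
        send(packet)                    # network transmission
        data = receive()                # receiver side
        play(decode(data))              # decode and render

# Toy wiring: an in-memory "network" queue stands in for the real transport.
queue = []
played = []
run_pipeline(
    frames=["f1", "f2"],
    encode=lambda f: f.encode(),        # pretend UTF-8 bytes are a packet
    send=queue.append,
    receive=queue.pop,
    decode=lambda d: d.decode(),
    play=played.append,
)
print(played)  # -> ['f1', 'f2']
```

In a real system each stage runs concurrently with its own queue; the point is only that a problem in any one stage degrades the whole chain.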

For video coding, the mainstream choice is the H.264 encoder; the main open-source implementations are x264 and openh264. openh264 is an open-source project from Cisco, optimized for real-time video-call scenarios. In addition, Google's VP8 (the latest iteration of the family is VP9) has been heavily optimized for mobile and is the default encoding format in Chrome.

A digression on H.264 patent licensing:

    H.264, also known as MPEG-4 Part 10, requires licensing fees just as MPEG-2 did, managed and collected by the patent pool MPEG LA. (A standard like H.264 is actually a collection of a large number of patented technologies, which of course cannot all be held by one company; so, for the sake of win-win cooperation, the holders pooled their patents into an alliance rather than fight one another.) Using the standard requires an encode/decode license fee per product: the first 100,000 units per year are free; beyond that, each unit costs US$0.20; above 5 million units, the fee drops to US$0.10 per unit, with an annual cap of US$5 million.
    Free online content, such as video sites like YouTube, could use it free of charge until 2016. However, rental services like Netflix are charged according to the number of subscribers; for PPV (Pay Per View) and VOD (Video on Demand) — for example, paid movies on MOD or BBTV digital cable TV — content longer than 12 minutes incurs a royalty of 2% of the selling price, again capped at US$5 million.
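One plausible reading of the tiered device-royalty schedule above can be expressed in code. This is a toy calculation only; the real MPEG LA terms have more conditions than shown here.

```python
def h264_device_royalty(units_per_year):
    """Annual H.264 device royalty per the tiers described above:
    first 100k units free, $0.20/unit up to 5M units, $0.10/unit
    beyond that, capped at $5M per year. Toy model only."""
    FREE, TIER1_TOP, CAP = 100_000, 5_000_000, 5_000_000.0
    fee = 0.0
    if units_per_year > FREE:
        fee += 0.20 * (min(units_per_year, TIER1_TOP) - FREE)
    if units_per_year > TIER1_TOP:
        fee += 0.10 * (units_per_year - TIER1_TOP)
    return min(fee, CAP)

print(h264_device_royalty(100_000))    # -> 0.0
print(h264_device_royalty(200_000))    # -> 20000.0
print(h264_device_royalty(6_000_000))  # -> 1080000.0
```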


Given the licensing fees above — and the fact that everyone involved is a technology company — some parties were naturally unhappy, so many open-source alternatives were born (which, of course, is not easy to do well). Google's resistance has been the fiercest: the dispute between Google on one side and Microsoft plus Apple on the other over audio/video standards is a well-known topic, worth searching for if you are interested.

If software encoding is used, it consumes more CPU, which means the device runs hot and drains its battery quickly; but compatibility is excellent, and it runs on almost any device.

If hardware encoding is used, encoding performance is good — it can fully support real-time encoding of 1080p full-HD video — and it also saves power. But device adaptability is relatively poor, especially in the hardware encoding modes of Android devices. iOS devices adapt better, but without a lower-level encoding interface it is difficult to obtain the encoded stream frame by frame for real-time live broadcasting. In addition, dynamic bit-rate control is harder to do with hardware encoders.

For web live-streaming and on-demand scenarios, try to smooth out bit-rate fluctuations at the encoding stage, which requires optimizing the rate-control algorithm.

If the interval between key frames is small, the bit rate will spike frequently, causing momentary congestion when sending data.
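A toy model makes the effect concrete. Assuming (purely for illustration) that an I-frame is roughly ten times the size of a P-frame, a shorter key-frame interval (GOP) produces more spikes per unit time:

```python
def frame_sizes(num_frames, gop, i_size=60_000, p_size=6_000):
    """Toy model: one I-frame every `gop` frames, ~10x a P-frame in
    size, so a short GOP means frequent bit-rate spikes."""
    return [i_size if i % gop == 0 else p_size for i in range(num_frames)]

short = frame_sizes(30, gop=5)    # I-frame every 5 frames
long_ = frame_sizes(30, gop=30)   # I-frame every 30 frames
print(short.count(60_000), long_.count(60_000))  # -> 6 1
```

Over the same 30 frames, the short GOP stream sends six large bursts versus one, which is exactly the momentary congestion described above.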

You can mitigate the bit-rate fluctuation problem with buffering — for example, add a send buffer on the publishing end and send data at a fixed bit rate instead of sending each frame as it is produced.
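The fixed-rate send buffer just described is essentially a leaky bucket. A minimal sketch, with hypothetical sizes and tick rates:

```python
from collections import deque

class SendBuffer:
    """Minimal leaky-bucket sketch: frames of varying size go in,
    data leaves at a fixed byte rate per tick, smoothing spikes."""

    def __init__(self, bytes_per_tick):
        self.rate = bytes_per_tick
        self.buf = deque()  # pending frame sizes, in bytes

    def push_frame(self, size):
        self.buf.append(size)

    def drain_tick(self):
        """Send up to `rate` bytes this tick; return bytes sent."""
        budget = self.rate
        while self.buf and budget > 0:
            take = min(self.buf[0], budget)
            budget -= take
            if take == self.buf[0]:
                self.buf.popleft()      # frame fully sent
            else:
                self.buf[0] -= take     # partially sent, keep remainder
        return self.rate - budget

sb = SendBuffer(bytes_per_tick=10_000)
for size in [60_000, 6_000, 6_000]:     # spiky input: I-frame then P-frames
    sb.push_frame(size)
sent = [sb.drain_tick() for _ in range(8)]
print(sent)  # -> [10000, 10000, 10000, 10000, 10000, 10000, 10000, 2000]
```

The burst of 72,000 bytes leaves the sender as a flat 10,000 bytes per tick — the output rate no longer mirrors the key-frame spikes.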

Similarly, a receive buffer on the player can smooth out the frequent stuttering caused by network fluctuation. However, making the buffer too large increases latency, which suits on-demand playback but not live streaming. Live scenarios require the end-to-end delay to be as small as possible, so that the player starts quickly and shows a picture.

For RTMP live streaming, cumulative delay must also be addressed; one approach is to have the player actively flush its buffer.
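The flush policy can be as simple as a threshold check run periodically by the player. A sketch, with hypothetical latency budgets:

```python
def maybe_flush(buffer_ms, max_latency_ms=1_500, target_ms=500):
    """If the player's buffer has accumulated past max_latency_ms,
    drop buffered data down to target_ms to cut cumulative delay.
    Returns (new buffer duration, milliseconds dropped).
    Thresholds here are illustrative, not recommendations."""
    if buffer_ms > max_latency_ms:
        return target_ms, buffer_ms - target_ms
    return buffer_ms, 0

print(maybe_flush(400))    # -> (400, 0)     within budget, keep playing
print(maybe_flush(3_000))  # -> (500, 2500)  flush down to the target
```

Dropping to a small non-zero target (rather than zero) keeps a little protection against the next network wobble while still bounding the accumulated delay.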

Whether for live or on-demand service, there is an end-to-end transmission path to consider. The publishing end must first connect to a streaming server, which means selecting a suitable node. One approach is to pick the nearest node based on the client's DNS resolution; when DNS is misconfigured, this can schedule the client inaccurately.

The other approach is to select the node based on the client's egress IP, which is more accurate. The player side uses a similar method to select edge nodes of the streaming-server cluster.
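Egress-IP scheduling amounts to matching the client's address against a node table. A sketch with a hypothetical node table (the node names and CIDR blocks are invented; the IPs are documentation-reserved ranges):

```python
import ipaddress

# Hypothetical edge-node table: CIDR block of client egress IPs -> node.
NODES = {
    "203.0.113.0/24": "edge-shanghai",
    "198.51.100.0/24": "edge-beijing",
}
DEFAULT = "edge-central"

def pick_node(client_ip):
    """Schedule by the client's actual egress IP rather than by the
    location of its DNS resolver, avoiding misconfigured-DNS errors."""
    addr = ipaddress.ip_address(client_ip)
    for cidr, node in NODES.items():
        if addr in ipaddress.ip_network(cidr):
            return node
    return DEFAULT

print(pick_node("203.0.113.9"))  # -> edge-shanghai
print(pick_node("192.0.2.1"))    # -> edge-central (no match, fall back)
```

Production schedulers use GeoIP/BGP databases rather than a hand-written table, but the lookup shape is the same.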

Throughout the live or on-demand session, it is best to collect real-time statistics: network type, device information, real-time network status, frame rate, bit rate, resolution, and so on. These make it possible to analyze whatever problems arise; in live scenarios especially, when network fluctuation causes stuttering, they provide the basis for dynamically adjusting QoS.

For real-time live audio/video scenarios, a QoS strategy dynamically adjusts the encoding parameters, including frame rate, bit rate, resolution, and buffer size. When the live stream stutters, adopt a "decrease fast, increase slowly" strategy; when the network fluctuates severely, this avoids adjusting the encoding parameters back and forth, which would create a vicious circle.
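The "decrease fast, increase slowly" rule can be sketched as a one-step controller (the halving factor, 5% step, and bounds are illustrative, not tuned values):

```python
def adjust_bitrate(current_kbps, stalled, floor=300, ceiling=3_000):
    """'Decrease fast, increase slowly': halve the bit rate on a
    stall, creep up 5% while playback is smooth. The asymmetry keeps
    the encoder from oscillating when the network fluctuates."""
    if stalled:
        return max(floor, current_kbps // 2)
    return min(ceiling, int(current_kbps * 1.05))

rate = 2_000
rate = adjust_bitrate(rate, stalled=True)   # stall: 2000 -> 1000
rate = adjust_bitrate(rate, stalled=False)  # smooth: 1000 -> 1050
print(rate)  # -> 1050
```

One stall wipes out many smooth intervals of gains, so the stream backs off decisively and only re-approaches the previous rate once the network has proven stable for a while.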

When adjusting the encoding parameters, the bit rate and frame rate are generally divided into several grades per resolution; then, based on statistics gathered over a certain window, the encoder switches among these parameter sets, keeping audio and video smooth while improving image quality as much as possible.
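Those graded parameter sets form a quality ladder, and switching among them is a lookup against measured throughput. A sketch with a hypothetical ladder and a safety headroom factor (both invented for illustration):

```python
# Hypothetical quality ladder: (resolution, fps, kbps) tiers, i.e. the
# "several grades per resolution" described above, best first.
LADDER = [
    ("1280x720", 30, 2_000),
    ("960x540", 25, 1_200),
    ("640x360", 20, 600),
    ("320x180", 15, 300),
]

def pick_tier(measured_throughput_kbps, headroom=0.8):
    """Choose the highest tier whose bit rate fits within a safety
    fraction of the measured throughput; fall back to the lowest tier
    when even that does not fit."""
    budget = measured_throughput_kbps * headroom
    for tier in LADDER:
        if tier[2] <= budget:
            return tier
    return LADDER[-1]

print(pick_tier(3_000))  # -> ('1280x720', 30, 2000)
print(pick_tier(1_000))  # -> ('640x360', 20, 600)
```

The headroom keeps the stream from sitting exactly at the link's capacity, which would turn every throughput dip into a stall; combining this lookup with the slow-rise rule above prevents rapid tier flapping.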

Origin blog.csdn.net/weikeyuncn/article/details/128286612