Audio Coding Standards for Instant Messaging Audio and Video Development

Real-time audio and video is arguably the last major hurdle in IM development. The reason is that it combines audio and video processing with network transmission, drawing on techniques from several fields at once, while the public Internet was not designed for real-time communication.

 

Video communication is a real-time, two-way exchange of both video and audio. In pursuing high-definition video, we sometimes neglect the other half of the process: audio. If the picture is high definition but the audio is not clear and continuous, the call is effectively meaningless, so audio is arguably even more important than video.

Audio technology in traditional video conferencing and instant messaging systems has developed very slowly. The sampling frequency, sampling precision, and frequency range of the audio signal are all heavily reduced, so the clarity and fidelity that can be delivered suffer accordingly. Compared with the non-real-time compression formats used for storage and playback, such as OGG and MP3, the fidelity is very low, and the reproduction of live sound falls short of requirements.

At present, traditional video communication mainly uses audio standards such as G.711, G.722, G.721, and G.728. Human hearing spans roughly 20 Hz to 20 kHz, but these standards carry only a narrow part of that range (3.4 kHz or 7 kHz of audio bandwidth for the codecs described below). Too much audio information is therefore lost when reproducing the ambient sound of a scene, and the scene cannot be conveyed faithfully. High-definition video communication therefore needs a complementary audio processing method to solve this problem and bring the whole high-definition experience closer to perfection.

There are many audio processing standards in use worldwide. For next-generation real-time interactive audio, MPEG-1 Layer 2 or the AAC family can be used. The selection principle is that the audio frequency response should reach 22 kHz, which covers almost the entire range of human hearing and even slightly exceeds it at the high end. Such a codec can reproduce live sound faithfully and naturally, and can play back in two-channel stereo, giving the whole call a stronger sense of presence and CD-level sound quality. At the same time, it should be well matched to the link bandwidth and to codec efficiency.
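As a rough illustration of the selection principle above, the sketch below works out the minimum sampling rate implied by a target frequency response (the Nyquist criterion) and the raw, uncompressed PCM bit rate for a given format. The function names are illustrative, and the 44.1 kHz, 16-bit, two-channel figures are the standard CD parameters rather than values taken from this article.

```python
def min_sample_rate_hz(max_audio_freq_hz: float) -> float:
    """Nyquist criterion: sample at no less than twice the highest frequency."""
    return 2.0 * max_audio_freq_hz


def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Raw PCM bit rate before any compression, in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# A 22 kHz frequency response needs a sampling rate of at least 44 kHz.
print(min_sample_rate_hz(22_000))

# Telephone-style PCM: 8 kHz sampling, 8 bits, mono.
print(pcm_bit_rate_kbps(8_000, 8, 1))

# CD-style PCM (assumed parameters): 44.1 kHz sampling, 16 bits, stereo.
print(pcm_bit_rate_kbps(44_100, 16, 2))
```

The gap between the last two figures is why a wideband stereo codec for real-time use has to compress heavily to fit a typical link.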

 

The following are descriptions of the various audio encoding standards.

Real-time audio communication coding standard: G.711

Type: Audio
Formulator: ITU-T
Required bandwidth: 64 kbps
Features: Low algorithmic complexity, average sound quality
Advantages: Low algorithmic complexity, low compression ratio (for comparison, CD sound quality requires more than 400 kbps), and the shortest codec delay of the technologies listed here
Disadvantages: High bandwidth usage
Remarks: G.711 is 64 kbit/s pulse code modulation (PCM), published by the CCITT in the 1970s.
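G.711's 64 kbps comes from 8,000 samples per second at 8 bits each, with logarithmic (A-law or μ-law) companding so that quiet sounds keep more resolution than loud ones. The following is a minimal sketch of the continuous μ-law curve with a plain 8-bit quantizer. G.711 itself specifies a segmented, piecewise-linear approximation and a particular bit layout, which this sketch does not reproduce, and all function names here are illustrative.

```python
import math

MU = 255.0  # mu-law parameter used for 8-bit telephony


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to an integer code in 0..255."""
    return round((mu_law_compress(x) + 1.0) / 2.0 * 255)


def decode_8bit(code: int) -> float:
    """Map an 8-bit code back to an approximate sample in [-1, 1]."""
    return mu_law_expand(code / 255.0 * 2.0 - 1.0)


quiet = 0.01
print(encode_8bit(quiet), decode_8bit(encode_8bit(quiet)))
```

Because the curve is steep near zero, a quiet sample such as 0.01 is reconstructed with a much smaller error than a uniform 8-bit quantizer over the same range would give.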

Real-time audio communication coding standard: G.721

Formulator: ITU-T
Required bandwidth: 32 kbps
Audio bandwidth: 3.4 kHz
Features: Compared with PCMA and PCMU (G.711 A-law and μ-law), it has a higher compression ratio of 2:1.
Advantages: High compression ratio
Disadvantages: Average sound quality
Remarks: ADPCM technology. The G.721 standard is a transcoding system. It uses ADPCM to convert between 64 kbit/s A-law or μ-law PCM and a 32 kbit/s rate.
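G.721 halves the 64 kbps PCM rate by sending a 4-bit code per sample that describes the difference from a predicted value, rather than the sample itself. The sketch below shows only that core idea, using a fixed step size and a "previous sample" predictor. It is a simplified DPCM illustration, not the G.721 algorithm, which adapts both its quantizer and its predictor; the function names and the step size of 256 are assumptions chosen for the example.

```python
def dpcm_encode(samples, step=256):
    """Encode integer samples as 4-bit signed codes (-8..7) of the prediction error."""
    codes = []
    predicted = 0  # the encoder tracks what the decoder will reconstruct
    for sample in samples:
        code = max(-8, min(7, round((sample - predicted) / step)))
        codes.append(code)
        predicted += code * step
    return codes


def dpcm_decode(codes, step=256):
    """Rebuild the samples by accumulating the decoded differences."""
    out = []
    predicted = 0
    for code in codes:
        predicted += code * step
        out.append(predicted)
    return out


samples = [0, 300, 700, 1000, 900, 400]
codes = dpcm_encode(samples)
print(codes)
print(dpcm_decode(codes))

# 4 bits per sample at 8000 samples per second gives the 32 kbps rate.
print(4 * 8000 / 1000)
```

With a fixed step, large jumps clip at the code limits and small signals are quantized coarsely, which is the limitation that adaptive schemes such as ADPCM are designed to address.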

Real-time audio communication coding standard: G.722

Formulator: ITU-T
Required bandwidth: 64 kbps
Audio bandwidth: 7 kHz
Features: G.722 provides high-fidelity voice quality
Advantages: Good sound quality
Disadvantages: High bandwidth requirements
Remarks: Sub-band ADPCM (SB-ADPCM) technology

Real-time audio communication coding standard: G.722.1

Formulator: ITU-T
Required bandwidth: 32 kbps / 24 kbps
Audio bandwidth: 7 kHz
Features: Achieves a lower bit rate and greater compression than the G.722 codec. The goal is roughly equivalent quality to G.722 at about half the bit rate.
Advantages: Good sound quality
Disadvantages: Bandwidth requirements are still high compared with low-bit-rate codecs such as G.723
Remarks: Currently used mostly in video conferencing systems.

Real-time audio communication coding standard: G.722.1 Annex C

Formulator: ITU-T
Required bandwidth: 48 kbps / 32 kbps / 24 kbps
Audio bandwidth: 14 kHz
Features: Uses Polycom's patented Siren™14 algorithm, a major advance over earlier wideband audio technology. It provides low-latency 14 kHz ultra-wideband audio at less than half the bit rate of the alternative MPEG-4 AAC-LD codec, while requiring only one-tenth to one-twentieth of the computational power. This leaves more processor cycles for improving video quality or running Internet applications, and can extend battery life on mobile devices.
Advantages: Clearer sound, almost comparable to CD quality, which can reduce listener fatigue in applications such as video conferencing.
Disadvantages: It is Polycom's patented technology.
Remarks: Currently used mostly in video conferencing systems.

Real-time audio communication coding standard: G.723 (low bit rate voice coding algorithm)

Formulator: ITU-T
Required bandwidth: 5.3 kbps / 6.3 kbps
Audio bandwidth: 3.4 kHz
Features: Voice quality is close to good, bandwidth requirements are low, the implementation is efficient, and it is easy to scale to multiple channels. The 5.3 kbps coder can be implemented using the 16K of on-chip RAM of the C5402. It reaches the voice quality required by ITU-T G.723 with stable performance, and can be used for IP phone source coding or for high-efficiency voice compression and storage.
Advantages: Low bit rate and low bandwidth requirements, while meeting the voice quality required by ITU-T G.723 with stable performance.
Disadvantages: Average sound quality
Remarks: The G.723 speech encoder (formally G.723.1) is a dual-rate coding scheme for multimedia communication, with coding rates of 5.3 kbit/s and 6.3 kbit/s. The standard is part of the multimedia communication standards defined by the International Telecommunication Union (ITU) and can be applied to systems such as IP phones. The 6.3 kbit/s encoder uses multi-pulse maximum likelihood quantization (MP-MLQ), and the 5.3 kbit/s encoder uses algebraic code-excited linear prediction (ACELP).
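The trade-off running through these descriptions, bit rate against audio bandwidth, can be sketched as a small lookup. The table below copies the lowest bit rate and the audio bandwidth of the narrowband and wideband entries above; the 3.4 kHz figure for G.711 is an assumption, since the article does not list it. The `Codec` class and `pick_codec` function are hypothetical helpers for illustration, not part of any real library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Codec:
    name: str
    min_bit_rate_kbps: float
    audio_bandwidth_khz: float


CODECS = [
    Codec("G.711", 64.0, 3.4),   # audio bandwidth assumed, not listed above
    Codec("G.721", 32.0, 3.4),
    Codec("G.722", 64.0, 7.0),
    Codec("G.722.1", 24.0, 7.0),
    Codec("G.723", 5.3, 3.4),
]


def pick_codec(available_kbps: float) -> Optional[Codec]:
    """Pick the widest-band codec that fits, preferring the lower bit rate on ties."""
    candidates = [c for c in CODECS if c.min_bit_rate_kbps <= available_kbps]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.audio_bandwidth_khz, -c.min_bit_rate_kbps))


for budget in (4, 10, 30, 64):
    chosen = pick_codec(budget)
    print(budget, chosen.name if chosen else None)
```

A real selection would also weigh delay, computational cost, and licensing, which the entries above mention but this sketch ignores.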
