Echo cancellation, noise suppression, jitter buffer, silence detection, and mixing

1.

  Building a voice dialogue system that works well over a WAN would be easy if the techniques above were all it took. In practice, many real-world factors complicate this conceptual model, so a network voice system is far from simple and requires a good deal of specialized expertise. A voice dialogue system with "good results" should deliver the following: low latency, low background noise, clear and smooth sound without stuttering or pauses, and no echo.

     Low latency: only when latency is low do both parties on a call get a strong sense of real-time conversation. Of course, latency depends mainly on network speed and on the physical distance between the two parties; from a purely software standpoint, there is very little room for optimization.

(1) Echo cancellation

      Nowadays almost everyone is used to voice chatting through the built-in loudspeakers of a PC or laptop. When the loudspeaker is used, the sound it plays is picked up again by the microphone and sent back to the other side, so the other party hears an echo of their own voice.

      Put simply, the echo cancellation module takes the audio frame that was just played as a reference and applies an offsetting operation to the captured audio frame, removing the echo from it. The process is quite complex, because it depends on the size of the room you are chatting in and your position within it; this information determines how long the reflected sound waves linger. A good echo cancellation module dynamically adjusts its internal parameters to best fit the current environment.
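
A common way to build the adaptive filter inside an echo canceller is the normalized least-mean-squares (NLMS) algorithm. The Python sketch below is a minimal illustration under simplifying assumptions (float samples, a single-talk scenario, no double-talk detection or nonlinear post-processing), not the method of any particular product. The function name `nlms_echo_cancel` and its parameters are hypothetical: `taps` is the length of the modeled echo path, which is where room size and your position in it come in, and `mu` is the adaptation step size.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples just sent to the loudspeaker (the reference signal)
    mic:     samples captured by the microphone (near-end speech + echo)
    Returns the echo-cancelled signal (the NLMS error signal).
    """
    weights = [0.0] * taps   # estimated echo path (room impulse response)
    history = [0.0] * taps   # latest far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate            # what remains after cancellation
        power = sum(h * h for h in history) + eps
        step = mu * error / power            # normalized step size
        weights = [w + step * h for w, h in zip(weights, history)]
        out.append(error)
    return out

# Demo with a synthetic echo path 0.6*x[n-3] + 0.3*x[n-7] and no near-end talker.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
echo = [0.6 * (far[n - 3] if n >= 3 else 0.0) + 0.3 * (far[n - 7] if n >= 7 else 0.0)
        for n in range(len(far))]
residual = nlms_echo_cancel(far, echo)

tail_echo = sum(e * e for e in echo[-500:])
tail_residual = sum(r * r for r in residual[-500:])
print(f"echo energy {tail_echo:.3f}, residual energy {tail_residual:.2e}")
```

In the demo the simulated echo path fits entirely within the 32 taps, so the residual becomes negligible once the filter converges. A real room's echo tail can be on the order of hundreds of milliseconds, which is why practical cancellers need far longer filters (often implemented in the frequency domain).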

(2) Noise suppression

      Noise suppression, also called noise reduction, uses the characteristics of speech to identify the parts of the audio that belong to background noise and filters them out of the audio frame. Many encoders have this feature built in.
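
Real noise suppressors usually work per frequency band (for example, spectral subtraction or Wiener filtering). The sketch below is a deliberately simplified time-domain stand-in, a noise gate: it tracks a background-noise floor from quiet frames and attenuates frames whose level stays close to that floor. The names `frame_rms` and `suppress_noise` and the `ratio`, `attenuation`, and `adapt` parameters are hypothetical, and it assumes the first frame contains only background noise.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def suppress_noise(frames, ratio=2.0, attenuation=0.1, adapt=0.05):
    """Attenuate frames whose level is within `ratio` of the noise floor."""
    noise_floor = None
    out = []
    for frame in frames:
        level = frame_rms(frame)
        if noise_floor is None:
            noise_floor = level  # assumption: the call starts with noise only
        elif level < noise_floor * ratio:
            # Frame looks like background noise: let the floor track it slowly.
            noise_floor += adapt * (level - noise_floor)
        gain = 1.0 if level >= noise_floor * ratio else attenuation
        out.append([s * gain for s in frame])
    return out

noise = [0.01, -0.01] * 80   # quiet background hiss
speech = [0.5, -0.5] * 80    # much louder "speech"
cleaned = suppress_noise([noise] * 5 + [speech] * 3 + [noise] * 2)
```

A gate like this only lowers noise between words; per-band methods can also reduce noise underneath speech, which is why codecs and voice engines generally prefer them.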

(3) Jitter buffer

      The jitter buffer (JitterBuffer) solves the problem of network jitter. Network jitter means that network latency keeps fluctuating. In that case, even if the sender sends packets at regular intervals (for example, one packet every 100 ms), the receiver cannot receive them at the same regular intervals: sometimes no packet arrives for a while, and sometimes several packets arrive at once. As a result, the sound heard by the receiver keeps stuttering.
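
One standard way to quantify this jitter is the interarrival jitter estimator defined in RFC 3550 (RTP): for consecutive packets it measures how much the transit time changed, D = (R_i - R_{i-1}) - (S_i - S_{i-1}), and smooths |D| with a gain of 1/16. The Python sketch below follows that formula but uses milliseconds instead of RTP timestamp units; the function name `interarrival_jitter` and the sample timings are hypothetical.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running interarrival jitter estimate, following the RFC 3550 formula.

    send_ms[i]: when packet i was sent (sender clock)
    recv_ms[i]: when packet i arrived (receiver clock)
    The two clocks need not be synchronized; only differences are used.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D = change in transit time between consecutive packets
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter

send = [0, 100, 200, 300]      # one packet every 100 ms, as in the text
steady = [50, 150, 250, 350]   # constant 50 ms network delay
bursty = [50, 170, 215, 350]   # delay varies from packet to packet
print(interarrival_jitter(send, steady))  # 0.0
print(interarrival_jitter(send, bursty))  # 6.5087890625
```

A constant delay, however large, yields zero jitter; only the variation in delay, which is what causes the stuttering, shows up in the estimate.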

      The JitterBuffer sits after the decoder and before playback. That is, once a speech frame has been decoded, it is placed into the JitterBuffer; when the sound card's playback callback fires, the oldest frame is taken out of the JitterBuffer and played.
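
The placement described above (decoded frames pushed in, the playback callback pulling the oldest one) can be sketched as a FIFO queue. In this minimal Python sketch the class `JitterBuffer`, its `depth` prefill rule, and the policy of playing silence and re-buffering on underrun are illustrative assumptions; it also assumes frames arrive already in order, whereas real implementations typically reorder packets by sequence number.

```python
from collections import deque

class JitterBuffer:
    """FIFO of decoded PCM frames between the decoder and the sound card."""

    def __init__(self, depth, frame_len):
        self.depth = depth              # frames to accumulate before playing
        self.silence = [0] * frame_len  # played when no frame is available
        self.frames = deque()
        self.playing = False

    def push(self, pcm_frame):
        """Called after the decoder produces a frame."""
        self.frames.append(pcm_frame)
        if len(self.frames) >= self.depth:
            self.playing = True

    def pop(self):
        """Called from the sound card's playback callback."""
        if not self.playing or not self.frames:
            self.playing = False        # underrun: re-buffer before resuming
            return self.silence
        return self.frames.popleft()    # oldest frame first
```

Holding back `depth` frames before starting playback is what absorbs bursts: packets that arrive late are still in time as long as the queued frames have not run out.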

      The JitterBuffer's depth depends on the degree of network jitter: the greater the jitter, the deeper the buffer, and the longer the playback delay. In other words, the JitterBuffer trades higher latency for smooth playback. Compared with audio that keeps stuttering, slightly more delay with smoother playback gives a better subjective experience.

      Of course, the JitterBuffer depth is not fixed; it is adjusted dynamically according to the degree of network jitter. When the network becomes smooth again, the buffer depth shrinks to a very small value, so the playout delay added by the JitterBuffer becomes negligible.
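
The depth/delay trade-off can be made concrete: the added playout delay is roughly the buffer depth times the frame duration. The Python sketch below picks a target depth proportional to a jitter estimate (which could come from an estimator such as the RFC 3550 interarrival jitter). The function names, the 20 ms frame length, the safety factor `k = 4`, and the depth limits are all hypothetical heuristics, not values from the original text.

```python
import math

def target_depth(jitter_ms, frame_ms=20, k=4.0, min_frames=1, max_frames=15):
    """Frames to buffer so that about k times the current jitter is covered."""
    frames = math.ceil(k * jitter_ms / frame_ms)
    return max(min_frames, min(max_frames, frames))

def playout_delay_ms(depth, frame_ms=20):
    """Extra delay the jitter buffer adds at a given depth."""
    return depth * frame_ms

for jitter in (0.0, 25.0, 200.0):  # smooth, moderate, very bursty network
    depth = target_depth(jitter)
    print(f"jitter {jitter:5.1f} ms -> depth {depth:2d} -> +{playout_delay_ms(depth)} ms")
```

With a smooth network the depth drops to one frame (20 ms here), while a bursty network pushes it up until the `max_frames` cap bounds the delay. Implementations commonly shrink the buffer gradually rather than discarding queued audio all at once.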

(4) Silence detection

      In a voice conversation, it is best if a party who is not speaking generates no traffic at all. Silence detection serves this purpose. The silence detection module is usually integrated into the encoder. Working together with the noise suppression algorithm described above, the silence detection algorithm determines whether there is currently any speech input; if there is none, the encoder can output a special encoded frame (for example, one of length 0). In multi-party video conferences in particular, usually only one person speaks at a time, so silence detection saves a substantial amount of bandwidth.
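
The idea of emitting a zero-length frame when nobody is talking (often called discontinuous transmission, DTX) can be sketched with a crude energy-based voice activity check. The Python sketch below assumes float samples in [-1, 1]; the names `is_speech` and `encode_with_dtx`, the -40 dBFS threshold, and the `encode` callable standing in for a real codec are all hypothetical. Production detectors use richer features than raw energy, which is why the text pairs silence detection with noise suppression.

```python
import math

def is_speech(frame, threshold_db=-40.0):
    """Crude energy-based voice activity check (level in dB relative to full scale)."""
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0.0:
        return False
    return 20.0 * math.log10(rms) > threshold_db

def encode_with_dtx(frames, encode):
    """Encode speech frames; replace silent frames with zero-length payloads."""
    return [encode(f) if is_speech(f) else b"" for f in frames]

def fake_encode(frame):
    # Hypothetical stand-in for a real codec: one byte per sample.
    return b"x" * len(frame)

frames = [[0.0] * 160, [0.5, -0.5] * 80, [0.001] * 160]
payloads = encode_with_dtx(frames, fake_encode)
print([len(p) for p in payloads])  # [0, 160, 0]
```

The all-zero frame and the very quiet frame (-60 dBFS) are sent as empty payloads, so only the frame that actually contains "speech" costs bandwidth.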

(5) Mixer

      In a video conference, when several people speak at the same time, we need to play voice data from more than one person, but the sound card's playback buffer accepts only one stream. The multiple voice streams must therefore be mixed into a single stream, and that is what the mixing algorithm does.
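
The simplest form of mixing is to add the corresponding samples of every stream and clip the sum to the sample range. The Python sketch below assumes 16-bit PCM streams already decoded and time-aligned; the function name `mix_pcm16` is hypothetical. Plain sum-and-clip distorts when several loud speakers overlap, so real mixers often add scaling or automatic gain control on top of it.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mix_pcm16(streams):
    """Mix time-aligned lists of 16-bit PCM samples into a single stream.

    Samples at the same index are summed and then clipped (saturated) to the
    int16 range; output length is that of the shortest stream.
    """
    return [max(INT16_MIN, min(INT16_MAX, sum(samples)))
            for samples in zip(*streams)]

alice = [1000, -2000, 30000]
bob = [500, -500, 10000]
print(mix_pcm16([alice, bob]))  # [1500, -2500, 32767]
```

The third sample shows the clipping: 30000 + 10000 exceeds the int16 maximum and is saturated to 32767 instead of wrapping around to a large negative value.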

 

https://www.cnblogs.com/justnow/p/4487201.html


Origin www.cnblogs.com/javastart/p/11910682.html