Special Lecture: Microphone Array Processing Techniques for Speech Signals

https://blog.csdn.net/ffmpeg4976/article/details/52397000

Reprinted from the Horizon Robotics Auditorium lecture series.

The speaker graduated in November 2011 from the University of Edinburgh in Communications and Signal Processing, and previously worked as a senior audio engineer at Nokia, Lenovo, and Microsoft. At Horizon Robotics he is responsible for speech-related hardware system design, including far-field microphone array design, verification and evaluation of high-sensitivity, high-precision audio codec hardware, and evaluation of far-field speech preprocessing algorithms, covering sound source localization, beamforming, blind signal separation, noise reduction, and echo suppression.

Foreword

As artificial intelligence moves ever closer to people's daily lives, speech technology is drawing growing attention. Traditional near-field voice capture can no longer meet users' needs: people want to control smart devices by voice at greater distances and in more complex environments. Far-field speech technology built on microphone arrays has therefore become a core enabling technology.

Why microphone arrays matter for artificial intelligence:

  1. Spatial selectivity: by electronically scanning the array, the position of an active sound source can be estimated; knowing the precise source location lets a smart device respond more intelligently and, through array algorithms, acquire a higher-quality speech signal.
  2. Automatic detection and tracking: a microphone array can detect where a sound comes from, track the speaker, and handle multiple sources and moving sources; wherever you go, the device can steer its speech enhancement toward your position and direction.
  3. Spatial processing: multi-microphone, multi-frequency space-time processing in three dimensions compensates for what single-channel processing lacks in noise suppression, echo suppression, dereverberation, source localization, and source separation, so that smart devices can obtain high-quality speech in complex environments and deliver a better voice experience.

Technical difficulties of microphone array processing:

Traditional array signal processing techniques applied directly to microphone arrays often give disappointing results, because microphone array processing has its own characteristics:

  1. Array modeling
    Microphone arrays mainly process speech, whose pickup range is limited, so a near-field model applies. The far-field plane-wave model of conventional array processing (radar, sonar) is no longer suitable; in the near field the more accurate spherical-wave model is needed, and the amplitude attenuation differences caused by different propagation path lengths must be taken into account.

  2. Wideband signal processing
    Conventional array signals are mostly narrowband: the delay between array elements shows up mainly as a phase shift of the carrier. A speech signal is unmodulated and carrier-free, with a large ratio of highest to lowest frequency, so the phase-delay relationship between elements depends strongly on the frequency content of the source itself, and conventional narrowband array methods no longer fully apply.

  3. Non-stationary signal processing
    Conventional array processing mostly assumes stationary signals, whereas the signals a microphone array handles are non-stationary, or at best short-term stationary. Microphone arrays therefore usually work on short frames in the frequency domain: each frequency bin has its own phase difference, the wideband signal is split into sub-bands, each sub-band is processed as a narrowband signal, and the results are recombined into the wideband output.

  4. Reverberation
    Sound propagation is shaped by the room: because of reflection and diffraction, the microphone receives not only the direct-path signal but also superimposed multipath copies, i.e. reverberation. Indoors, reflections and diffraction from room boundaries and obstacles prolong the sound and can degrade speech intelligibility considerably.
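
The frame-based sub-band scheme described in point 3 can be sketched as a short STFT overlap-add loop. The frame length, hop size, and periodic Hann window below are illustrative choices, not taken from the article; the identity `process` callback marks where per-bin narrowband processing would go.

```python
import numpy as np

def stft_process(x, frame=512, hop=256, process=lambda spec: spec):
    """Split x into windowed frames, process each frame's spectrum
    per-bin (narrowband), and overlap-add back to a wideband signal."""
    # Periodic Hann overlap-adds exactly to 1 at 50% hop (COLA property).
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame + 1, hop):
        spec = np.fft.rfft(win * x[start:start + frame])   # analysis
        out[start:start + frame] += np.fft.irfft(process(spec), frame)
    return out

rng = np.random.default_rng(8)
x = rng.standard_normal(8192)
y = stft_process(x)   # identity processing: interior samples reconstruct x
```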

Sound source localization

Sound source localization is widely used in artificial intelligence. A microphone array defines a spatial coordinate system, and depending on whether the array is linear, planar, or volumetric, the position of a source in space can be determined. Knowing the source position, a smart device can first apply speech enhancement toward it; combined with other sensors, the location information enables a further intelligent experience, e.g. a robot that comes to your side when you call it, or a video system that keeps its focus locked on the speaker. Before looking at localization techniques, we need to understand the near-field and far-field models.

Near-field and far-field models

Within roughly 1 to 3 m of the array, the near-field model applies: the array receives spherical rather than plane wavefronts, and sound attenuates during propagation, with the attenuation factor proportional to the distance traveled, so the amplitude reaching each element differs as well as the arrival time. In the far-field model, the differences in source-to-element distance are relatively small and can be ignored. The near/far-field boundary is conventionally defined as 2L²/λ, where L is the array aperture and λ the acoustic wavelength; inside it, the elements receive signals that differ not only in phase delay but also in amplitude attenuation.
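
The 2L²/λ threshold is easy to compute; the aperture and frequency below are made-up example values, not from the article.

```python
# Near-field / far-field boundary for a microphone array:
# r = 2 * L**2 / wavelength, with wavelength = c / f.
c = 343.0            # speed of sound in air, m/s
f = 1000.0           # frequency of interest, Hz (illustrative)
L = 0.10             # array aperture, m (illustrative)
wavelength = c / f   # ~0.343 m
r_boundary = 2 * L ** 2 / wavelength

# A source closer than r_boundary should be modeled with spherical
# wavefronts (near field); farther sources can use the plane-wave model.
def is_near_field(distance_m: float) -> bool:
    return distance_m < r_boundary
```

Note the boundary depends on both aperture and frequency, which is why the same distance can be near-field for one array and far-field for another.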

Sound source localization techniques

Sound source localization methods include beamforming (electronically scanned arrays), super-resolution spectral estimation, and TDOA. They map the relationship between the source and the array into, respectively, beam space, spatial-spectrum space, and arrival-time-difference space, and extract the position information from the corresponding measurement.

Electrically scanned array

The array forms a beam and scans it through space, accepting one direction while suppressing others. The direction the array points at is steered by the weighting coefficients applied to each element's output. When the scanned beam direction yields the maximum output signal power, that direction is taken as the source DOA, and the source can thereby be localized. Electronic scanning has limitations: it applies only to a single source, and when several sources fall inside the main beam of the array pattern they cannot be distinguished. Its accuracy is tied to the beamwidth: at a given frequency the beamwidth is inversely proportional to the aperture, and a large-aperture microphone array is hard to realize in many hardware settings.
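
A minimal sketch of DOA estimation by beam scanning, assuming a narrowband far-field source and a uniform linear array; all parameters (spacing, frequency, source angle) are illustrative, not from the text.

```python
import numpy as np

c, f, d, M = 343.0, 1000.0, 0.05, 8     # sound speed, Hz, mic spacing, mics
rng = np.random.default_rng(0)

def steering(theta_deg):
    """Array response to a plane wave from angle theta (broadside = 0)."""
    tau = d * np.arange(M) * np.sin(np.deg2rad(theta_deg)) / c
    return np.exp(-2j * np.pi * f * tau)

# Simulate 200 snapshots of a source at 30 degrees plus sensor noise.
true_doa = 30.0
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
X = np.outer(steering(true_doa), s) + 0.1 * (
    rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200)))

# Scan: steer the beam over candidate angles, record output power,
# and take the angle of maximum power as the DOA estimate.
angles = np.arange(-90, 91)
power = [np.mean(np.abs(steering(a).conj() @ X / M) ** 2) for a in angles]
doa_est = int(angles[int(np.argmax(power))])
```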

Super-resolution spectral estimation

Methods such as MUSIC and ESPRIT eigendecompose the covariance (correlation) matrix of the received data, construct a spatial spectrum, and take the directions of the spectral peaks as the source directions. They handle multiple sources, and their source resolution is not limited by the array size, breaking that physical limitation, hence "super-resolution". Such methods can be extended to wideband processing, but they are sensitive to errors such as microphone element mismatch and channel errors, assume a far-field model, and involve a heavy load of matrix computation.
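
A hedged sketch of narrowband MUSIC on a simulated uniform linear array; the geometry, source angles, and SNR are made-up values, not from the article.

```python
import numpy as np

c, f, d, M, K = 343.0, 1000.0, 0.05, 8, 2   # K = number of sources
rng = np.random.default_rng(1)

def steering(theta_deg):
    tau = d * np.arange(M) * np.sin(np.deg2rad(theta_deg)) / c
    return np.exp(-2j * np.pi * f * tau)

# Two uncorrelated sources at -20 and +40 degrees, plus noise.
doas = [-20.0, 40.0]
S = rng.standard_normal((K, 400)) + 1j * rng.standard_normal((K, 400))
A = np.column_stack([steering(a) for a in doas])
X = A @ S + 0.1 * (rng.standard_normal((M, 400))
                   + 1j * rng.standard_normal((M, 400)))

R = X @ X.conj().T / X.shape[1]     # sample covariance matrix
w, V = np.linalg.eigh(R)            # eigenvalues in ascending order
En = V[:, : M - K]                  # noise subspace (smallest eigenvalues)

# MUSIC pseudospectrum: peaks where steering vectors are orthogonal
# to the noise subspace, i.e. at the source directions.
angles = np.arange(-90, 91)
pseudo = []
for a in angles:
    v = steering(a)
    pseudo.append(1.0 / np.real(v.conj() @ En @ En.conj().T @ v))
est = int(angles[int(np.argmax(pseudo))])
```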

TDOA

TDOA methods first estimate the differential delay of the source between microphones, convert the delays into distance differences, and then determine the source location from those distance differences and the known geometry of the array. The approach divides into two steps, TDOA estimation and TDOA localization:
1. TDOA estimation
Commonly used estimators are GCC (Generalized Cross-Correlation) and LMS adaptive filtering.

Generalized cross-correlation

In TDOA-based localization, delay estimation is mostly done with GCC. GCC is computationally simple, has low latency and good tracking ability, and suits real-time applications; it performs well under low-to-moderate noise and reverberation, but its accuracy degrades in strong, non-stationary noise environments.
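
A minimal GCC sketch using the PHAT weighting (a common variant; the choice of PHAT here is ours, not stated in the text). It estimates the inter-microphone delay in samples from the phase of the cross-spectrum.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """Estimate the delay of x relative to y, in samples."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = X * np.conj(Y)
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT: keep phase only
    cc = np.fft.irfft(cross, n)
    # Rearrange so index runs over lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

# Synthetic check: y is x delayed by 5 samples (circular shift).
rng = np.random.default_rng(2)
x = rng.standard_normal(1024)
y = np.roll(x, 5)
delay = gcc_phat(y, x, max_lag=20)
```

With the microphone spacing known, the delay converts to an angle via the array geometry.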

LMS Adaptive Filter

The LMS adaptive filter gives a TDOA estimate at convergence and needs no prior knowledge of the noise or signal statistics, but it is relatively sensitive to reverberation. The method takes the two microphone signals as the target signal and the input signal, drives the filtered input to approximate the target, and reads the TDOA off the adapted filter coefficients.
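
The LMS delay-estimation idea can be sketched as follows; the filter length, step size, and noise-free signals are illustrative assumptions. The index of the dominant adapted tap is the TDOA estimate in samples.

```python
import numpy as np

rng = np.random.default_rng(3)
true_delay = 4
x = rng.standard_normal(5000)                             # reference mic
d = np.concatenate((np.zeros(true_delay), x))[: len(x)]   # delayed mic

L, mu = 16, 0.01          # filter length and LMS step size (illustrative)
w = np.zeros(L)
for n in range(L, len(x)):
    u = x[n - L + 1 : n + 1][::-1]   # most recent sample first
    e = d[n] - w @ u                 # error: target minus filtered input
    w += mu * e * u                  # LMS coefficient update

delay_est = int(np.argmax(np.abs(w)))   # dominant tap index = delay
```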
2. TDOA localization

With the TDOA estimates in hand, the source is localized: three microphones suffice to determine a source position in space, and adding microphones improves the accuracy. Localization methods include maximum-likelihood estimation (MLE), minimum variance, spherical interpolation, and linear intersection. TDOA methods are comparatively widely applied: they localize accurately with the smallest computational load, run in real time, and support real-time tracking, so most current smart localization products adopt TDOA as their positioning technique.
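
A toy 2-D grid-search localization from TDOAs, assuming known microphone positions and noise-free delays; the coordinates and 2 m search grid are made-up values. It picks the grid point whose predicted inter-microphone delay differences best match the measured TDOAs in the least-squares sense.

```python
import numpy as np

c = 343.0
mics = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]])
src = np.array([1.0, 1.5])               # "unknown" source, for simulation

dists = np.linalg.norm(mics - src, axis=1)
tdoas = (dists[1:] - dists[0]) / c       # delays relative to mic 0

best, best_err = None, np.inf
for gx in np.linspace(0.0, 2.0, 81):
    for gy in np.linspace(0.0, 2.0, 81):
        p = np.array([gx, gy])
        dd = np.linalg.norm(mics - p, axis=1)
        err = np.sum(((dd[1:] - dd[0]) / c - tdoas) ** 2)
        if err < best_err:
            best, best_err = p, err
```

Closed-form methods (MLE, spherical interpolation, etc.) avoid the grid search; this brute-force version just makes the hyperbolic geometry explicit.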

Beamforming

Beamforming divides into conventional beamforming (CBF, Conventional Beam Forming) and adaptive beamforming (ABF, Adaptive Beam Forming). CBF is the simplest, non-adaptive form: the beam is formed by a weighted sum of the microphone outputs. In CBF the per-channel weights are fixed; their role is to lower the sidelobe level of the array pattern so as to filter out interference and noise arriving in the sidelobe region. ABF builds on CBF by adding spatial adaptive filtering of interference and noise. In ABF, different filters yield different algorithms, i.e. the per-channel amplitude weights are adjusted and optimized under some optimality criterion, such as LMS, LS, maximum SNR, or LCMV (Linearly Constrained Minimum Variance). The LCMV criterion yields the MVDR beamformer (Minimum Variance Distortionless Response). LCMV minimizes the array output power while keeping the main-lobe gain of the pattern unchanged, which means the interference-plus-noise power at the output is minimized; it can also be understood as a maximum-SINR criterion, receiving the desired signal as fully as possible while suppressing noise and interference.
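
The MVDR solution w = R⁻¹a / (aᴴR⁻¹a) can be checked numerically; the interference scenario below (angles, powers, array) is invented purely for illustration.

```python
import numpy as np

c, f, d, M = 343.0, 1000.0, 0.05, 8

def steering(theta_deg):
    tau = d * np.arange(M) * np.sin(np.deg2rad(theta_deg)) / c
    return np.exp(-2j * np.pi * f * tau)

a = steering(0.0)     # look direction (broadside): unit-gain constraint
j = steering(45.0)    # interferer direction

# Interference-plus-noise covariance: strong interferer + white noise.
R = 10.0 * np.outer(j, j.conj()) + np.eye(M)
Ri_a = np.linalg.solve(R, a)
w = Ri_a / (a.conj() @ Ri_a)         # MVDR weights

gain_look = abs(w.conj() @ a)        # exactly 1: distortionless response
gain_intf = abs(w.conj() @ j)        # strongly suppressed
```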

CBF - conventional beamforming

The delay-and-sum beamformer enhances speech by delaying each microphone's received signal to compensate for the source-to-microphone time differences, so that the channel outputs are in phase for one direction. That direction then receives the maximum gain, and the output power is maximized with the main beam pointing at it. The array thus forms a spatial filter with directional selectivity.
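
A time-domain delay-and-sum sketch with integer-sample delays (a simplification; real systems interpolate fractional delays). Aligning the channels lets the target add coherently while independent noise averages down.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """signals: list of 1-D arrays; delays: per-mic delay in samples."""
    out = np.zeros(len(signals[0]))
    for x, d in zip(signals, delays):
        out += np.roll(x, -d)        # advance to undo the propagation delay
    return out / len(signals)

rng = np.random.default_rng(5)
s = rng.standard_normal(1000)
delays = [0, 3, 6, 9]                # known steering delays (illustrative)
# Each mic hears the source shifted by its delay, plus independent noise.
mics = [np.roll(s, d) + 0.5 * rng.standard_normal(1000) for d in delays]
y = delay_and_sum(mics, delays)

# Coherent averaging keeps the signal; noise power drops by about 1/M.
snr_single = np.var(s) / np.var(mics[0] - s)
snr_out = np.var(s) / np.var(y - s)
```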

CBF + adaptive filter - enhanced beamforming

Combining Wiener filtering improves the speech-enhancement effect: the noisy speech is Wiener-filtered, under an LMS-based criterion, to obtain the clean speech signal. Because the filter coefficients can be updated iteratively, this removes non-stationary noise more effectively than plain CBF.
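
A hedged sketch of the Wiener post-filter idea using oracle signal and noise spectra; a real system must estimate them, and the text's LMS-based adaptation is replaced here by the closed-form per-bin gain H = S/(S+N) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4096
clean = np.sin(2 * np.pi * 100 * np.arange(n) / n)   # exactly 100 cycles
noise = 0.5 * rng.standard_normal(n)
noisy = clean + noise

S = np.abs(np.fft.rfft(clean)) ** 2    # oracle signal power spectrum
N = np.abs(np.fft.rfft(noise)) ** 2    # oracle noise power spectrum
H = S / (S + N + 1e-12)                # per-bin Wiener gain
est = np.fft.irfft(H * np.fft.rfft(noisy), n)

mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((est - clean) ** 2)
```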

ABF - adaptive beamforming

The GSLC (generalized sidelobe canceller) is based on ANC (adaptive noise cancellation). The noisy signal passes through a main channel and auxiliary channels; a blocking matrix in the auxiliary path filters out the speech, leaving multichannel reference signals that contain only noise. Adaptive filters then use each channel's reference to form an optimal estimate of the noise in the main channel, which is subtracted to give the clean-speech estimate.
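
A minimal two-microphone sidelobe-canceller sketch under a toy signal model (in-phase target, differently scaled noise at the two mics; all values made up). The fixed branch passes the target, the blocking matrix (mic difference) removes it, and an LMS filter cancels the remaining noise from the main branch.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000
target = rng.standard_normal(n)          # arrives in phase at both mics
noise = rng.standard_normal(n)
mic1 = target + noise
mic2 = target + 0.5 * noise              # simplified directional noise

fixed = 0.5 * (mic1 + mic2)              # main channel: target + 0.75*noise
ref = mic1 - mic2                        # blocking matrix output: noise only

L, mu = 8, 0.01                          # illustrative filter length / step
w = np.zeros(L)
out = np.zeros(n)
for t in range(L, n):
    u = ref[t - L + 1 : t + 1][::-1]
    out[t] = fixed[t] - w @ u            # subtract the noise estimate
    w += mu * out[t] * u                 # ANC-style LMS update

mse_fixed = np.mean((fixed[n // 2 :] - target[n // 2 :]) ** 2)
mse_gsc = np.mean((out[n // 2 :] - target[n // 2 :]) ** 2)
```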

The future development of array technology

Compared with single-microphone systems, microphone array technology has many advantages and has become an important part of speech enhancement and speech signal processing. Speech enhancement and source localization are now indispensable: video conferencing, intelligent robots, hearing aids, smart appliances, communications, smart toys, and automotive applications all need them. Array signal processing techniques of all kinds are gradually being incorporated into microphone array speech processing systems, with algorithms steadily improving and seeing ever wider use. In complex noise, reverberation, and acoustic environments, powerful hardware makes real-time execution of sophisticated speech-enhancement algorithms possible. In the future, the close integration of speech and vision may become the next breakthrough in artificial intelligence: speech recognition, speech understanding, array signal processing, far-field speech, image recognition, face recognition, iris recognition, and voiceprint recognition, ingeniously and organically combined, with technology, nature, and people at the center. Let us wait and see.

Origin www.cnblogs.com/focus-z/p/12078578.html