Sound source localization algorithm GCC-PHAT

Existing sound source localization algorithms can be roughly divided into three categories: a) algorithms based on time-delay estimation (TDE); b) algorithms based on high-resolution spectral estimation; c) algorithms based on sparse representation.

The core of TDE-based algorithms is accurate estimation of the propagation delay, which is generally obtained by cross-correlating the signals from pairs of microphones. To go from delays to a source location, one can apply simple delay-and-sum processing, geometric calculation, or a steered response power search that uses the cross-correlation results directly. This type of algorithm is relatively simple to implement, computationally cheap, and well suited to real-time processing, so it is the most widely used in practice.
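As an illustration of the geometric-calculation step, the sketch below converts a single inter-microphone delay into an arrival angle. It assumes a far-field source, a two-microphone pair, and a speed of sound of 343 m/s; the function name `delay_to_angle` and its parameters are hypothetical and chosen only for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def delay_to_angle(tau, mic_distance, c=SPEED_OF_SOUND):
    """Far-field arrival angle in degrees for a two-microphone pair.

    tau is the inter-microphone delay in seconds and mic_distance the
    spacing in metres. 0 degrees means the source is broadside to the pair.
    """
    # Clamp to [-1, 1] so small estimation errors do not break asin.
    ratio = max(-1.0, min(1.0, c * tau / mic_distance))
    return math.degrees(math.asin(ratio))


print(delay_to_angle(0.0, 0.1))  # zero delay: source is broadside
```

A single pair only resolves one angle; locating a source in a plane or in space needs delays from several pairs.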

GCC-PHAT

Delay estimation based on the generalized cross-correlation (GCC) function introduces a weighting function that adjusts the cross-power spectral density to improve delay-estimation performance. Different weighting functions give different variants of GCC, among which the generalized cross-correlation with phase transform (GCC-PHAT) is the most widely used. GCC-PHAT has some inherent robustness to noise and reverberation, but its performance drops sharply when the signal-to-noise ratio falls and the reverberation becomes stronger.

1. Calculate the propagation delay

The received signals of the two microphones in the array are:
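A common way to write this model is given below. The symbols are assumed here for illustration: s(t) is the source signal, α_i the attenuation, τ_i the propagation delay from the source to microphone i, and n_i(t) additive noise.

```latex
\begin{aligned}
x_1(t) &= \alpha_1\, s(t - \tau_1) + n_1(t) \\
x_2(t) &= \alpha_2\, s(t - \tau_2) + n_2(t)
\end{aligned}
```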

                           

The relevant parameters can be seen in the following figure:

    

Cross-correlation is commonly used for delay estimation. It is defined as:

Substituting the signal model gives:

Because s(t) is uncorrelated with the noise terms n1(t) and n2(t), the above formula simplifies to:

where τ12 = τ1 − τ2. Assuming that n1 and n2 are uncorrelated Gaussian white noise, the above formula simplifies further to:

From the properties of the correlation function, R_x1x2(τ) reaches its maximum when τ = τ12 = τ1 − τ2, which is the time delay between the two microphones.
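The derivation above can be written out as follows. This is a sketch under assumed notation: each microphone signal is taken to be an attenuated, delayed copy of s(t) plus noise, with attenuation α_i and delay τ_i, the cross-correlation is defined with the lag applied to the second signal, and R_ss denotes the autocorrelation of s(t).

```latex
\begin{aligned}
R_{x_1 x_2}(\tau) &= E\left[ x_1(t)\, x_2(t - \tau) \right] \\
&= \alpha_1 \alpha_2\, E\left[ s(t-\tau_1)\, s(t-\tau_2-\tau) \right]
   + \alpha_1\, E\left[ s(t-\tau_1)\, n_2(t-\tau) \right] \\
&\quad + \alpha_2\, E\left[ n_1(t)\, s(t-\tau_2-\tau) \right]
   + E\left[ n_1(t)\, n_2(t-\tau) \right] \\
&= \alpha_1 \alpha_2\, R_{ss}(\tau - \tau_{12}) + R_{n_1 n_2}(\tau) \\
&= \alpha_1 \alpha_2\, R_{ss}(\tau - \tau_{12}),
   \qquad \tau_{12} = \tau_1 - \tau_2
\end{aligned}
```

The two middle terms vanish because the source and the noise are uncorrelated, and the last noise term vanishes because n1 and n2 are uncorrelated. Since an autocorrelation is largest at zero lag, the peak sits at τ = τ12.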

The relationship between the cross-correlation function and the cross-power spectrum:

In a real microphone-array setting, reverberation and noise make the peak of R_x1x2(τ) less distinct, which reduces the accuracy of delay estimation. To sharpen the peak, the cross-power spectrum can be weighted in the frequency domain according to prior knowledge of the signal and noise, which suppresses noise and reverberation interference. An inverse Fourier transform then gives the generalized cross-correlation function R_x1x2(τ):
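Written out, the unweighted relationship and its weighted generalization take the form below. The notation is assumed here: G_{x1x2}(ω) denotes the cross-power spectrum of the two signals, the superscript (g) marks the generalized version, and constant factors that depend on the Fourier convention are omitted.

```latex
\begin{aligned}
R_{x_1 x_2}(\tau) &= \int_{-\infty}^{\infty} G_{x_1 x_2}(\omega)\, e^{j\omega\tau}\, d\omega \\
R^{(g)}_{x_1 x_2}(\tau) &= \int_{-\infty}^{\infty} \varphi_{12}(\omega)\, G_{x_1 x_2}(\omega)\, e^{j\omega\tau}\, d\omega
\end{aligned}
```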

where φ12(ω) represents the frequency-domain weighting function. The block diagram of the generalized cross-correlation delay estimation algorithm is as follows:

 

 

2. Commonly used weighting functions and their characteristics

The phase transform (PHAT) weighting function is:
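In the usual formulation, with G_{x1x2}(ω) denoting the cross-power spectrum (notation assumed here), the weight is the reciprocal of the cross-spectrum magnitude:

```latex
\varphi_{12}(\omega) = \frac{1}{\left| G_{x_1 x_2}(\omega) \right|}
```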

It can be seen from the above formula that the phase transform weighting function is essentially a whitening filter: it flattens the magnitude of the cross-power spectrum and keeps only its phase, thereby sharpening the generalized cross-correlation function. After PHAT weighting, the generalized cross-correlation function R_x1x2(τ) becomes:
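A sketch of this expression under assumed notation follows, with G_{x1x2}(ω) the cross-power spectrum and constant factors omitted. The second line is the idealized case with no noise and no reverberation, where the cross-spectrum phase is exactly linear in ω.

```latex
\begin{aligned}
R_{x_1 x_2}(\tau) &= \int_{-\infty}^{\infty}
  \frac{G_{x_1 x_2}(\omega)}{\left| G_{x_1 x_2}(\omega) \right|}\, e^{j\omega\tau}\, d\omega \\
&= \int_{-\infty}^{\infty} e^{-j\omega\tau_{12}}\, e^{j\omega\tau}\, d\omega
 \;\propto\; \delta(\tau - \tau_{12})
\end{aligned}
```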

It can be seen that the inverse transform of the PHAT-weighted cross-power spectrum approximates a unit impulse located at the delay. This makes the delay peak stand out, which effectively suppresses reverberation and noise and improves the precision and accuracy of the delay estimate.
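The whole pipeline (FFT, cross-power spectrum, PHAT weighting, IFFT, peak search) can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `gcc_phat`, the `max_tau` parameter, the small constant added to avoid division by zero, and the synthetic test signal are all assumptions made for this example.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x1 relative to x2, in seconds, using GCC-PHAT.

    A positive result means x1 is a delayed copy of x2.
    """
    n = len(x1) + len(x2)            # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    G = X1 * np.conj(X2)             # cross-power spectrum
    G /= np.abs(G) + 1e-12           # PHAT weighting: keep the phase only
    r = np.fft.irfft(G, n=n)         # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically possible delays.
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))
    shift = int(np.argmax(r)) - max_shift
    return shift / fs


# Synthetic check: x1 is x2 delayed by 25 samples, plus a little noise.
rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(4096)
x2 = s + 0.05 * rng.standard_normal(s.size)
x1 = np.concatenate((np.zeros(25), s[:-25])) + 0.05 * rng.standard_normal(s.size)
print(gcc_phat(x1, x2, fs) * fs)     # estimated delay in samples
```

Passing `max_tau` equal to the microphone spacing divided by the speed of sound limits the peak search to delays the geometry allows, which helps reject spurious peaks.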

3. Cross-correlation function

The cross-correlation function of x(n) and y(n) is obtained by keeping x(n) fixed, shifting y(n) to the left by m samples, multiplying the two sequences element by element, and summing the products. The order of the two signals cannot be interchanged. Computing the cross-correlation directly in the time domain, in the same way as a convolution, is computationally expensive, so the operation is usually carried out in the frequency domain with an FFT and an IFFT, in the same way that linear convolution is computed with the FFT.

In the frequency domain, the cross-correlation function of two signals equals the complex conjugate of the spectrum of x multiplied by the spectrum of y.
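The frequency-domain route can be checked against a direct time-domain computation. The sketch below assumes real-valued signals and the definition r(m) = Σ x(n) y(n + m), which matches the "shift y to the left by m" description. The helper name `xcorr_fft` is hypothetical.

```python
import numpy as np


def xcorr_fft(x, y):
    """Cross-correlation r[m] = sum_n x[n] * y[n + m] via FFT, for real x, y.

    Returns (lags, r) with lags running from -(len(x) - 1) to len(y) - 1.
    """
    n = len(x) + len(y) - 1                  # length of the linear correlation
    X = np.fft.rfft(x, n=n)                  # zero-padded spectra
    Y = np.fft.rfft(y, n=n)
    r = np.fft.irfft(np.conj(X) * Y, n=n)    # circular result, lags 0..n-1
    # Negative lags are stored at the end; rotate them to the front.
    r = np.roll(r, len(x) - 1)
    lags = np.arange(-(len(x) - 1), len(y))
    return lags, r


x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.5, 2.0])
lags, r = xcorr_fft(x, y)
direct = np.correlate(y, x, mode="full")     # same definition, time domain
print(np.allclose(r, direct))
```

Zero-padding both signals to length `len(x) + len(y) - 1` is what turns the circular correlation produced by the FFT into the linear one.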

 
