Adaptive filtering method - LMS algorithm

Adaptive filter

  • Adaptive filter: a digital filter that can automatically adjust its parameters according to the input signal

  • Non-adaptive filter: a digital filter with fixed coefficients; these static coefficients determine the transfer function of the filter

  • For some applications (such as system identification, prediction, denoising, etc.), the required parameters are not known in advance, so adaptive coefficients must be used. In such cases, adaptive filters are the usual choice

  • When an adaptive filter processes a speech signal, it does not need to know the statistical characteristics of the input signal and noise in advance. The filter itself learns or estimates the statistics of the signal while it runs, and adjusts its own parameters accordingly to achieve optimal filtering under some criterion (cost function)

  • Adaptive filters are an effective means of processing non-stationary signals

  • Let's start with an N-order filter and gradually transition to an adaptive filter

N-order filter

  • An N-order filter with parameter vector $\text{w}(n)$ contains N independent filter coefficients
  • The output of the filter is the linear convolution of the input signal with the filter coefficients:
    $$y(n)=\sum_{i=0}^{N-1} w_i(n)\,x(n-i)=\text{w}^T(n)\text{x}(n)=\text{x}^T(n)\text{w}(n)$$
  • where
    • $\text{x}(n)=[x(n),x(n-1),\dots,x(n-N+1)]^T$
    • $\text{w}(n)=[w_0(n),w_1(n),\dots,w_{N-1}(n)]^T$
    • the index $n$ in parentheses denotes the time instant
  • Define the desired output as $d(n)$; the error sequence is then $e(n)=d(n)-y(n)=d(n)-\text{w}^T(n)\text{x}(n)$
  • Since the error at time n is defined in terms of the filter coefficients at time n, this error is called the "a posteriori error"
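As a quick sketch, the filter output $y(n)=\text{w}^T(n)\text{x}(n)$ can be computed sample by sample (taking samples before $n=0$ as zero) and checked against NumPy's convolution; the input and taps below are made-up examples:

```python
import numpy as np

def filter_output(w, x):
    """y(n) = sum_i w_i * x(n-i) with fixed taps w;
    samples before n = 0 are taken to be zero."""
    N = len(w)
    y = np.zeros(len(x))
    for n in range(len(x)):
        for i in range(min(N, n + 1)):
            y[n] += w[i] * x[n - i]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])   # example input
w = np.array([0.5, 0.25])            # example taps (N = 2)
y = filter_output(w, x)
# identical to the first len(x) points of np.convolve(x, w)
```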

Standard LMS algorithm

  • According to the minimum mean square error (MMSE) criterion, the objective function to minimize is
    $$J(\text{w})=E[|e(n)|^2]=E[|d(n)-\text{w}^T(n)\text{x}(n)|^2]$$
  • To minimize the mean square error, take the derivative of $J(\text{w})$ with respect to $\text{w}$ and set it to zero:
    $$E[\text{x}(n)\text{x}^T(n)]\text{w}(n)-E[\text{x}(n)d(n)]=R\text{w}-r=0$$
  • This gives: $\text{w}_{opt}=R^{-1}r$
  • The filter defined by $\text{w}_{opt}$ is the Wiener filter, which is the statistically optimal filter in the minimum mean square error sense
  • Disadvantages of the Wiener filter:
    • It is not suitable for non-stationary processes, and speech is a non-stationary signal
    • Computing the mathematical expectations requires all historical data to estimate the current filter coefficients, which is not a practical choice
  • Because of the expectations involved, the true autocorrelation matrix $R$ of the signal and the cross-correlation vector $r$ between the input signal and the desired output cannot be obtained in practice. Therefore, the squared instantaneous error replaces the mathematical expectation in the objective function, giving the instantaneous gradient $\triangledown(n)=-2e(n)\text{x}(n)$; the filter coefficients are then updated along the negative gradient direction:
    $$\text{w}(n+1)=\text{w}(n)+2\mu\,\text{x}(n)e(n)$$
  • This is the update formula of the standard time-domain LMS algorithm. It updates point by point: whenever a new $x(n)$ and $d(n)$ arrive, the filter coefficients are updated once
  • The execution flow of the standard LMS algorithm:
    • Initialize w(0), x(0)
    • For each new input value x(n), calculate the output value y(n)
    • Using the expected output d(n), calculate the error value e(n) to get the gradient
    • Update the filter coefficients to get w(n+1)
    • Return to step 2 until the input ends; this yields the output sequence and the error sequence
  • Summary of the standard LMS algorithm:
    • The basic idea is gradient descent
    • The advantage is that the algorithm is simple and easy to implement
    • The disadvantage is that the convergence speed is slow, and the tracking performance is relatively poor due to the presence of gradient noise
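The execution flow above can be sketched as follows; the step size, signal lengths, and the system being identified are illustrative assumptions, not part of the original text:

```python
import numpy as np

def lms(x, d, N, mu):
    """Standard LMS: y(n) = w^T x(n), e(n) = d(n) - y(n),
    w(n+1) = w(n) + 2*mu*x(n)*e(n)."""
    w = np.zeros(N)
    xp = np.concatenate([np.zeros(N - 1), x])  # zero history before n = 0
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(len(x)):
        xn = xp[n : n + N][::-1]     # [x(n), x(n-1), ..., x(n-N+1)]
        y[n] = w @ xn
        e[n] = d[n] - y[n]
        w = w + 2 * mu * e[n] * xn   # point-by-point update
    return y, e, w

# system identification: d is a known 4-tap filter applied to white noise
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
h = np.array([0.6, -0.3, 0.1, 0.05])
d = np.convolve(x, h)[:len(x)]
y, e, w = lms(x, d, N=4, mu=0.01)    # w converges toward h
```

With a white input and a small enough step size, the coefficients converge to the unknown system and the error decays toward zero.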

Block LMS algorithm

  • Collect several sample points, then update the filter coefficients once:
    $$\begin{aligned} \text{w}(n+1)&=\text{w}(n)+2\mu\,\text{x}(n)e(n) \\ \text{w}(n+2)&=\text{w}(n+1)+2\mu\,\text{x}(n+1)e(n+1) \\ &\;\;\vdots \\ \text{w}(n+L)&=\text{w}(n+L-1)+2\mu\,\text{x}(n+L-1)e(n+L-1) \end{aligned}$$
  • which gives: $\text{w}(n+L)=\text{w}(n)+2\mu\sum_{m=0}^{L-1}\text{x}(n+m)e(n+m)$
  • Update in blocks: the filter coefficients are updated once every L points
  • Note that the error sequence within a block is obtained with the same filter: $e(n+m)=d(n+m)-\text{x}^T(n+m)\text{w}(n)$
  • For the convenience of the subsequent derivation, let L = N; that is, the filter coefficients are updated once the number of collected samples equals the filter order
  • Compared with the standard LMS, the update rate of the block LMS is reduced by a factor of L, so a new block index k is defined to replace n: $k=\frac{n}{L}$
  • The block LMS update formula is thus: $\text{w}(k+1)=\text{w}(k)+2\mu\sum_{m=0}^{L-1}\text{x}(kL+m)e(kL+m)$
  • In block LMS, the instantaneous gradient is: $\triangledown(k)=-2\sum_{m=0}^{L-1}\text{x}(kL+m)e(kL+m)$
  • The core operations of block LMS:
    • The output sequence is the linear convolution of the input signal with the filter coefficients: $y(n+m)=\text{x}^T(n+m)\text{w}(n),\ m\in[0,N-1]$
    • The instantaneous gradient is the linear correlation between the input signal and the error sequence: $\triangledown(k)=-2\sum_{m=0}^{L-1}\text{x}(kL+m)e(kL+m)$
  • Both the overlap-save method and the overlap-add method can be used to compute the linear convolution with the FFT
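Staying in the time domain for now, the block update can be sketched as below (step size and test signals are illustrative assumptions):

```python
import numpy as np

def block_lms(x, d, N, mu):
    """Block LMS with block length L = N: the coefficients are held
    fixed within a block and updated once per block."""
    L = N
    w = np.zeros(N)
    xp = np.concatenate([np.zeros(N - 1), x])  # zero history before n = 0
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for k in range(len(x) // L):
        grad = np.zeros(N)
        for m in range(L):
            n = k * L + m
            xn = xp[n : n + N][::-1]   # [x(n), ..., x(n-N+1)]
            y[n] = w @ xn              # same w(k) for the whole block
            e[n] = d[n] - y[n]
            grad += xn * e[n]          # accumulate x(kL+m) e(kL+m)
        w = w + 2 * mu * grad          # one update per block
    return y, e, w

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
h = np.array([0.6, -0.3, 0.1, 0.05])
d = np.convolve(x, h)[:len(x)]
y, e, w = block_lms(x, d, N=4, mu=0.002)   # w converges toward h
```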

Overlap-save method

  • Relationship between linear and circular convolution: in general, if two finite-length sequences have lengths $N_1, N_2$ with $N_1 \ge N_2$, then the last $N_1-N_2+1$ points of the circular convolution agree with the result of the linear convolution

  • Relationship between linear and circular correlation: in general, if two finite-length sequences have lengths $N_1, N_2$ with $N_1 \ge N_2$, then the first $N_1-N_2+1$ points of the circular correlation agree with the result of the linear correlation

  • To understand these two relationships, refer to the basic operations of digital signals - linear convolution (correlation) and circular convolution (correlation)

  • To obtain the N-point output sequence $y(n+m)=\text{x}^T(n+m)\text{w}(n),\ m\in[0,N-1]$, the circular convolution must agree with the linear convolution over at least N points, that is, $N_1-N_2+1 \ge N$

  • Since $N_1 \ge N_2$ and the filter order is N, we have $N_2=N$, so the condition becomes $N_1 \ge 2N-1$

  • For the convenience of the FFT, let $N_1=2N$
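The choice $N_1=2N$ can be checked numerically: a 2N-point circular convolution (computed via the FFT) of [previous block, current block] with the zero-padded taps reproduces the linear-convolution output in its last N points. The sizes and signals below are arbitrary:

```python
import numpy as np

N = 4
rng = np.random.default_rng(1)
x = rng.standard_normal(3 * N)   # some input history
w = rng.standard_normal(N)       # filter taps

# direct linear-convolution output for the current block n = 2N .. 3N-1
y_direct = np.array([sum(w[i] * x[n - i] for i in range(N))
                     for n in range(2 * N, 3 * N)])

# 2N-point circular convolution via FFT: [old block | new block]
# convolved with [w | N zeros]; the LAST N points are the linear result
blk = x[N : 3 * N]
Y = np.fft.fft(blk) * np.fft.fft(np.concatenate([w, np.zeros(N)]))
y_fft = np.real(np.fft.ifft(Y))[N:]
# y_fft matches y_direct
```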

  • Construct x(n) and w(n) as shown below
    *(figure: the 2N-point input block and the zero-padded filter vector)*

  • Compute the Fourier transforms of the input signal vector and the filter vector separately:
    $$\begin{aligned} X(k)&=\mathrm{diag}\{F[x(kN-N),\dots,x(kN-1),x(kN),\dots,x(kN+N-1)]\} \\ W(k)&=F[\text{w}^T(k),0,\dots,0]^T \\ Y(k)&=X(k)W(k) \end{aligned}$$

  • The length of each $Y(k)$ is 2N, and the N-point linear-convolution output $y(k)$ equals the last N points of the inverse Fourier transform of $Y(k)$:
    $$y(k)=[y(kN),y(kN+1),\dots,y(kN+N-1)]=\text{last N samples of }F^{-1}Y(k)$$

  • The calculation process of y(k) is as follows:
    *(figure: overlap-save computation flow of $y(k)$)*

  • Next, compute the sum of the N instantaneous gradients: $\triangledown(k)=-2\sum_{m=0}^{L-1}\text{x}(kL+m)e(kL+m)$

  • The linear correlation is likewise computed in the frequency domain, as the product of the conjugate spectrum of the input signal with the spectrum of the error sequence

  • $X(k)$ has already been obtained above; the error sequence $e(k)=[e(kN),e(kN+1),\dots,e(kN+N-1)]$ is likewise extended to length 2N

  • Note: when computing the convolution, N zeros are appended after $\text{w}(n)$; when computing the correlation, N zeros are prepended before $e(n)$

  • Fourier transform of the error sequence: $E(k)=F[0,0,\dots,0,e^T(k)]^T$

  • Fourier transform of the instantaneous gradient: $\triangledown(k)=X^H(k)E(k)$ (the factor of $-2$ from the time-domain definition is absorbed into the $2\mu$ of the update formula)

  • Each $\triangledown(k)$ has length 2N, and the N-point linear correlation (the sum of the instantaneous gradients) equals the first N points of the inverse Fourier transform of $\triangledown(k)$:
    $$\vec{\triangledown}(k)=[\triangledown(kN),\triangledown(kN+1),\dots,\triangledown(kN+N-1)]=\text{first N samples of }F^{-1}\triangledown(k)$$
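The correlation counterpart can be checked the same way: the first N samples of $F^{-1}\{X^H(k)E(k)\}$ reproduce the time-domain gradient sum. The signals below are arbitrary test data:

```python
import numpy as np

N = 4
rng = np.random.default_rng(2)
xblk = rng.standard_normal(2 * N)   # [x(kN-N), ..., x(kN+N-1)]
eblk = rng.standard_normal(N)       # [e(kN), ..., e(kN+N-1)]

# time domain: sum of gradient vectors (without the -2 factor)
g_direct = np.zeros(N)
for m in range(N):
    xn = xblk[m + 1 : N + m + 1][::-1]  # [x(kN+m), ..., x(kN+m-N+1)]
    g_direct += xn * eblk[m]

# frequency domain: conjugate spectrum times the zero-PREPENDED error
X = np.fft.fft(xblk)
E = np.fft.fft(np.concatenate([np.zeros(N), eblk]))
g_fft = np.real(np.fft.ifft(np.conj(X) * E))[:N]   # keep first N samples
# g_fft matches g_direct
```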

Frequency-Domain Adaptive Filter (FDAF)

  • The update of the filter coefficients can also be done in the frequency domain: $W(k+1)=W(k)+2\mu F[\vec{\triangledown}^T(k),0,\dots,0]^T$
  • Note:
    • The filter coefficients are updated directly in the frequency domain, so the gradient vector must be transformed to the frequency domain again
    • Since the filter coefficients are zero-padded with N zeros at the end, the gradient vector is also zero-padded with N zeros at the end
  • The overall framework of FDAF is as follows:
    *(figure: overall framework of FDAF)*
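Putting the pieces together, here is a minimal sketch of an overlap-save FDAF with block length L = N (the step size and the system-identification test setup are illustrative assumptions, not part of the original text):

```python
import numpy as np

def fdaf(x, d, N, mu):
    """Overlap-save FDAF: 2N-point FFTs, block length L = N.
    W is the frequency-domain form of the zero-padded length-N filter."""
    W = np.zeros(2 * N, dtype=complex)
    xp = np.concatenate([np.zeros(N), x])        # one block of zero history
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for k in range(len(x) // N):
        n = k * N
        X = np.fft.fft(xp[n : n + 2 * N])        # [old block | new block]
        yk = np.real(np.fft.ifft(X * W))[N:]     # last N samples: output
        ek = d[n : n + N] - yk                   # block error
        E = np.fft.fft(np.concatenate([np.zeros(N), ek]))
        g = np.real(np.fft.ifft(np.conj(X) * E))[:N]  # first N: gradient
        # pad the gradient with N zeros and update W in the frequency domain
        W = W + 2 * mu * np.fft.fft(np.concatenate([g, np.zeros(N)]))
        y[n : n + N], e[n : n + N] = yk, ek
    return y, e, W

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
h = np.array([0.6, -0.3, 0.1, 0.05])
d = np.convolve(x, h)[:len(x)]
y, e, W = fdaf(x, d, N=4, mu=0.002)
w_hat = np.real(np.fft.ifft(W))[:4]   # recover time-domain taps, close to h
```

Because the gradient is constrained (first N samples kept, then zero-padded), this sketch is mathematically equivalent to the block LMS above, just computed with FFTs.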


Source: blog.csdn.net/m0_46324847/article/details/130917348