MATLAB v_melbankm function parameters explained (help text in English, with examples)

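I am using MATLAB R2019. Once VOICEBOX has been downloaded and added to the MATLAB path, the function is ready to use. For instructions on downloading VOICEBOX, see this blog post.
Note that melbankm has been renamed to v_melbankm. When I used the function today I did not know what the last few parameters meant, so I looked through the source file. The help text is reproduced below in its original English. For a better explanation and a comparison with v_melcepst, see this blog post.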

Function description

V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MC,MN,MX]=V_MELBANKM(P,N,FS,FL,FH,W)

Inputs:

p number of filters in filterbank or the filter spacing in k-mel/bark/erb [default = ceil(4.6*log10(fs))]
n length of fft
fs sample rate in Hz
fl low end of the lowest filter as a fraction of fs [default = 0]
fh high end of highest filter as a fraction of fs [default = 0.5]
w any sensible combination of the following:
'b' = bark scale instead of mel
'e' = erb-rate scale
'l' = log10 Hz frequency scale
'f' = linear frequency scale
'c' = fl/fh specify centre of low and high filters
'h' = fl/fh are in Hz instead of fractions of fs
'H' = fl/fh are in mel/erb/bark/log10
't' = triangular shaped filters in mel/erb/bark domain [default]
'n' = hanning shaped filters in mel/erb/bark domain
'm' = hamming shaped filters in mel/erb/bark domain
'z' = highest and lowest filters taper down to zero [default]
'y' = lowest filter remains at 1 down to 0 frequency and highest filter remains at 1 up to nyquist frequency
'u' = scale filters to sum to unity
's' = single-sided: do not double filters to account for negative frequencies
'g' = plot idealized filters [default if no output arguments present]
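
To make the defaults above concrete, here is a small sketch that evaluates the default filter count and converts the default fl/fh fractions into Hz. The sample rate and FFT length are illustrative values, not taken from the help text, and the commented-out calls assume VOICEBOX is on the path.

```matlab
fs = 16000;                        % illustrative sample rate in Hz (assumption)
n  = 512;                          % illustrative fft length (assumption)

p_default = ceil(4.6*log10(fs));   % default number of filters per the help text
fl = 0;   fh = 0.5;                % default band edges, as fractions of fs
fl_hz = fl*fs;                     % the same edges expressed in Hz
fh_hz = fh*fs;

fprintf('default p = %d, band = %g..%g Hz\n', p_default, fl_hz, fh_hz);

% With VOICEBOX installed, these two calls should describe the same band,
% the second one using the 'h' flag so that fl/fh are given in Hz:
%   x1 = v_melbankm(p_default, n, fs, fl, fh);
%   x2 = v_melbankm(p_default, n, fs, fl_hz, fh_hz, 'h');
```

For fs = 16000 the default works out to 20 filters, covering 0 to 8000 Hz.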

Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.

Some people instead define an asymmetric triangular filter in the frequency domain.
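
The difference between the two conventions can be seen with a short sketch. It assumes the commonly used mel formula mel = 2595*log10(1 + f/700), which is not quoted in the help text above. The helper names hz2mel and mel2hz are hypothetical, and the centre frequency and half-width are arbitrary illustrative choices.

```matlab
% Common mel-scale formula (an assumption here; not quoted from the help text)
hz2mel = @(f) 2595*log10(1 + f/700);
mel2hz = @(m) 700*(10.^(m/2595) - 1);

mc   = hz2mel(1000);               % centre of an illustrative filter, in mel
half = 150;                        % half-width in mel (arbitrary choice)

% A triangle that is symmetric in mel: edges at mc-half and mc+half
edges_hz = mel2hz([mc-half, mc, mc+half]);

lower_width = edges_hz(2) - edges_hz(1);   % width below the centre, in Hz
upper_width = edges_hz(3) - edges_hz(2);   % width above the centre, in Hz

fprintf('centre %.1f Hz, lower width %.1f Hz, upper width %.1f Hz\n', ...
        edges_hz(2), lower_width, upper_width);
```

Because the mel-to-Hz mapping is convex, a filter that is symmetric in mel is wider above its centre than below it when viewed in Hz. That shape is not the same as a filter built from straight lines in the frequency domain.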

If 'ty' or 'ny' is specified, the total power in the fft is preserved.

Outputs:
x a sparse matrix containing the filterbank amplitudes
If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
otherwise size(x)=[p,1+floor(n/2)]
Note that the peak filter values equal 2 to account for the power in the negative FFT frequencies.

mc the filterbank centre frequencies in mel/erb/bark
mn the lowest fft bin with a non-zero coefficient
mx the highest fft bin with a non-zero coefficient
Note: you must specify both or neither of mn and mx.
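
The two output sizes can be checked with a sketch like the following. The values of p, n and fs are illustrative assumptions, and the VOICEBOX calls are guarded so the script still runs when the toolbox is not on the path.

```matlab
p = 20;  n = 512;  fs = 16000;     % illustrative values (assumptions)

% Number of columns when mn and mx are not requested
full_cols = 1 + floor(n/2);
fprintf('expected size without mn/mx: [%d, %d]\n', p, full_cols);

if exist('v_melbankm', 'file')     % only runs if VOICEBOX is on the path
    x = v_melbankm(p, n, fs);                  % full-width matrix
    [x2, mc, mn, mx] = v_melbankm(p, n, fs);   % trimmed to bins mn..mx
    disp(size(x));                 % per the help text: [p, 1+floor(n/2)]
    disp(size(x2));                % per the help text: [p, mx-mn+1]
    disp([mn, mx]);                % first and last fft bins actually used
end
```

Requesting at least one output also suppresses the plot, since the help text says 'g' is the default only when no output arguments are present.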

Examples of use:

(a) Calculate the Mel-frequency Cepstral Coefficients

    f=v_rfft(s);                     % v_rfft() returns only 1+floor(n/2) coefficients (s is a column vector)
    x=v_melbankm(p,n,fs);            % n is the fft length, p is the number of filters wanted
    z=log(x*abs(f).^2);              % multiply x by the power spectrum
    c=dct(z);                        % take the DCT

(b) Calculate the Mel-frequency Cepstral Coefficients efficiently

    f=fft(s);                                 % n is the fft length, p is the number of filters wanted
    [x,mc,na,nb]=v_melbankm(p,n,fs);          % na:nb gives the fft bins that are needed
    z=log(x*(f(na:nb).*conj(f(na:nb))));      % form the power spectrum first, then apply x
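
Examples (a) and (b) are expected to give the same log filterbank energies up to rounding. A hedged way to check this on a machine with VOICEBOX installed is sketched below. The test frame is synthetic noise and the parameter values are illustrative assumptions. No expected output is given because the result depends on the installed toolbox.

```matlab
p = 20;  n = 512;  fs = 16000;     % illustrative values (assumptions)
s = randn(n, 1);                   % synthetic test frame as a column vector

if exist('v_melbankm', 'file') && exist('v_rfft', 'file')
    % (a) half-spectrum route
    fa = v_rfft(s);
    xa = v_melbankm(p, n, fs);
    za = log(xa * abs(fa).^2);

    % (b) restricted-bin route: only bins na..nb of the full fft are used
    fb = fft(s);
    [xb, mc, na, nb] = v_melbankm(p, n, fs);
    zb = log(xb * (fb(na:nb) .* conj(fb(na:nb))));

    fprintf('max |za - zb| = %g\n', max(abs(za - zb)));
else
    disp('VOICEBOX not found on the path; skipping the comparison.');
end
```

Note the parentheses in route (b): the element-wise product that forms the power spectrum has to be evaluated before the matrix multiplication by the filterbank.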

(c) Plot the calculated filterbanks

   plot((0:floor(n/2))*fs/n,v_melbankm(p,n,fs)')   % fs=sample frequency

(d) Plot the idealized filterbanks (without output sampling)

   v_melbankm(p,n,fs);

References:
[1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937.
[2] S. Davis and P. Mermelstein. Comparison of parametric representations for
monosyllabic word recognition in continuously spoken sentences.
IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.

Reposted from blog.csdn.net/weixin_46422143/article/details/105149275