Matlab中MFCC的几种实现方式（转载）

几种函数对比及说明

melbankm
由Voicebox提供，在Mel频率上设计平均分布的滤波器，此函数与音频信号没有关系，只是做MFCC前对滤波器的设计。

function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)
%MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
%
% Inputs:
%       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
%       n   length of fft
%       fs  sample rate in Hz
%       fl  low end of the lowest filter as a fraction of fs [default = 0]
%       fh  high end of highest filter as a fraction of fs [default = 0.5]
%       w   any sensible combination of the following:
%             可取代Mel频率的选项：
%             'b' = bark scale instead of mel
%             'e' = erb-rate scale
%             'l' = log10 Hz frequency scale
%             'f' = linear frequency scale
%
%             'c' = fl/fh specify centre of low and high filters
%             'h' = fl/fh are in Hz instead of fractions of fs
%             'H' = fl/fh are in mel/erb/bark/log10
%
%             滤波器形状：
%             't' = triangular shaped filters in mel/erb/bark domain (default)
%             'n' = hanning shaped filters in mel/erb/bark domain
%             'm' = hamming shaped filters in mel/erb/bark domain
%
%             'z' = highest and lowest filters taper down to zero [default]
%             'y' = lowest filter remains at 1 down to 0 frequency and
%                   highest filter remains at 1 up to nyquist freqency
%
%             'u' = scale filters to sum to unity
%
%             's' = single-sided: do not double filters to account for negative frequencies
%
%             输出滤波器组的响应曲线：
%             'g' = plot idealized filters [default if no output arguments present]
%
% Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.
% Some people instead define an asymmetric triangular filter in the frequency domain.
%
%              If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs:  x     a sparse matrix containing the filterbank amplitudes
%                 If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
%                 otherwise size(x)=[p,1+floor(n/2)]
%                 Note that the peak filter values equal 2 to account for the power
%                 in the negative FFT frequencies.
%           mc    the filterbank centre frequencies in mel/erb/bark滤波器中心频率
%           mn    the lowest fft bin with a non-zero coefficient
%           mx    the highest fft bin with a non-zero coefficient
%                 Note: you must specify both or neither of mn and mx.mn与mx必须同时指定或者不指定
%
% =============================！用法举例（MFCC流程）==============================
%
% (a) Calcuate the Mel-frequency Cepstral Coefficients
%
%       f=rfft(s);                  % rfft() returns only 1+floor(n/2) coefficients去除虚数部分
%       x=melbankm(p,n,fs);         % n is the fft length, p is the number of filters wanted
%       z=log(x*abs(f).^2);         % multiply x by the power spectrum
%       c=dct(z);                   % take the DCT
%
% (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
%
%       f=fft(s);                        % n is the fft length, p is the number of filters wanted
%       [x,mc,na,nb]=melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
%       z=log(x*(f(na:nb)).*conj(f(na:nb)));
%
% (c) Plot the calculated filterbanks
%
%      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency
%
% (d) Plot the idealized filterbanks (without output sampling)
%
%      melbankm(p,n,fs);

该函数只是设计滤波器组，属于MFCC处理的一部分。

mfcc_m
由宋知用老师书中提供，涉及到归一化Mel滤波器组系数、归一化倒谱提升窗口。

bank=melbankm(p,frameSize,fs,0,0.5,'m');
% 归一化Mel滤波器组系数
bank=full(bank);
bank=bank/max(bank( : ));

% 归一化倒谱提升窗口：对MFCC系数中某些谱线进行增强

w = 1 + 6 * sin(pi * [1:p2] ./ p2);
w = w/max(w);

需要修正的地方：
只有一阶差分系数；
滤波器选择后并不能只截取想要的部分；
归一化Mel滤波器组系数、归一化倒谱提升窗口有待考证。

melcepst
属于voicebox工具箱，现在官方已经不提供了，程序中调用了melbankm函数。

function [c,tc]=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
%
% Simple use: (1) c=melcepst(s,fs)          % calculate mel cepstrum with 12 coefs, 256 sample frames
%             (2) c=melcepst(s,fs,'E0dD')   % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
%     s   speech signal
%     fs  sample rate in Hz (default 11025)
%     w   mode string (see below)
%     nc  number of cepstral coefficients excluding 0'th coefficient [default 12] MFCC维数设定
%     p   number of filters in filterbank [default: floor(3*log(fs)) =  approx 2.1 per ocatave] 滤波器数量
%     n   length of frame in samples [default power of 2 < (0.03*fs)] 帧长
%     inc frame increment [default n/2] 帧移
%     fl  low end of the lowest filter as a fraction of fs [default = 0] 滤波器最低频率
%     fh  high end of highest filter as a fraction of fs [default = 0.5] 滤波器最高频率，通过fs归一化
%
%     w   any sensible combination of the following:
%               时域窗函数：
%               'R'  rectangular window in time domain
%               'N'  Hanning window in time domain
%               'M'  Hamming window in time domain (default)
%
%               频域窗函数：
%               't'  triangular shaped filters in mel domain (default)
%               'n'  hanning shaped filters in mel domain
%               'm'  hamming shaped filters in mel domain
%
%
%               'p'  filters act in the power domain
%               'a'  filters act in the absolute magnitude domain (default)
%
%               MFCC除12维基本参数之外的选择：
%               '0'  include 0'th order cepstral coefficient
%               'E'  include log energy
%               'd'  include delta coefficients (dc/dt)
%               'D'  include delta-delta coefficients (d^2c/dt^2)
%
%               滤波器频率设置：
%               'z'  highest and lowest filters taper down to zero (default)
%               'y'  lowest filter remains at 1 down to 0 frequency and
%                    highest filter remains at 1 up to nyquist freqency
%
%              If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs:  c     mel cepstrum output: one frame per row. Log energy, if requested, is the
%                 first element of each row followed by the delta and then the delta-delta
%                 coefficients.
%           tc    fractional time in samples at the centre of each frame
%                 with the first sample being 1.

% ==================================设置默认参数=================================
if nargin<2 fs=11025; end% 滤波器的最高频率
if nargin<3 w='M'; end% hamming窗
if nargin<4 nc=12; end% MFCC维数
if nargin<5 p=floor(3*log(fs)); end% p个滤波器
if nargin<6 n=pow2(floor(log2(0.03*fs))); end% n是一帧FFT后数据的长度
if nargin<9
   fh=0.5;% 滤波器的最高频率，用fs归一化   
   if nargin<8
     fl=0;% 设计滤波器的最低频率
     if nargin<7
        inc=floor(n/2);
     end
  end
end

if isempty(w)
   w='M';
end
if any(w=='R')
   [z,tc]=enframe(s,n,inc);
elseif any (w=='N')
   [z,tc]=enframe(s,hanning(n),inc);
else
   [z,tc]=enframe(s,hamming(n),inc);
end

% =================================！理论核心部分=================================
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);% m为滤波器的频域响应
pw=f(a:b,:).*conj(f(a:b,:));% 计算帧能量
pth=max(pw(:))*1E-20;
if any(w=='p')
   y=log(max(m*pw,pth));
else
   ath=sqrt(pth);
   y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';% 得到13维系数

nf=size(c,1);
nc=nc+1;
if p>nc
   c(:,nc+1:end)=[];% 当滤波器个数比所需维数多的时候，就将后面滤波器获得的参数删去
elseif p<nc
   c=[c zeros(nf,nc-p)];% 滤波器个数少的时候，用0补齐
end
if ~any(w=='0')
   c(:,1)=[];
   nc=nc-1;
end
if any(w=='E')
   c=[log(max(sum(pw),pth)).' c];
   nc=nc+1;
end

% ===============================计算一阶和二阶差分==============================
if any(w=='D')
  vf=(4:-1:-4)/60;
  af=(1:-1:-1)/2;
  ww=ones(5,1);
  cx=[c(ww,:); c; c(nf*ww,:)];
  vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
  vx(1:8,:)=[];
  ax=reshape(filter(af,1,vx(:)),nf+2,nc);
  ax(1:2,:)=[];
  vx([1 nf+2],:)=[];
  if any(w=='d')
     c=[c vx ax];
  else
     c=[c ax];
  end
elseif any(w=='d')
  vf=(4:-1:-4)/60;
  ww=ones(4,1);
  cx=[c(ww,:); c; c(nf*ww,:)];
  vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
  vx(1:8,:)=[];
  c=[c vx];
end
 
% =======================如果不输出任何参数，就会输出语谱图==========================
if nargout<1
   [nf,nc]=size(c);
%    t=((0:nf-1)*inc+(n-1)/2)/fs;
   ci=(1:nc)-any(w=='0')-any(w=='E');
   imh = imagesc(tc/fs,ci,c.');
   axis('xy');
   xlabel('Time (s)');
   ylabel('Mel-cepstrum coefficient');
    map = (0:63)'/63;
    colormap([map map map]);
    colorbar;
end

melcepst默认得到12维MFCC参数，时域中用hamming窗，频域中用三角窗，最低频率为0，最高频率为采样频率的一半（采样定理），帧移为帧长的一半，帧长为2的次幂但是小于0.03*fs。
E：包括对数能量
0：包括0阶倒谱系数
d：包括一阶差分
D：包括二阶差分
melcepst对参数’0’的处理

if ~any(w=='0')
   c(:,1)=[];
   nc=nc-1;
end

如果不需要’0’阶系数，就将第一列删除，并得到13-1=12维数据，说明DCT后得到的是13维数据，默认将第一个元素，即0阶倒谱系数删去。第一维比后12维都大很多（直流项？）。

默认12维参数
DCT后13维参数（'0'）

13维参数'E'

cepstralFeatureExtractor
由Audio Toolbox提供，需要先将音频分帧，每一列作为一帧，再将每一帧依次输入至cepstralFeatureExtractor，所以输入的第一帧的delta与deltaDelta都是0。

test = 'D:\DataBase\TIMIT\TRAIN\DR2\MARC0\SX108.WAV';
[x, fs] = audioread(test);
n=pow2(floor(log2(0.03*fs)));
inc=floor(n/2);
f = enframe(x,hamming(n),inc);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Replace');
[coeffs, delta, deltaDelta]= cepFeatures(f(1,:)');

参数设置中有FilterBankNormalization，选项为：Area，Bandwidth（默认），None，用于滤波器组的权重分配。

滤波器归一化

cepstralFeatureExtractor类的部分代码：

classdef (StrictDefaults)cepstralFeatureExtractor < dsp.private.SampleRateEngine
 %cepstralFeatureExtractor Cepstral Feature Extractor
 %   cepFeatures = cepstralFeatureExtractor returns a System object,
 %   cepFeatures, that calculates cepstral features. Columns of the input
 %   are treated as individual channels.
 %
 %   cepFeatures = cepstralFeatureExtractor('Name',Value, ...) returns a
 %   cepstralFeatureExtractor System object, cepFeatures, with each
 %   specified property name set to the specified value. You can specify
 %   additional name-value pair arguments in any order as
 %   (Name1,Value1,...NameN,ValueN).
 %
 %   step method syntax内置的step()函数:
 %
 %   [COEFFS,DELTA,DELTADELTA] = step(cepFeatures,X) returns the cepstral
 %   coefficients, the delta, and the delta-delta. The log energy is also
 %   returned in the COEFFS output based on the LogEnergy property. The
 %   DELTA and DELTADELTA are initialized as zero-vectors. X must be a
 %   real-valued, double-precision or single-precision matrix. Each column
 %   of X is treated as an independent channel.
 %
 %   System objects may be called directly like a function instead of using
 %   the step method. For example, y = step(obj,x) and y = obj(x) are
 %   equivalent.
 %   对象可以直接作为函数使用，所以step()与obj()功能一致
 %
 %   cepstralFeatureExtractor methods:
 %   step       - See above description for use of this method
 %   release    - Allow property values and input characteristics to change
 %   clone      - Create cepstralFeatureExtractor object with same property 
 %                values
 %   isLocked   - Locked status (logical)
 %   <a href="matlab:help matlab.System/reset   ">reset</a>      - Reset the internal states to initial conditions
 %   getFilters - Get filterbank used to calculate the cepstral 
 %                coefficients
 %
 %   cepstralFeatureExtractor properties:
 %   FilterBank  - Filter bank ('Mel'/'Gammatone')
 %   InputDomain - Domain of input signal
 %   NumCoeffs   - Number of coefficients to return
 %   FFTLength   - FFT length
 %   LogEnergy   - Log energy usage ('Append'/'Replace'/'Ignore')
 %   SampleRate  - Sample rate (Hz)
 %
 %   Advanced properties:
 %   BandEdges               - Band edges of mel filter bank (Hz)
 %   FilterBankNormalization - Normalize filter bank
 %   FilterBankDesignDomain  - Domain for mel filter bank design
 %   FrequencyRange          - Gammatone filter bank frequency range
    
    %#codegen
    properties
        %SampleRate Input sample rate (Hz)
        % Specify the sampling rate of the input in Hertz as a real, finite
        % numeric scalar. The default is 16000 Hz. This property is 
        % tunable.
        SampleRate = 16000;
    end
    
    properties (Constant, Hidden)
        % SampleRateSet is used to setup the choices for SampleRate
        SampleRateSet = matlab.system.SourceSet({'PropertyOrMethod', ...
            'SystemBlock', 'InheritSampleRate', 'getInheritedSampleRate',true});
    end
    
    properties (Nontunable)
        %BandEdges Band edges of Mel filter bank (Hz)
        % Specify the band edges of the mel filter bank as a monotonically
        % increasing vector in the range [0,fs/2]. The number of band edges
        % must be in the range [4,160]. The default band edges are spaced
        % linearly for the first ten and then logarithmically thereafter.
        % This property applies when FilterBank is 'Mel'.
        % 只有是Mel的时候，BandEdges属性才有用
        BandEdges = cepstralFeatureExtractor.getDefaultBandEdges();
        %FFTLength FFT length 默认FFT长度是输入的行数，所以做好分帧！
        FFTLength = [];
        %NumCoeffs Number of coefficients to return 默认MFCC维数13
        NumCoeffs = 13;
        %InputDomain Domain of the input signal 默认输入数据是时域的
        InputDomain = 'Time';
        %FilterBankNormalization Filter bank normalization 默认以带宽设置滤波器权重
        FilterBankNormalization = 'Bandwidth';
        %LogEnergy Log energy usage 默认log能量参数是有的
        LogEnergy = 'Append';
    end
---------------------------------------------------------略-----------------------------------------------------------
end

mfcc
由Audio Toolbox提供，最低频率不是0，它用的是cepstralFeatureExtractor函数。

[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);

function varargout = mfcc(x, fs, varargin)
%MFCC Extract the mfcc, log-energy, delta, and delta-delta of audio signal
%   coeffs = MFCC(audioIn,fs) returns the mel-frequency cepstral
%   coefficients over time for the audio input. Columns of the input are
%   treated as individual channels. coeffs is returned as an L-by-M-by-N
%   array.
%       L - Number of frames the audio signal is partitioned into.
%           This is determined by the WINDOWLENGTH and OVERLAPLENGTH 
%           properties.
%       M - Number coefficients returned per frame.
%           This is determined by the NUMCOEFFS property.
%       N - Number of channels.
%       
%   'WindowLength' defaults to round(0.030 * fs).
%   'OverlapLength' defaults to round(fs*0.02).
%   'NumCoeffs'  If not specified, the number of coefficients is 13.
%   'FFTLength'  By default, the FFT length is set to the WINDOWLENGTH.
%   'DeltaWindowLength' The default is 2.
%   coeffs = MFCC(...,'LogEnergy',LOGENERGY) specifies if and how the log
%   energy is used. Specify log energy as a character vector:
%       'Append'  - Adds the log-energy as the first element of the
%                   returned coefficients vector. This is the default
%                   setting.
%       'Replace' - Replaces the zeroth coefficient (first element of
%                   coeffs) with the log-energy.
%       'Ignore'  - Ignores and does not return the log-energy.

% =========================验证输入数据的格式=============================
validateRequiredInputs(x, fs)

params =  audio.internal.MFCCValidator(fs,size(x,1),varargin{:});% 输入默认的参数

hopLength = params.WindowLength - params.OverlapLength;% 帧移

% ==========================创建mfcc提取object============================
mfccObject = cepstralFeatureExtractor( ...
    'SampleRate',              fs, ...
    'FFTLength',               params.FFTLength, ...
    'NumCoeffs',               params.NumCoeffs, ...
    'LogEnergy',               params.LogEnergy);

% ====================验证所需要的mfcc维数比滤波器个数少===================
numValidBands = sum(mfccObject.BandEdges <= floor(fs/2)) - 2;
coder.internal.errorIf(numValidBands < params.NumCoeffs, ...
    'audio:mfcc:BadNumCoeffs', ...
    numValidBands,fs);

% ==========================mfcc参数获取=================================
[nRow,nChan] = size(x);% 一般都是单通道，audiorea读取到的是一列数据
N            = params.WindowLength;
numHops      = floor((nRow-N)/hopLength) + 1;

y            = audio.internal.buffer(x,N,hopLength);
c            = mfccObject(y);% mfccObject是cepstralFeatureExtractor类，所以，与cepstralFeatureExtractor求解方法一样
c2           = reshape(c , size(c,1) , size(c,2)/nChan   , nChan );
coeffs       = permute(c2 , [2 1 3]);% 将第1维与第2维转置，因为cepstralFeatureExtractor得到的特征是列排的

varargout{1} = coeffs;

%=========================一阶差分====================================
if nargout > 1
    delta        = audio.internal.computeDelta(coeffs,params.DeltaWindowLength);
    varargout{2} = delta;
end

% ============================二阶差分=================================
if nargout > 2
    deltaDelta   = audio.internal.computeDelta(delta,params.DeltaWindowLength);
    varargout{3} = deltaDelta;
end

% -------------------------------------------------------------------------
% Output sample stamp -----------------------------------------------------
if nargout > 3
    varargout{4} = ...
        cast(((0:(numHops-1))*hopLength + params.WindowLength)','like',x);
end

end

% -------------------------------------------------------------------------
% Validate required inputs
% -------------------------------------------------------------------------
function validateRequiredInputs(x,fs)
validateattributes(x,{'single','double'},...
    {'nonempty','2d','real'}, ...
    'mfcc','audioIn')
validateattributes(fs,{'single','double'}, ...
    {'nonempty','positive','real','scalar','nonnan','finite'}, ...
    'mfcc','fs');
end

默认有40个滤波器，得到14维参数，相当于melcepst中的’E0’，只是melcepst的最低频率从0Hz开始；delta与deltaDelta的第一行都是0；loc是每一帧的开始位置。
如何使delta与deltaDelta的首行不为0？设置DeltaWindowLength参数即可。

差分的计算
40组滤波器的频带范围

从滤波器组设置可以看出，每个滤波器的起点是上个滤波器带宽的中点。

HelperComputePitchAndMFCC
查看源码后，发现使用的是mfcc函数
melSpectrogram
output的第一维是Number of bandpass filters in filterbank，默认为32个滤波器；第二维是Number of frames in spectrogram，即帧数。
它不可以计算差分，只是spectrogram的一个小分支，若取40个滤波器，得到的结果与mfcc相近，只是需要转置一下。

几种实现方式的对比

实现方式	MFCC	频谱图
mfcc
cepstralFeatureExtractor
melcepst
melSpectrogram

结论

可见，cepstralFeatureExtractor与mfcc所用算法基本一致，只是cepstralFeatureExtractor分帧求取，melcepst与它们的第2维数据有数量级的差异，暂时认为是滤波器归一化的原因。在mfcc中，log能量是作为额外系数默认附加的，通常Matlab会提供最好的性能，所以暂时按默认选项进行。melSpectrogram默认32个滤波器，mfcc默认40个滤波器，且melSpectrogram不能计算差分，所以mfcc总的来说，更合适作为以后的计算使用。

https://www.jianshu.com/p/1c2742096382

丶葉SiR

发布了6 篇原创文章 · 获赞 5 · 访问量 352

私信关注