[Speech recognition] Isolated word speech recognition based on matlab dynamic time warping (DTW) [including Matlab source code 573]

1. Introduction

Dynamic Time Warping (DTW) has a long history (it was proposed by the Japanese scholar Itakura), and its purpose is simple: it measures the similarity of two time series that may differ in length. It is widely used, mainly for template matching, for example in isolated word speech recognition (deciding whether two utterances represent the same word), gesture recognition, data mining, and information retrieval.

1 Overview

In most disciplines, time series are a common representation of data. For time series processing, a common task is to compare the similarity of two series.
The two time series to be compared may not have the same length. In speech recognition, different people speak at different speeds. Because the speech signal is highly variable, even the same person uttering the same sound at different times will not produce it with exactly the same duration. Moreover, different phonemes within the same word are pronounced at different speeds: some people draw out the sound "A" very long, or pronounce "i" very short. In these situations, the traditional Euclidean distance cannot effectively measure the distance (or similarity) between two time series.

2 Principle of DTW method

As noted above, the two series being compared may differ in length and in local speaking rate. In addition, two time series may differ only by a displacement along the time axis; that is, once the displacement is undone, the two series are identical. In these situations, the point-by-point Euclidean distance cannot effectively measure the distance (or similarity) between the two series.
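A small MATLAB sketch can illustrate the time-shift problem described above. The pulse shape and the 5-sample delay are illustrative assumptions, not data from this post.

```matlab
% Two series with the same shape; y is x delayed by 5 samples.
n = 0:99;
x = exp(-((n - 40).^2) / 50);   % pulse centered at sample 40
y = exp(-((n - 45).^2) / 50);   % same pulse centered at sample 45

% Point-by-point Euclidean distance needs equal lengths and compares
% x(k) only with y(k), so the pure time shift is counted as error.
dEuclid = sqrt(sum((x - y).^2));

% Undoing the shift by hand shows that the shapes are identical.
dAligned = sqrt(sum((x(1:95) - y(6:100)).^2));

fprintf('Euclidean: %.3f, after manual alignment: %.3f\n', dEuclid, dAligned);
```

The Euclidean distance is clearly nonzero although the two pulses have the same shape. DTW finds this kind of alignment automatically instead of requiring the shift to be known in advance.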

DTW calculates the similarity between two time series by stretching and compressing them along the time axis:
[Figure 1: two time series aligned by DTW]
As shown in the figure above, the upper and lower solid lines represent the two time series, and the dashed lines between them connect the similar points of the two series. DTW uses the sum of the distances between all these similar points, called the warp path distance, to measure the similarity between the two time series.

3 DTW calculation method

Let the two time series whose similarity is to be computed be X and Y, with lengths |X| and |Y|.
The warp path has the form W = w1, w2, …, wK, where max(|X|, |Y|) <= K < |X| + |Y|.
Each wk has the form (i, j), where i is an index into X and j is an index into Y.
The warp path W must start at w1 = (1, 1) and end at wK = (|X|, |Y|), which ensures that every index of X and Y appears in W.
In addition, i and j must increase monotonically along W, which ensures that the dashed lines in Figure 1 do not cross. Monotonic increase means that for consecutive elements wk = (i, j) and wk+1 = (i', j'):
i <= i' <= i + 1, j <= j' <= j + 1.
The warp path sought is the one with the smallest distance:
Dist(W) = Dist(w1) + Dist(w2) + … + Dist(wK),
where Dist(wk) is the distance between the points X(i) and Y(j) paired by wk = (i, j).
DTW finds this path by dynamic programming on a cost matrix D, where D(i, j) is the warp path distance between the first i points of X and the first j points of Y.
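The cost matrix can be filled with the standard dynamic programming recurrence D(i, j) = Dist(i, j) + min(D(i-1, j-1), D(i-1, j), D(i, j-1)). The following is a minimal sketch under stated assumptions, not the post's myDTW: the two example series and the use of the absolute difference as the local distance are illustrative choices.

```matlab
% Illustrative data: Y is X with two samples repeated, so the lengths differ.
X = [1 2 3 4 3 2 1];
Y = [1 1 2 3 4 4 3 2 1];
nX = numel(X);
nY = numel(Y);

% D(i+1, j+1) is the warp path distance between X(1:i) and Y(1:j).
% The padding row and column are Inf, which forces w1 = (1, 1).
D = inf(nX + 1, nY + 1);
D(1, 1) = 0;
for i = 1:nX
    for j = 1:nY
        cost = abs(X(i) - Y(j));   % local distance between X(i) and Y(j)
        D(i + 1, j + 1) = cost + min([D(i, j), D(i, j + 1), D(i + 1, j)]);
    end
end
dtwDist = D(nX + 1, nY + 1);       % warp path distance of the best path

% Backtrack from wK = (|X|, |Y|) to w1 = (1, 1) to recover the path W.
i = nX;
j = nY;
W = [i, j];
while i > 1 || j > 1
    [~, step] = min([D(i, j), D(i, j + 1), D(i + 1, j)]);
    if step == 1        % diagonal move: came from (i-1, j-1)
        i = i - 1;
        j = j - 1;
    elseif step == 2    % came from (i-1, j)
        i = i - 1;
    else                % came from (i, j-1)
        j = j - 1;
    end
    W = [i, j; W];      % prepend so that W runs from w1 to wK
end
K = size(W, 1);
```

Because Y is just X with repeated samples, the warp path distance here is 0 even though the two series have different lengths. Each row of W is one pair wk = (i, j), and K satisfies the bounds given above.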

2. Source code

function trimmed_X = my_vad(x)
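% Endpoint detection: the input is the recorded speech, the output is the useful (speech) part

Ini = 0.1;          % initial silence duration (s)
Ts = 0.01;          % window duration (s)
Tsh = 0.005;        % frame shift duration (s)
Fs = 16000;         % sampling frequency (Hz)
counter1 = 0;       % the following four counters are used to find the start and end points
counter2 = 0;
counter3 = 0;
counter4 = 0;
ZCRCountf = 0;      % used to store the zero-crossing detection results
ZCRCountb = 0;
ZTh = 40;           % zero-crossing threshold
w_sam = fix(Ts*Fs);                   % window length (samples)
o_sam = fix(Tsh*Fs);                  % frame shift (samples)
lengthX = length(x);
segs = fix((lengthX-w_sam)/o_sam)+1;  % number of frames
sil = fix((Ini-Ts)/Tsh)+1;            % number of frames in the initial silence
win = hamming(w_sam);

Limit = o_sam*(segs-1)+1;             % start position of the last frame

FrmIndex = 1:o_sam:Limit;             % start position of each frame
ZCR_Vector = zeros(1,segs);           % zero-crossing count of each frame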
                                     
% Short-time zero-crossing count
for t = 1:segs
    ZCRCounter = 0;
    nextIndex = (t-1)*o_sam+1;
    for r = nextIndex+1:(nextIndex+w_sam-1)
        % count a crossing only when the sign changes strictly
        if (x(r) > 0 && x(r-1) < 0) || (x(r) < 0 && x(r-1) > 0)
            ZCRCounter = ZCRCounter + 1;
        end
    end
    ZCR_Vector(t) = ZCRCounter;
end

% Short-time average magnitude
Erg_Vector = zeros(1,segs);
for u = 1:segs
    nextIndex = (u-1)*o_sam+1;
    frame = x(nextIndex:nextIndex+w_sam-1);
    Energy = frame(:).*win;           % force a column so that a row-vector x also works
    Erg_Vector(u) = sum(abs(Energy));
end

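IMN = mean(Erg_Vector(1:sil));  % mean magnitude during silence (noise mean)
IMX = max(Erg_Vector);          % maximum short-time average magnitude
I1 = 0.03 * (IMX-IMN) + IMN;    % I1 and I2 are the initial energy thresholds
I2 = 4 * IMN;
ITL = 100*min(I1,I2);           % lower energy threshold; tune the leading coefficient to suit your data
ITU = 10* ITL;                  % upper energy threshold
IZC = mean(ZCR_Vector(1:sil));  % mean zero-crossing count during silence
stdev = std(ZCR_Vector(1:sil)); % standard deviation of the zero-crossing count during silence

IZCT = min(ZTh,IZC+2*stdev);    % zero-crossing threshold
indexi = zeros(1,lengthX);
indexj = indexi;
indexk = indexi;
indexl = indexi;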

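% Find the frames whose energy exceeds the upper threshold
for i = 1:length(Erg_Vector)
    if (Erg_Vector(i) > ITU)
        counter1 = counter1 + 1;
        indexi(counter1) = i;
    end
end
ITUs = indexi(1);        % first frame whose energy exceeds the upper threshold

% Walk backward from ITUs and record the frames whose energy is below the lower threshold
for j = ITUs:-1:1
    if (Erg_Vector(j) < ITL)
        counter2 = counter2 + 1;
        indexj(counter2) = j;
    end
end
start = indexj(1)+1;    % start frame from the first-stage decision

Erg_Vectorf = fliplr(Erg_Vector); % flip the energy vector left to right (reverses a row vector)

% Repeat the procedure above on the reversed vector to find the end frame
for k = 1:length(Erg_Vectorf)
    if (Erg_Vectorf(k) > ITU)
        counter3 = counter3 + 1;
        indexk(counter3) = k;
    end
end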
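The per-frame zero-crossing count and average magnitude computed in my_vad can also be obtained without explicit loops. The sketch below is an alternative formulation under stated assumptions, not the author's code: the 440 Hz test tone is illustrative, and the Hamming window is written out from its formula so the block does not depend on a toolbox.

```matlab
Fs = 16000;                        % sampling frequency used in my_vad
t = (0:Fs/10 - 1)' / Fs;           % 0.1 s time axis, column vector
x = sin(2*pi*440*t);               % illustrative test tone (assumption)

w_sam = fix(0.01 * Fs);            % window length: 160 samples
o_sam = fix(0.005 * Fs);           % frame shift: 80 samples
segs = fix((length(x) - w_sam) / o_sam) + 1;

% Index matrix: column k lists the sample indices of frame k.
idx = bsxfun(@plus, (1:w_sam)', (0:segs - 1) * o_sam);
frames = x(idx);                   % w_sam-by-segs matrix, one frame per column

% Zero crossings: strict sign changes between neighbouring samples,
% the same condition as in the loop version (exact zeros never count).
s = sign(frames);
ZCR_Vector = sum(s(1:end-1, :) .* s(2:end, :) < 0, 1);

% Short-time average magnitude with a Hamming window (explicit formula).
win = 0.54 - 0.46 * cos(2*pi*(0:w_sam - 1)' / (w_sam - 1));
Erg_Vector = sum(abs(bsxfun(@times, frames, win)), 1);
```

Both results are 1-by-segs row vectors, so they can be used with the same threshold logic as the loop-based ZCR_Vector and Erg_Vector.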
% Initialize the DTW score vectors
Scores1 = zeros(1,N);
Scores2 = zeros(1,N);
Scores3 = zeros(1,N);


% Load the template data
s1 = load('Vectors1.mat');
fMatrixall1 = struct2cell(s1);
s2 = load('Vectors2.mat');
fMatrixall2 = struct2cell(s2);
s3 = load('Vectors3.mat');
fMatrixall3 = struct2cell(s3);


% Compute the DTW scores
for i = 1:N
    fMatrix1 = fMatrixall1{i,1};
    fMatrix1 = CMN(fMatrix1);
    Scores1(i) = myDTW(fMatrix1,rMatrix);
end

for j = 1:N
    fMatrix2 = fMatrixall2{j,1};
    fMatrix2 = CMN(fMatrix2);
    Scores2(j) = myDTW(fMatrix2,rMatrix);
end
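The listing calls CMN before each DTW comparison and collects one score per template, but neither CMN nor the final decision step is shown. The sketch below shows one plausible reading, under loud assumptions: cmnSketch is a hypothetical stand-in that treats CMN as cepstral mean normalization with one frame per row, the score values are made up for illustration, and the minimum-distance decision rule is an assumption rather than the author's confirmed method.

```matlab
% Hypothetical stand-in for CMN (the post's own CMN is not shown):
% subtract each coefficient's mean over all frames, one frame per row.
cmnSketch = @(F) bsxfun(@minus, F, mean(F, 1));

F = [1 2 3; 3 4 5; 5 6 10];      % illustrative features: 3 frames x 3 coefficients
Fn = cmnSketch(F);               % every column of Fn now has zero mean

% Illustrative DTW scores for three vocabulary words (made-up numbers).
Scores1 = [41.2 39.8 44.0];
Scores2 = [12.5 15.1 13.3];
Scores3 = [37.9 40.4 36.2];

% Assumed decision rule: pick the word whose best template is closest.
[bestScore, word] = min([min(Scores1), min(Scores2), min(Scores3)]);
fprintf('Recognized word %d (DTW distance %.1f)\n', word, bestScore);
```

A smaller DTW distance means a closer match, so with these illustrative scores the second word would be chosen. Averaging the scores per word instead of taking the minimum is an equally reasonable variant.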

3. Running results

[Figure: running results]


Origin blog.csdn.net/TIQCmatlab/article/details/115003346