[Speech recognition] Speaker-dependent isolated-word speech recognition based on MATLAB VQ [with MATLAB source code, issue 536]

1. Introduction

VQ (Vector Quantization) is a commonly used compression technology. This article mainly reviews:

1) VQ principle

2) VQ-based speaker recognition (SR) technology

1.1 Classification problem

Speaker recognition is essentially a classification problem: given an utterance, decide which known speaker produced it.
Speaker recognition techniques fall mainly into the following groups:

Template matching methods
These methods are relatively mature. They work in three steps: feature extraction, template training, and matching. Typical examples are dynamic time warping (DTW) and vector quantization (VQ).

DTW is based on dynamic programming, but it has two shortcomings: 1) it relies too heavily on VAD (voice activity detection); 2) it does not make full use of the temporal dynamics of speech. It is therefore no surprise that it has been replaced by HMM.

The VQ algorithm is a data compression method with two basic problems: codebook construction and codeword search. Codebook construction trains a good codebook from a large number of signal samples. Codeword search finds the codeword that best matches the input. The method is simple and best suited to small systems and to voices that differ clearly from one another.
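
Codeword search, and the way it can be used to pick a speaker, can be sketched as follows. This is only a minimal illustration under stated assumptions: the codebooks and test frames are made-up toy numbers, the distance is squared Euclidean, and the decision rule (choose the codebook with the smallest average distortion) is the usual textbook rule rather than something confirmed by the source code of this post.

```matlab
% Toy codebooks (made-up numbers): one codeword per column.
codebookA = [0 1 0; 0 0 1];            % hypothetical speaker A, 3 codewords in 2-D
codebookB = [5 6 5; 5 5 6];            % hypothetical speaker B
codebooks = {codebookA, codebookB};

% Made-up test frames, one feature vector per column.
frames = [0.1 0.9 0.2; -0.1 0.1 0.8];

avgDist = zeros(1, numel(codebooks));
for k = 1:numel(codebooks)
    cb = codebooks{k};
    total = 0;
    for j = 1:size(frames, 2)
        % Codeword search: squared Euclidean distance to every codeword.
        diffs = cb - repmat(frames(:, j), 1, size(cb, 2));
        d2 = sum(diffs .^ 2, 1);
        total = total + min(d2);       % distortion of the best-matching codeword
    end
    avgDist(k) = total / size(frames, 2);
end

% Assumed decision rule: the codebook with the smallest average distortion wins.
[~, speaker] = min(avgDist);
```

Here speaker is the index of the best-matching codebook in the codebooks cell array. In a real system the feature vectors would be MFCC frames and each codebook would be trained from one speaker's recordings.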

Classification methods based on statistical models
These methods are still pattern recognition systems at heart: extract features, train a classifier, then make a classification decision.
Commonly used models are GMM, HMM, SVM, ANN, DNN, and various joint models.

In the basic GMM framework, one GMM is trained for each class. The GMM-UBM (universal background model) algorithm is similar; the difference is that it trains one large GMM on the pooled samples of all L classes instead of a separate GMM per class. With SVM, MFCC is used as the feature and each frame is one sample; VAD can be used to delete invalid audio segments, and the classifier is then trained directly. In recent years, methods based on sparse representation have also appeared.
1.2 VQ principle
Vector quantization is widely used in signal processing, data compression, and related fields. In fact, lossy multimedia formats such as JPEG and MPEG-4 include a quantization step.

The name Vector Quantization sounds a bit mysterious, but the idea is simple. Analog signals take continuous values, while computers can only process discrete digital signals. When converting an analog signal to a digital one, we can replace every value in an interval with one representative value from that interval: for example, all values in [0, 1) become 0, all values in [1, 2) become 1, and so on. This is quantization in its simplest, one-dimensional form. More formally, VQ encodes each point of a vector space with a member of a finite subset of that space.

A typical example is image encoding. In the simplest case, consider a grayscale image in which 0 is black, 1 is white, and each pixel value is a real number in [0, 1]. To encode it as a 256-level grayscale image, one of the simplest methods is to map each pixel value x to the integer floor(x * 255). The original data space does not have to be continuous. For example, to compress this image so that each pixel takes only 4 bits (instead of the original 8), the integer values in the original range [0, 255] must be encoded with integer values in [0, 15]; a simple mapping is floor(x * 15 / 255).

However, this mapping is quite naive. It reduces the number of colors and so compresses the image, but if the original colors are not evenly distributed, the resulting picture may look poor. For example, if a 256-level grayscale image consists entirely of the two gray levels 0 and 13, the mapping above produces a completely black image, because both levels map to 0. A better approach is to use clustering to select representative points.
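
The failure case described above is easy to reproduce. The snippet below is a small sketch with a made-up six-pixel "image"; it assumes the uniform mapping rounds down (floor), which is what makes both gray levels collapse to the same code.

```matlab
% Made-up 8-bit "image" that contains only the gray levels 0 and 13.
pixels = uint8([0 13 0 13 13 0]);

% Uniform mapping from [0, 255] to 4-bit codes in [0, 15], rounding down.
codes = floor(double(pixels) * 15 / 255);

% Map the 4-bit codes back to the 8-bit range for display.
decoded = round(codes * 255 / 15);
```

Since 13 * 15 / 255 is about 0.76, every pixel receives code 0 and the decoded image is uniformly black, even though 16 codes were available.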

In practice: treat each pixel as a data point, run K-means to get k centroids, and then replace the value of every pixel in a cluster with the value of that cluster's centroid. The same method works for color pictures: in an RGB image, for example, each pixel is a point in a 3-dimensional vector space.
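
The clustering approach can be sketched with a few lines of plain K-means (Lloyd iterations) on gray levels. This is an illustrative sketch, not the code used in this post: the pixel values are made up, k = 2 is chosen by hand, and the centroids are initialized at the minimum and maximum gray level, which is a simplifying assumption. The loop is written out by hand so that no toolbox function is needed.

```matlab
% Made-up gray levels clustered around 0 and 13.
pixels = double([0 13 0 13 13 0 1 12]);
k = 2;

% Assumed initialization: darkest and brightest gray level.
centroids = [min(pixels) max(pixels)];

for iter = 1:20
    % Assignment step: index of the nearest centroid for every pixel.
    dist = abs(bsxfun(@minus, pixels(:), centroids));   % numel(pixels) x k
    [~, idx] = min(dist, [], 2);

    % Update step: each centroid becomes the mean of its cluster.
    newCentroids = centroids;
    for c = 1:k
        if any(idx == c)
            newCentroids(c) = mean(pixels(idx == c));
        end
    end

    if isequal(newCentroids, centroids)
        break;                                          % converged
    end
    centroids = newCentroids;
end

% Replace every pixel with the value of its cluster centroid.
quantized = reshape(centroids(idx), size(pixels));
```

Unlike the uniform mapping, the two codewords end up close to the gray levels that actually occur, so the two groups of pixels stay distinguishable after quantization. For an RGB image, each pixel would be a 3-element vector and the absolute difference would become a Euclidean distance.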

2. Source code

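% Demo script that generates all graphics in the report and demonstrates our results.
% audioread is used because wavread has been removed from recent MATLAB releases.
[s1, fs1] = audioread('s1.wav');   % recording of speaker 1
[s2, fs2] = audioread('s6.wav');   % recording of speaker 2 (referred to as s2/fs2 below)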
% Question 2
disp('> Question 2: plot the raw speech waveform');
t = 0:1/fs1:(length(s1) - 1)/fs1;   % time axis in seconds
plot(t, s1), axis([0, (length(s1) - 1)/fs1 -0.4 0.5]);
title('Waveform of the raw speech signal s1');
xlabel('Time / s');
ylabel('Amplitude');
pause
close all
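% Question 3 (linear)
disp('> Question 3: plot the linear spectrum');
M = 100;   % frame shift: samples between the starts of consecutive frames
N = 256;   % frame length in samples
% blockFrames is assumed to return the FFT of each windowed frame, one column per frame.
frames = blockFrames(s1, fs1, M, N);
t = N / 2;                % keep only the non-redundant half of the spectrum
tm = length(s1) / fs1;    % signal duration in seconds
subplot(121);
imagesc([0 tm], [0 fs1/2], abs(frames(1:t, :)).^2), axis xy;
title('Power spectrum (M = 100, N = 256)');
xlabel('Time / s');
ylabel('Frequency / Hz');
colorbar;
% Question 3 (logarithmic)
disp('> Question 3: plot the logarithmic spectrum');
subplot(122);
imagesc([0 tm], [0 fs1/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy;
title('Log power spectrum (M = 100, N = 256)');
xlabel('Time / s');
ylabel('Frequency / Hz');
colorbar;
D = get(gcf, 'Position');
set(gcf, 'Position', round([D(1)*.5 D(2)*.5 D(3)*2 D(4)*1.3]))
pause
close all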
% Question 4
disp('> Question 4: plot spectrograms for different frame lengths');
lN = [128 256 512];   % frame lengths to compare
for i = 1:length(lN)
    N = lN(i);
    M = round(N / 3);               % frame shift
    frames = blockFrames(s1, fs1, M, N);
    t = N / 2;
    nbframes = size(frames, 2);     % number of frames
    subplot(2, 2, i)
    imagesc([0 tm], [0 fs1/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy;
    title(sprintf('Log power spectrum (M = %i, N = %i, frames = %i)', M, N, nbframes));
    xlabel('Time / s');
    ylabel('Frequency / Hz');
    colorbar
end
D = get(gcf, 'Position');
set(gcf, 'Position', round([D(1)*.5 D(2)*.5 D(3)*1.5 D(4)*1.5]))
pause
close all
% Question 5
disp('> Question 5: mel-spaced filter bank');
plot(linspace(0, (fs1/2), 129), (melfb(20, 256, fs1))');
title('Mel-spaced filter bank');
xlabel('Frequency / Hz');
pause
close all
% Question 6
disp('> Question 6: modified spectrum');
M = 100;
N = 256;
frames = blockFrames(s1, fs1, M, N);
n2 = 1 + floor(N / 2);              % number of non-redundant FFT bins
m = melfb(20, N, fs1);              % 20 mel filters, one per row
z = m * abs(frames(1:n2, :)).^2;    % filter-bank energies for every frame
t = N / 2;
tm = length(s1) / fs1;
subplot(121)
imagesc([0 tm], [0 fs1/2], abs(frames(1:n2, :)).^2), axis xy;
title('Original power spectrum');
xlabel('Time / s');
ylabel('Frequency / Hz');
colorbar;
subplot(122)
imagesc([0 tm], [0 20], z), axis xy;
title('Power spectrum after the mel filter bank');
xlabel('Time / s');
ylabel('Filter index');
colorbar;
D = get(gcf, 'Position');
set(gcf, 'Position', [0 D(2) D(3)*2 D(4)])
pause
close all
% Question 7
disp('> Question 7: 2D plot of acoustic vectors');
c1 = mfcc(s1, fs1);
c2 = mfcc(s2, fs2);
plot(c1(5, :), c1(6, :), 'or');
hold on;
plot(c2(5, :), c2(6, :), 'xb');
xlabel('5th Dimension');
ylabel('6th Dimension');
legend('Speaker 1', 'Speaker 2');
title('2D plot of acoustic vectors');
pause
close all
% Question 8
disp('> Question 8: plot the trained VQ codebooks')
d1 = vqlbg(c1, 16);   % codebook for speaker 1
d2 = vqlbg(c2, 16);   % codebook for speaker 2
plot(c1(5, :), c1(6, :), 'xr')
hold on

3. Running results

[Figure: running results]

Origin blog.csdn.net/TIQCmatlab/article/details/114877155