[Feature extraction] Formant estimation based on MATLAB [with MATLAB source code, issue 550]

1. Introduction

A formant is a region of the sound spectrum where energy is relatively concentrated. Formants not only determine sound quality but also reflect the physical characteristics of the vocal tract (the resonant cavity).

In the spectrum of a vowel or consonant, the formants are the peaks of the spectral envelope. The term originally referred to the resonance frequencies of the acoustic cavity. When vowels and consonants are produced, the source spectrum is shaped by the acoustic cavity: the harmonic amplitudes no longer simply decrease with increasing frequency, but some are strengthened and others weakened, forming a new envelope with peaks and valleys. The frequencies at the peaks of this envelope coincide with the resonance frequencies of the acoustic cavity. For vowels, the first three formants largely determine the timbre. The first two are particularly sensitive to tongue position (height and front-back placement), and the acoustic vowel chart is drawn from the frequencies of these two formants. On a three-dimensional sonogram (spectrogram), formants appear as horizontal bands of concentrated energy.

Formants are an important feature reflecting the resonance characteristics of the vocal tract. They are the most direct source of pronunciation information, and listeners rely on them in speech perception. For this reason the formant is a key parameter in speech signal processing, widely used as a main feature for speech recognition and as basic information for speech coding and transmission. Formant information is contained in the spectral envelope, so the key to extracting formant parameters is estimating the spectral envelope of natural speech. The local maxima of the envelope are generally taken to be the formants.

How formants arise and how they show up in sound quality

Where the formants lie depends on the resonant physical structure of the sound-producing medium.

Whether the source is a human voice or a musical instrument, its sound characteristics derive from two factors: the sound source, such as the human vocal cords or the vibrating reed of an instrument, and the resonance system. An instrument's resonance system boosts the amplitudes of the partials in certain frequency regions, and these regions form resonance peaks unique to that instrument. The peaks are closely tied to the size and shape of the resonating body. Because the structure of an instrument is fixed, every tone it produces shows the same resonance peaks regardless of fundamental frequency, although their prominence varies. This helps explain why different notes played on the same instrument share the same sound quality.

In speech acoustics, the human voice likewise has formant regions determined by the speaker's physiology, such as the sizes of the nasal, pharyngeal, and oral cavities. By changing the shape and size of these resonating spaces (for example, the shape of the throat and mouth), we change the formants of the sound. We distinguish different voices and vowels mainly by where their formants lie.

1 What formants are and what they do
  The excitation model of speech was discussed earlier. When the periodic glottal pulse excitation passes through the vocal tract, it causes the tract to resonate (in the speech production model this stage is called the vocal tract model), producing a set of resonance frequencies. These are called the formant frequencies, or simply formants. The several maxima of the speech spectral envelope are generally taken to be the formant frequencies. Accurately detecting formant frequencies and bandwidths helps distinguish different finals (the vowel part of a syllable) and improves recognition of spoken content.

2 Difficulties in formant estimation
(1) Spurious (false) formant peaks appear in the spectrum.
(2) Adjacent formants are hard to separate when their frequencies lie close together.
(3) Formants are hard to extract from high-pitched speech, because the widely spaced harmonics sample the spectral envelope only sparsely.
In short, as with pitch period estimation, no fully accurate estimation method currently exists.

3 Pre-processing for formant estimation
(1) Pre-emphasis. This removes the influence of lip radiation and makes the vocal tract response easier to analyze.
(2) Endpoint detection. As in pitch period estimation, the purpose is to skip silent segments, which do not need to be analyzed.
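
The two pre-processing steps can be sketched roughly as follows. This is only an illustration on a synthetic tone, not the author's implementation: the pre-emphasis coefficient 0.99 matches the scripts in this post, while the sampling rate, frame length, and the 10% energy threshold are assumptions chosen for the example.

```matlab
% Sketch: pre-emphasis followed by a crude energy-based endpoint detector.
% The signal and all parameter values are illustrative assumptions.
fs = 8000;                                   % sampling rate in Hz (assumed)
t  = (0:fs-1)'/fs;                           % one second of samples
x  = zeros(fs,1);
x(2001:6000) = sin(2*pi*200*t(2001:6000));   % a tone standing in for speech

% Pre-emphasis: y(n) = x(n) - a*x(n-1), a first-order high-pass filter
a = 0.99;
y = filter([1 -a], 1, x);

% Short-time energy over non-overlapping frames
frameLen = 200;                              % 25 ms at 8 kHz (assumed)
nFrames  = floor(length(y)/frameLen);
frames   = reshape(y(1:nFrames*frameLen), frameLen, nFrames);
energy   = sum(frames.^2, 1);

% Frames whose energy exceeds a fraction of the peak are treated as speech
isSpeech   = energy > 0.1*max(energy);
firstFrame = find(isSpeech, 1, 'first');
lastFrame  = find(isSpeech, 1, 'last');
fprintf('Active frames: %d to %d\n', firstFrame, lastFrame);
```

Real endpoint detectors usually combine energy with zero-crossing rate and use adaptive thresholds; the fixed threshold here is only meant to show the idea of discarding silent frames.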

4 Formant estimation methods
(1) Cepstrum method.
  Cepstral processing separates the excitation signal from the vocal tract response. After the excitation component is removed, a Fourier transform gives the envelope of the vocal tract response, and the maxima of that envelope are the formant frequencies.
(2) LPC method.
  Linear prediction yields a set of prediction coefficients. Applying an FFT to these coefficients under the all-pole vocal tract model gives the power spectrum of the vocal tract transfer function, and the maxima of that spectrum are the formant frequencies.
(3) HHT (Hilbert-Huang transform) method.
  The author has not studied this method in detail, so its specific steps are not covered here. It relies mainly on empirical mode decomposition (EMD) and the Hilbert transform to obtain the amplitude, frequency, and phase of the signal.
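
As a rough illustration of the cepstrum method in (1), the sketch below builds a synthetic vowel-like frame, keeps only the low-quefrency part of its real cepstrum, and transforms back to get a smoothed log-spectral envelope. Everything here is an assumption made for the example: the formant values, the lifter length `L`, and the simple local-maximum search. It is not the `Formant_Cepst` function used later in this post, which is not listed.

```matlab
% Sketch: formant candidates from a cepstrally smoothed log spectrum.
% Synthetic test signal: impulse train through three resonators (assumed values).
fs = 8000; f0 = 100;                         % sampling rate and pitch in Hz
trueF = [700 1200 2600];                     % formant frequencies in Hz
trueB = [80 90 120];                         % formant bandwidths in Hz
a0 = 1;
for k = 1:numel(trueF)
    r  = exp(-pi*trueB(k)/fs);               % pole radius from bandwidth
    th = 2*pi*trueF(k)/fs;                   % pole angle from frequency
    a0 = conv(a0, [1 -2*r*cos(th) r^2]);     % add one resonator
end
exc = zeros(fs,1); exc(1:fs/f0:end) = 1;     % glottal impulse train
s   = filter(1, a0, exc);

% One Hamming-windowed frame (window written out to avoid toolbox calls)
N = 512; nfft = 1024; n = (0:N-1)';
frame = s(4001:4000+N) .* (0.54 - 0.46*cos(2*pi*n/(N-1)));

% Real cepstrum: inverse FFT of the log magnitude spectrum
logMag = log(abs(fft(frame, nfft)) + eps);
c = real(ifft(logMag));

% Keep low quefrencies only (vocal tract), drop the pitch peak (excitation)
L = 30;                                      % lifter length in samples, below the 80-sample pitch period
lifter = zeros(nfft,1);
lifter(1:L) = 1; lifter(nfft-L+2:nfft) = 1;  % symmetric, so the envelope stays real
env  = real(fft(c .* lifter));               % smoothed log-spectral envelope
half = env(1:nfft/2);
freq = (0:nfft/2-1)'*fs/nfft;

% Local maxima of the envelope are the formant candidates
inner  = half(2:end-1);
isPeak = [false; inner > half(1:end-2) & inner >= half(3:end); false];
peakFreq = freq(isPeak);
disp(peakFreq');
```

The candidate list may contain extra peaks besides the true formants, which is the false-peak problem listed among the difficulties above; a practical implementation would add further screening.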

2. Source code

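% Formant estimation by the real cepstrum method
clear all; clc; close all;

waveFile='C4_3_y.wav';                      % file name
[x, fs]=audioread(waveFile);                % read one frame of speech (audioread replaces the removed wavread)
u=filter([1 -.99],1,x);                     % pre-emphasis
wlen=length(u);                             % frame length
cepstL=6;                                   % width of the window on the quefrency axis
wlen2=floor(wlen/2);                        % number of frequency bins kept (floor keeps it an integer)
freq=(0:wlen2-1)*fs/wlen;                   % frequency scale
u2=u.*hamming(wlen);                        % apply the window to the signal
U=fft(u2);                                  % computed per Eq. (4-26)
U_abs=log(abs(U(1:wlen2)));                 % computed per Eq. (4-27); natural log, not dB
[Val,Loc,spect]=Formant_Cepst(u2,cepstL);   % find the formant peaks (Formant_Cepst is not listed in this post)
FRMNT=freq(Loc);                            % formant frequencies
subplot(211)
plot(freq,U_abs,'k');
xlabel('Frequency/Hz'); ylabel('Log amplitude');
title('(a) Log spectrum of the signal X\_i(k)')
axis([0 4000 -6 2]); grid;
subplot(212)
plot(freq,spect,'k','linewidth',2);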
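% Formant estimation by LPC with interpolation
clear all; clc; close all;

fle='C4_3_y.wav';                           % file name
[x,fs]=audioread(fle);                      % read one frame of speech (audioread replaces the removed wavread)
u=filter([1 -.99],1,x);                     % pre-emphasis
wlen=length(u);                             % frame length
p=12;                                       % LPC order
freq=(0:256)*fs/512;                        % frequency scale

[F,Bw,pp,U]=Formant_Interpolation(u,p,fs);  % formants by LPC with interpolation (helper not listed in this post)
plot(freq,U,'k');
title('Power spectrum of the vocal tract transfer function');
xlabel('Frequency/Hz'); ylabel('Amplitude');
ll=length(F);                               % number of formants
for k=1 : ll
    line([F(k) F(k)],[0 pp(k)],'color','k','linestyle','-.');   % mark each formant
end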
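
Since `Formant_Interpolation` is not listed in this post, here is a minimal sketch of what LPC peak picking with parabolic interpolation might look like. It is a stand-in written under assumptions, not the author's function: the test signal is synthetic, the LPC order is set to 6 to match its three resonances (the script above uses 12 on real speech), and the autocorrelation normal equations are solved directly rather than with the toolbox `lpc` function.

```matlab
% Sketch: LPC power spectrum, peak picking, and parabolic interpolation.
% Synthetic vowel-like frame with assumed formants at 700, 1200, 2600 Hz.
fs = 8000; f0 = 100;
trueF = [700 1200 2600]; trueB = [80 90 120];
a0 = 1;
for k = 1:numel(trueF)
    r  = exp(-pi*trueB(k)/fs); th = 2*pi*trueF(k)/fs;
    a0 = conv(a0, [1 -2*r*cos(th) r^2]);
end
exc = zeros(fs,1); exc(1:fs/f0:end) = 1;
s   = filter(1, a0, exc);
N = 512; n = (0:N-1)';
frame = s(4001:4000+N) .* (0.54 - 0.46*cos(2*pi*n/(N-1)));   % Hamming window

% LPC by the autocorrelation method: solve the Toeplitz normal equations
p  = 6;                                      % order matched to three resonances (assumed)
ac = zeros(p+1,1);
for lag = 0:p
    ac(lag+1) = sum(frame(1:N-lag) .* frame(1+lag:N));
end
pred = toeplitz(ac(1:p)) \ ac(2:p+1);        % predictor coefficients
A = [1; -pred];                              % inverse filter A(z)

% Power spectrum of the all-pole model 1/A(z), in dB, on 257 bins
nfft = 512; df = fs/nfft;
H = fft(A, nfft);
P = -20*log10(abs(H(1:nfft/2+1)));

% Each local maximum is refined by fitting a parabola through three bins
F = [];
for m = 2:numel(P)-1
    if P(m) > P(m-1) && P(m) >= P(m+1)
        d = 0.5*(P(m-1) - P(m+1)) / (P(m-1) - 2*P(m) + P(m+1));   % offset in bins
        F(end+1) = (m - 1 + d)*df;           % bin m corresponds to (m-1)*df Hz
    end
end
disp(F);
```

The interpolation step matters because the 512-point grid has a spacing of fs/512, so without it each formant estimate would be quantized to the nearest bin.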
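% Formant estimation by LPC root finding
clc; close all;

fle='C4_3_y.wav';                           % file name
[xx,fs]=audioread(fle);                     % read one frame of speech

u=filter([1 -.99],1,xx);                    % pre-emphasis
wlen=length(u);                             % frame length
p=12;                                       % LPC order
n_frmnt=4;                                  % number of formants to extract
freq=(0:256)*fs/512;                        % frequency scale
df=fs/512;                                  % frequency resolution

[F,Bw,U]=Formant_Root(u,p,fs,n_frmnt);      % formants by LPC root finding (helper not listed in this post)
plot(freq,U,'k');
title('Power spectrum of the vocal tract transfer function');
xlabel('Frequency/Hz'); ylabel('Amplitude/dB');
p1=length(F);                               % number of formants to mark
m=floor(F/df)+1;                            % spectrum bin of each formant (+1 because freq(1) is 0 Hz)
pp=U(m);                                    % spectrum amplitude at each formant
for k=1 : p1
    line([F(k) F(k)],[-5 pp(k)],'color','k','linestyle','-.');   % draw a line at each formant
end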
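
`Formant_Root` is also not listed here. The core of the root-finding approach is converting each complex root of the LPC polynomial A(z) into a frequency and a bandwidth, which can be sketched as follows. To keep the example checkable, the polynomial is built from assumed formant values rather than estimated from speech; in practice `A` would come from LPC analysis of a frame.

```matlab
% Sketch: formant frequency and bandwidth from the roots of A(z).
% A(z) is constructed from assumed formants so the result can be checked.
fs = 8000;
trueF = [700 1200 2600];                     % assumed formant frequencies in Hz
trueB = [80 90 120];                         % assumed bandwidths in Hz
A = 1;
for k = 1:numel(trueF)
    r  = exp(-pi*trueB(k)/fs);               % pole radius
    th = 2*pi*trueF(k)/fs;                   % pole angle
    A  = conv(A, [1 -2*r*cos(th) r^2]);      % one conjugate pole pair
end

z = roots(A);                                % roots of A(z) are poles of 1/A(z)
z = z(imag(z) > 0);                          % keep one root of each conjugate pair

F  = angle(z)*fs/(2*pi);                     % pole angle  -> frequency in Hz
Bw = -log(abs(z))*fs/pi;                     % pole radius -> 3 dB bandwidth in Hz

[F, idx] = sort(F);                          % order by increasing frequency
Bw = Bw(idx);
F  = F(:).'; Bw = Bw(:).';                   % row vectors for display
fprintf('F  = %s Hz\n', mat2str(F, 6));
fprintf('Bw = %s Hz\n', mat2str(Bw, 6));
```

With real speech the order is usually higher than twice the number of formants, so some roots do not correspond to formants; a common screening step is to discard roots with very large bandwidths or very low frequencies before keeping the first few.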


3. Results

[Three result figures; images not included]

4. Remarks

For the complete code or custom work, contact QQ 1564658423.

Origin blog.csdn.net/TIQCmatlab/article/details/114972064