[Speech analysis] Short-time frequency-domain analysis of speech in MATLAB [including MATLAB source code 558]

1. Introduction

Principle

1 Short-time Fourier transform

Short-time Fourier analysis (STFA) is well suited to the spectral analysis of slowly time-varying signals and is widely used in speech analysis and processing. The speech signal is first divided into frames, and a Fourier transform is then applied to each frame. Each frame can be regarded as a segment cut from a different stationary signal waveform, so the short-time spectrum of a frame approximates the spectrum of the corresponding stationary waveform.

Because the speech signal is stationary over short intervals, it can be divided into frames and the Fourier transform of each frame computed; the result is the short-time Fourier transform (STFT). It is defined as
[Equation: definition of the short-time Fourier transform; image not recovered]
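Since the equation image is not available, the usual textbook form of the short-time Fourier transform is given here for orientation. It is assumed, not confirmed, that the missing figure showed the same expression; the symbols x(m) for the speech samples, w for the window sequence, and n for the time index at which the window is placed are the conventional ones and are not taken from the source.

```latex
X_n\!\left(e^{j\omega}\right) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\omega m}
```

For a fixed n this is the ordinary Fourier transform of the windowed segment x(m)w(n-m), which matches the frame-by-frame procedure described above.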

③ The shorter the window, the higher the time resolution, but the lower the frequency resolution. For example, a short window clearly shows how the formants change from one pitch period to the next, but the fine structure of the fundamental frequency and its harmonics disappears from the short-time spectrogram.
④ Because time resolution and frequency resolution pull in opposite directions, the window length for a short-time Fourier transform should be chosen as a compromise that suits the purpose of the analysis.
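The trade-off in ③ and ④ can be made concrete with a little arithmetic. The sketch below assumes a sampling rate of 8 kHz and three candidate window lengths; neither value comes from the text, and the variable names are chosen only for illustration.

```matlab
% Sketch: time span versus FFT bin spacing for a few window lengths.
fs   = 8000;                 % assumed sampling rate, Hz
wlen = [64 256 1024];        % assumed candidate window lengths, samples

dur_ms = 1000*wlen/fs;       % time covered by one window, ms
df_hz  = fs./wlen;           % FFT bin spacing when nfft = wlen, Hz

disp([wlen; dur_ms; df_hz].');   % one row per window: length, ms, Hz
```

Under these assumptions the 64-sample window spans 8 ms but has bins 125 Hz apart, which is too coarse to separate the harmonics of a typical pitch of 100 to 200 Hz, while the 1024-sample window has bins about 7.8 Hz apart but averages over 128 ms of speech. The usable resolution is somewhat worse than the bin spacing, because the main lobe of a Hann window is about four bins wide.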

2 Representation and implementation of the spectrogram

A spectrograph feeds the electrical speech signal into a bank of narrowband filters whose passbands are contiguous in frequency. The output of each filter is rectified and mean-squared, and the results are recorded on a roll of recording paper in order of frequency from low to high. Signal strength is represented by the gray level on the paper: a strong output from a given filter leaves a darker trace, a weak output a lighter one. The paper moves at a fixed speed, so the filter outputs at successive instants are recorded side by side. The resulting graph is the spectrogram of the speech signal, with time along the horizontal axis and frequency along the vertical axis. The gray stripes show the short-time spectrum of the speech at each instant. Because the spectrogram displays the dynamic spectral characteristics of speech, it is also called visible speech.
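The "stronger is darker" convention of the recording paper can be imitated numerically. The following is a minimal sketch, assuming a made-up 2-by-2 matrix of power values and a 40 dB display range; both are illustrative choices, not values from the text.

```matlab
% Sketch: map spectrogram levels to gray so that strong = dark.
P       = [1e-6 1e-3; 1e-2 1];        % made-up power values (rows: frequency, columns: time)
PdB     = 10*log10(P);                % levels in dB
range   = 40;                         % assumed display dynamic range, dB
floorDb = max(PdB(:)) - range;        % anything weaker than this is drawn white
clipped = max(PdB, floorDb);          % clip weak values to the floor
level   = 1 - (clipped - floorDb)/range;   % 0 = black (strongest), 1 = white (at or below floor)
```

Passing `level` to `imagesc` with `colormap(gray)` would then draw the strongest cell black and the weakest white, as on the paper record.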

2. Source code

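clear all;
clc;
close all;

[x,fs]=audioread('audio.wav');       % read the audio file
wlen=256;                            % frame length (samples)
nfft=wlen;                           % FFT length
win=hanning(wlen);                   % Hann window
inc=128;                             % frame shift (samples)

y=STFFT(x,win,nfft,inc);             % short-time Fourier transform

fn=size(y,2);                        % number of frames

freq=(0:wlen/2)*fs/wlen;             % frequency axis of the FFT bins (Hz)

frameTime=FrameTimeC(fn,wlen,inc,fs);          % time of each frame (s)
imagesc(frameTime,freq,20*log10(abs(y)+eps));  % plot the magnitude of y in dB
axis xy; ylabel('Frequency/Hz');xlabel('Time/s');
title('Energy spectrogram');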
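
function frameout=enframe(x,win,inc)

nx=length(x(:));            % length of the data
nwin=length(win);           % length of the window
if (nwin == 1)              % a scalar win means no window function was given
   len = win;               % yes: frame length = win
else
   len = nwin;              % no: frame length = window length
end
if (nargin < 3)             % with only two arguments, frame shift = frame length
   inc = len;
end
nf = fix((nx-len+inc)/inc); % number of frames
frameout=zeros(nf,len);     % preallocate, one frame per row
indf= inc*(0:(nf-1)).';     % offset of each frame within x
inds = (1:len);             % sample indices 1:len within a frame
% The listing breaks off here in the source. The remaining lines are an
% assumed completion: copy the samples into the frames, then apply the window.
frameout(:) = x(indf(:,ones(1,len))+inds(ones(nf,1),:));
if (nwin > 1)               % a window vector was given, so weight every frame
   w = win(:).';
   frameout = frameout .* w(ones(nf,1),:);
end
end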
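The script calls STFFT and FrameTimeC, but neither function is listed in this post. The sketch below shows one plausible way to compute the same quantities inline, so the steps can be followed without those helpers. It is an assumption about what they do, not the author's implementation: the 8 kHz sampling rate, the 1 kHz test tone standing in for speech, the hand-computed Hann window, and the choice of the frame centre as the frame time are all illustrative.

```matlab
% Sketch only: a hand-rolled STFT on a synthetic tone (not the author's STFFT).
fs   = 8000;                                  % assumed sampling rate, Hz
t    = (0:fs-1).'/fs;                         % one second of sample times
x    = sin(2*pi*1000*t);                      % 1 kHz test tone standing in for speech
wlen = 256;  inc = 128;  nfft = wlen;         % frame length, frame shift, FFT length
win  = 0.5*(1 - cos(2*pi*(1:wlen).'/(wlen+1)));   % Hann window computed by hand

nf     = fix((length(x) - wlen + inc)/inc);   % number of complete frames
idx    = (1:wlen).' + inc*(0:nf-1);           % wlen-by-nf sample indices (implicit expansion, R2016b+)
frames = x(idx) .* win;                       % one windowed frame per column
Y      = fft(frames, nfft);                   % FFT of every column
Y      = Y(1:nfft/2+1, :);                    % keep the non-negative frequencies

freq      = (0:nfft/2)*fs/nfft;               % frequency axis, Hz
frameTime = ((0:nf-1)*inc + wlen/2)/fs;       % centre time of each frame, s (assumed convention)

[~, k] = max(abs(Y(:,1)));                    % strongest bin in the first frame
peakHz = freq(k);                             % frequency of that bin
```

With these assumed values the result `Y` has one column per frame and `wlen/2+1` rows, which is the layout the script expects when it passes `frameTime`, `freq`, and the magnitude of the transform to `imagesc`.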

3. Running results

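[Figure: spectrogram produced by the script; time (s) on the horizontal axis, frequency (Hz) on the vertical axis. Image not recovered.]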

4. Remarks