Basics of Speech Signal Processing: Spectrum, Phase Spectrum, Magnitude Spectrum, Power Spectrum, and Spectrogram

1. Speech signal processing

In the time domain, an audio signal can be represented by a real vector. The number of elements in this vector = sampling rate * audio duration. For example, a piece of audio with a sampling rate of 8000 Hz and a length of 15.6 s is represented in MATLAB as a real vector of size 15.6 x 8000 = 124800.
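As a quick check of that arithmetic, the sketch below builds a synthetic signal instead of reading a file. The 440 Hz tone and the variable names are illustrative assumptions, not part of the original example.

```matlab
fs = 8000;                      % sampling rate in Hz
dur = 15.6;                     % duration in seconds
num_samples = round(fs*dur);    % number of samples = sampling rate * duration
t = (0:num_samples-1)'/fs;      % time of each sample in seconds
x = sin(2*pi*440*t);            % hypothetical 440 Hz test tone, stored as a column vector
```

Here `num_samples` evaluates to 124800, and `x` has one element per sample.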

Below are two ways to load audio in MATLAB: one for .wav files and one for .pcm files.

1.1 Read wav

[x1, fs_n] = audioread('test\Far_common.wav');   

1.2 Read pcm

fid_far = fopen('test\Far_common.pcm', 'r');
x_far = fread(fid_far, inf, 'int16');   % read every sample as a 16-bit signed integer
fclose(fid_far);                        % release the file handle
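One practical difference between the two readers: by default audioread returns samples as doubles scaled to roughly [-1, 1], while fread with 'int16' returns the raw integer values. A headerless PCM file also carries no sampling rate, so that value has to be known separately. The sketch below, which writes a tiny temporary file of made-up sample values so that it runs without the original recording, shows one way to bring the PCM samples onto a comparable scale. The file name and values are assumptions for illustration.

```matlab
pcm_path = [tempname '.pcm'];                 % hypothetical temporary file
raw = int16([0 16384 -16384 32767 -32768]);   % made-up sample values
fid = fopen(pcm_path, 'w');
fwrite(fid, raw, 'int16');
fclose(fid);

fid = fopen(pcm_path, 'r');
x_pcm = fread(fid, inf, 'int16');             % column of doubles holding integer values
fclose(fid);
delete(pcm_path);

x_norm = x_pcm/32768;                         % scale 16-bit values into [-1, 1)
```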

In both cases, the audio signal ends up in MATLAB as a vector.

2. Basic knowledge of speech signal processing

2.1 Spectrum

Concept: The spectrum describes the relationship between signal frequency and energy. It generally consists of two parts: the phase spectrum and the magnitude spectrum.

Drawing method: Apply the Fourier transform to a piece of time-domain audio; the result is the spectrum. Because each frequency component is a complex number that carries two pieces of information (magnitude and phase), the spectrum cannot be drawn directly as a single curve. Instead, it is plotted separately as a phase spectrum and a magnitude spectrum.

[x1, fs_n] = audioread('D:\blog\新建文件夹\Far_common.wav');   
x1_fd = fft(x1);

Inspecting the result shows that the Fourier transform of a one-dimensional real vector is a complex vector of the same size.
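For a real input, the second half of that complex vector mirrors the first half as complex conjugates, which is why the plots in this article only draw the first half of the bins. A minimal sketch on a made-up eight-sample signal (the values are arbitrary):

```matlab
x = [1; 2; 0; -1; 3; 1; -2; 0.5];     % made-up real signal, N = 8
X = fft(x);
N = length(x);

% For real input, bin k+1 and bin N-k+1 are complex conjugates (k = 1..N-1)
k = 1:N-1;
sym_err = max(abs(X(k+1) - conj(X(N-k+1))));   % should be at rounding-error level

size_matches = isequal(size(X), size(x));      % output has the same size as the input
is_complex = ~isreal(X);                       % output is complex
```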

2.1.1 Phase spectrum

Concept: In Fourier analysis, the phase of each component as a function of frequency is called the phase spectrum of the signal.

Drawing method: Plot the phase angle of each spectral component instead of its magnitude. The angle function in MATLAB is used here to obtain the phase.

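[x1, fs_n] = audioread('test\Far_common.wav');

% Frequency-domain representation
x1_fd = fft(x1);

% Phase spectrum
N = length(x1);
half = floor(N/2);                 % plot only the first half of the bins
n = 0:N-1;
f = n*fs_n/N;                      % frequency of each FFT bin in Hz
ph = angle(x1_fd(1:half));         % phase of each bin in radians
ph = ph*180/pi;                    % convert to degrees
plot(f(1:half), ph);
xlabel('Frequency/Hz'), ylabel('Phase angle/deg'), title('Phase spectrum');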

For the audio clip chosen here, the plotted phase spectrum is too dense to read easily.
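A phase spectrum is easier to interpret on a signal whose phase is known in advance. The sketch below is an illustration on a synthetic cosine, not the article's recording: the sampling rate, length, frequency, phase offset, and the 1% threshold are all assumed values. It also hides the phase of bins that carry almost no energy, since the phase there is just numerical noise. That masking is one possible way to make such a plot less dense, not something the original code does.

```matlab
fs = 8000; N = 800;                       % assumed sampling rate and signal length
t = (0:N-1)'/fs;
f0 = 1000; phi = pi/3;                    % tone on an exact FFT bin, 60 degree offset
x = cos(2*pi*f0*t + phi);

X = fft(x);
half = floor(N/2);
f = (0:half-1)'*fs/N;                     % bin frequencies in Hz
mag = abs(X(1:half));
ph = angle(X(1:half))*180/pi;             % phase in degrees

ph(mag < 0.01*max(mag)) = NaN;            % hide phase of near-empty bins (arbitrary threshold)
[~, k0] = max(mag);                       % index of the strongest bin
```

After this runs, `f(k0)` is 1000 Hz and `ph(k0)` is close to 60 degrees; `plot(f, ph, 'o')` would show a single point because every other bin has been masked.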

2.1.2 Magnitude (amplitude) spectrum

Concept: In Fourier analysis, the amplitude of each component as a function of frequency is called the amplitude spectrum of the signal.

Drawing method: Take the modulus (absolute value) of each element of the complex vector returned by the FFT of the signal; the result is the amplitude spectrum.

% Magnitude spectrum
x1_fd_abs = abs(x1_fd);
half = floor(length(x1)/2);        % plot only the first half of the bins
plot(f(1:half), x1_fd_abs(1:half));
xlabel('Frequency/Hz'), ylabel('Magnitude'), title('Magnitude spectrum');
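The values plotted above are raw FFT magnitudes, which grow with the signal length. If the goal is to read off the amplitude of a sinusoidal component, a common convention is to scale the one-sided magnitude by 2/N. The sketch below checks this on a synthetic cosine; the amplitude, frequency, and length are assumed values chosen so that the tone falls exactly on an FFT bin.

```matlab
fs = 8000; N = 800;                       % assumed sampling rate and signal length
t = (0:N-1)'/fs;
A = 0.7; f0 = 500;                        % made-up amplitude; 500 Hz is an exact bin here
x = A*cos(2*pi*f0*t);

X = fft(x);
half = floor(N/2);
f = (0:half-1)'*fs/N;                     % bin frequencies in Hz
amp = abs(X(1:half))*2/N;                 % one-sided amplitude scaling
amp(1) = amp(1)/2;                        % the DC bin has no mirror image, so it is not doubled
[peak, k0] = max(amp);                    % strongest component
```

Here `peak` comes out close to 0.7 at `f(k0)` = 500 Hz. For a tone that does not fall on an exact bin, the energy spreads over neighboring bins and the peak reads lower.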

2.2 Power spectrum (energy spectrum)

Concept: "Power spectrum" is short for power spectral density function, which is defined as the signal power per unit frequency band. It describes how signal power varies with frequency, that is, how the signal power is distributed in the frequency domain.

Drawing method: For each element of the complex vector returned by the FFT of the signal, add the square of the real part to the square of the imaginary part. The result is the power spectrum, i.e. the square of the magnitude spectrum.

% Power spectrum
x1_power = zeros(1, length(x1_fd));   % preallocate the result
for i=1:length(x1_fd)
    x1_power(i)=power(real(x1_fd(i)),2)+power(imag(x1_fd(i)),2);   % same as abs(x1_fd(i))^2
end
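The squared magnitude computed above is unnormalized. Two relations help when interpreting it, sketched here on a made-up two-tone signal (the frequencies, length, and sampling rate are assumptions). The first is Parseval's relation for the DFT, which ties the squared magnitudes back to the time-domain energy. The second is one common periodogram scaling, dividing by N*fs, which turns the values into power per Hz; other conventions exist.

```matlab
fs = 8000; N = 1024;                      % assumed sampling rate and signal length
n = (0:N-1)';
x = sin(2*pi*440*n/fs) + 0.1*cos(2*pi*1500*n/fs);   % made-up test signal

X = fft(x);
P = abs(X).^2;                            % squared magnitude, vectorized form of the loop

% Parseval's relation for the DFT: time-domain energy equals sum(P)/N
energy_time = sum(x.^2);
energy_freq = sum(P)/N;

% Two-sided periodogram scaling: power per Hz
psd = P/(N*fs);
mean_power = sum(psd)*fs/N;               % integrating over frequency gives mean(x.^2)
```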

2.3 Spectrogram

Concept: The horizontal axis of the spectrogram is time, the vertical axis is frequency, and the value at each point is the energy of the speech data at that time and frequency. Since a two-dimensional plane is used to express three-dimensional information, the energy value is represented by color: in a conventional grayscale rendering, the darker the color, the stronger the speech energy at that point.

Drawing method: The signal is split into overlapping, windowed frames, and an FFT is taken of each frame. The enframe function from the VOICEBOX toolbox for MATLAB is used here for the framing.

% Spectrogram
clear all; clc; close all;
[x,Fs]=audioread('test\Far_common.wav');   % read the audio file
wlen=800; inc=80; win=hanning(wlen);% frame length, frame shift, and window function
N=length(x); time=(0:N-1)/Fs;       % time axis of the waveform
y=enframe(x,win,inc)';              % split into frames (one frame per column)
fn=size(y,2);                       % number of frames
frameTime=(((1:fn)-1)*inc+wlen/2)/Fs; % center time of each frame
W2=wlen/2+1; n2=1:W2;
freq=(n2-1)*Fs/wlen;                % frequency scale of the FFT bins
Y=fft(y);                           % short-time Fourier transform
clf                                 % reset the figure
% Plot the spectrogram
set(gcf,'Position',[20 100 600 500]);
axes('Position',[0.1 0.1 0.85 0.5]);
imagesc(frameTime,freq,abs(Y(n2,:))); % draw the magnitude of Y as an image
axis xy; ylabel('Frequency/Hz');xlabel('Time/s');
title('Spectrogram');
% Plot the speech waveform
axes('Position',[0.07 0.72 0.9 0.22]);
plot(time,x,'k');
xlim([0 max(time)]);
xlabel('Time/s'); ylabel('Amplitude');
title('Speech waveform');
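If VOICEBOX or the hanning function is not available, the framing step can be written with plain indexing. The sketch below is an approximation of what the code above does, not a drop-in replacement for enframe: it uses a synthetic 300 Hz tone in place of speech, builds a Hann window from its formula (which differs slightly from what hanning returns), and converts the magnitude to decibels, a common choice that the original code does not make.

```matlab
Fs = 8000;                                   % assumed sampling rate
x = sin(2*pi*300*(0:Fs-1)'/Fs);              % 1 s synthetic tone standing in for speech
wlen = 800; inc = 80;                        % frame length and frame shift, as in the code above
win = 0.5 - 0.5*cos(2*pi*(0:wlen-1)'/wlen);  % Hann window computed from its formula

fn = floor((length(x)-wlen)/inc) + 1;        % number of complete frames
y = zeros(wlen, fn);
for k = 1:fn
    start = (k-1)*inc;
    y(:,k) = x(start+1:start+wlen).*win;     % one windowed frame per column
end

Y = fft(y);                                  % FFT of every column (frame)
n2 = 1:wlen/2+1;                             % non-negative frequency bins
S_db = 20*log10(abs(Y(n2,:)) + eps);         % magnitude in dB; eps avoids log of zero
frameTime = (((1:fn)-1)*inc + wlen/2)/Fs;    % center time of each frame
freq = (n2-1)*Fs/wlen;                       % frequency of each bin in Hz
```

Calling `imagesc(frameTime, freq, S_db); axis xy;` afterwards draws the result in the same way as the code above. The decibel scale tends to make weak components visible that a linear magnitude plot would hide.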

The above is my summary of the basics of the magnitude spectrum, phase spectrum, power spectrum, and spectrogram that I encountered while processing and analyzing audio signals. If you find any errors, please point them out.

Origin blog.csdn.net/qq_44085437/article/details/125376660