SpeechBrain (1): MFCC Feature Extraction


The MFCC extraction pipeline:

waveform -----> DFT -----> magnitude spectrum -----> filter bank -----> log -----> DCT -----> MFCC
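To make the pipeline above concrete, here is a minimal pure-Python sketch that runs it on a single frame. It is only an illustration, not SpeechBrain's implementation: the helper names (`hz_to_mel`, `mel_filterbank`, `mfcc_frame`, and so on), the HTK-style mel formula, the unnormalized DCT-II, and the toy sizes (10 filters, 5 coefficients, 8 kHz) are all assumptions chosen to keep the example short.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other variants exist)
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """Magnitude of a naive O(N^2) DFT, keeping bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(n_mels, n_bins, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    # n_mels + 2 points: left edge, n_mels centers, right edge
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * (sample_rate / 2) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank


def dct_ii(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_out)
    ]


def mfcc_frame(frame, sample_rate, n_mels=10, n_out=5):
    n = len(frame)
    # Hamming window, then DFT magnitude
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    mag = magnitude_spectrum(windowed)
    # Filter bank energies, log compression, DCT
    bank = mel_filterbank(n_mels, len(mag), sample_rate)
    energies = [sum(w * m for w, m in zip(row, mag)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies, n_out)


# Toy input: 64 samples of a 440 Hz sine sampled at 8 kHz
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(64)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))
```

A real front end applies this to every overlapping frame of the signal, which is what the STFT step does in one call.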

Classes and functions in speechbrain.processing.features:

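STFT: computes the short-time Fourier transform

spectral_magnitude: returns the magnitude of a complex spectrogram

Filterbank: computes filter bank (FBANK) features

DCT: computes the discrete cosine transform, which turns log filter bank features into MFCCs

Deltas: computes delta coefficients (the derivative over time)

ContextWindow: gathers neighboring frames as context; the window size is set by the left_frames and right_frames parameters

InputNormalization: normalizes the features (subtracts the mean and divides by the standard deviation); avg_factor sets the weighting factor between the current statistics and the accumulated ones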
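The Deltas and ContextWindow steps listed above can be sketched on plain Python lists. This is an illustrative sketch, not SpeechBrain's code: the functions `deltas` and `context_window` are hypothetical, the delta uses the common regression formula over a 5-frame window with the edge frames repeated, and the context window pads with zeros at the edges. All of these are assumptions.

```python
def deltas(frames, window_length=5):
    """Regression-style delta over time for a list of feature frames."""
    n = (window_length - 1) // 2
    denom = 2 * sum(k * k for k in range(1, n + 1))
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                # Repeat the first/last frame when the window runs off the edge
                ahead = frames[min(t + k, num_frames - 1)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def context_window(frames, left_frames, right_frames):
    """Concatenate each frame with its neighbors, zero-padding at the edges."""
    zero = [0.0] * len(frames[0])
    out = []
    for t in range(len(frames)):
        row = []
        for offset in range(-left_frames, right_frames + 1):
            idx = t + offset
            row.extend(frames[idx] if 0 <= idx < len(frames) else zero)
        out.append(row)
    return out


# Toy features: 6 frames of 2 values that grow linearly over time
frames = [[float(t), 2.0 * t] for t in range(6)]
d1 = deltas(frames)   # first-order deltas
d2 = deltas(d1)       # second-order deltas: the delta of the deltas
stacked = [f + a + b for f, a, b in zip(frames, d1, d2)]
windowed = context_window(stacked, left_frames=3, right_frames=3)
print(len(stacked[0]), len(windowed[0]))
```

The widths follow the same bookkeeping as real features: stacking the static values with two orders of deltas triples the dimension (2 becomes 6 here), and a window of 3 left and 3 right frames multiplies it by 7 (6 becomes 42).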

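import torch
from speechbrain.dataio.dataio import read_audio
from speechbrain.processing.features import (
    STFT, spectral_magnitude, Filterbank, DCT,
    InputNormalization, ContextWindow, Deltas,
)

# read_audio already returns a tensor; add a batch dimension -> [1, time]
signal = read_audio('samples/audio_samples/example1.wav')
signal = signal.unsqueeze(0)

# STFT with a 25 ms Hamming window, then the spectral magnitude
compute_stft = STFT(sample_rate=16000, win_length=25, n_fft=400,
                    window_fn=torch.hamming_window)
features = compute_stft(signal)
features = spectral_magnitude(features)

# 40 log-mel filter bank features
compute_fbank = Filterbank(n_mels=40, log_mel=True)
features = compute_fbank(features)

# The DCT keeps the first 20 coefficients: the MFCCs
compute_mfcc = DCT(input_size=40, n_out=20)
features = compute_mfcc(features)

# First- and second-order deltas; the second is the delta of the first
compute_deltas = Deltas(input_size=20)
deltas1 = compute_deltas(features)
deltas2 = compute_deltas(deltas1)
features = torch.cat([features, deltas1, deltas2], dim=2)

# Gather 3 frames of left and right context around each frame
compute_cw = ContextWindow(left_frames=3, right_frames=3)
features = compute_cw(features)

# The second argument is the relative length of each utterance in the batch;
# avg_factor is a constructor option of InputNormalization, not a forward argument
norm = InputNormalization()
features = norm(features, torch.tensor([1]).float())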
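The normalization at the end of the code above subtracts a mean and divides by a standard deviation, and avg_factor controls how much the current statistics count against the accumulated ones. The sketch below shows one plausible way such running statistics could be combined. The class `RunningNormalizer` is hypothetical, and its update rule (a weight of 1/count when no avg_factor is given) is an assumption made for illustration, not something taken from the SpeechBrain source.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: mean/std normalization with running statistics."""

    def __init__(self, avg_factor=None):
        self.avg_factor = avg_factor  # fixed weight for the current statistics
        self.count = 0
        self.mean = None
        self.std = None

    def __call__(self, frames):
        n, dim = len(frames), len(frames[0])
        # Statistics of the current input, per feature dimension
        cur_mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        cur_std = [
            math.sqrt(sum((f[d] - cur_mean[d]) ** 2 for f in frames) / n)
            for d in range(dim)
        ]
        self.count += 1
        # Assumed rule: without avg_factor, every input seen so far counts equally
        weight = self.avg_factor if self.avg_factor is not None else 1.0 / self.count
        if self.mean is None:
            self.mean, self.std = cur_mean, cur_std
        else:
            self.mean = [(1 - weight) * m + weight * c
                         for m, c in zip(self.mean, cur_mean)]
            self.std = [(1 - weight) * s + weight * c
                        for s, c in zip(self.std, cur_std)]
        eps = 1e-10  # avoid dividing by zero for constant features
        return [
            [(f[d] - self.mean[d]) / max(self.std[d], eps) for d in range(dim)]
            for f in frames
        ]


running = RunningNormalizer()
first = running([[1.0, 10.0], [3.0, 30.0]])
second = running([[5.0, 50.0], [7.0, 70.0]])

current_only = RunningNormalizer(avg_factor=1.0)
current_only([[1.0, 10.0], [3.0, 30.0]])
second_current = current_only([[5.0, 50.0], [7.0, 70.0]])
print(second, second_current)
```

With the running average, the second input is normalized against statistics pooled over both inputs. With avg_factor=1.0, only the statistics of the current input are used.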

SpeechBrain official documentation (SpeechBrain 0.5.0): https://speechbrain.readthedocs.io/en/latest/index.html


Reposted from blog.csdn.net/qq_55796594/article/details/122229476