Speech Signal Front-End Processing: A Code Walkthrough

Copyright notice: This is an original post by the author. Please include a link to the original when reposting. https://blog.csdn.net/shawncheer/article/details/85636189

Reference: https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html

The code is as follows:

import numpy
import scipy.io.wavfile
from scipy.fftpack import dct

sample_rate, signal = scipy.io.wavfile.read('OSR_us_000_0010_8k.wav')  # File assumed to be in the same directory
signal = signal[0:int(3.5 * sample_rate)]  # Keep the first 3.5 seconds

pre_emphasis = 0.97
emphasized_signal = numpy.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])  # Pre-emphasis: y[t] = x[t] - 0.97 * x[t-1]

frame_size = 0.025
frame_stride = 0.01

frame_length, frame_step = frame_size * sample_rate, frame_stride * sample_rate  # Convert from seconds to samples

signal_length = len(emphasized_signal)
frame_length = int(round(frame_length))
frame_step = int(round(frame_step))
num_frames = int(numpy.ceil(float(numpy.abs(signal_length - frame_length)) / frame_step))  # Make sure that we have at least 1 frame

pad_signal_length = num_frames * frame_step + frame_length
z = numpy.zeros((pad_signal_length - signal_length))
pad_signal = numpy.append(emphasized_signal, z) # Pad Signal to make sure that all frames have equal number of samples without truncating any samples from the original signal

indices = numpy.tile(numpy.arange(0, frame_length), (num_frames, 1)) + numpy.tile(numpy.arange(0, num_frames * frame_step, frame_step), (frame_length, 1)).T
x = numpy.tile(numpy.arange(0, num_frames * frame_step, frame_step), (frame_length, 1)).T  # Per-frame start offsets, one row per frame
y = numpy.tile(numpy.arange(0, frame_length), (num_frames, 1))  # Within-frame sample offsets 0..frame_length-1, repeated for every frame
print(num_frames)
print(x.shape)
print(y.shape)
frames = pad_signal[indices.astype(numpy.int32, copy=False)]  # Gather samples into a (num_frames, frame_length) matrix

Of these, the line that computes the starting sample position of each frame (window) is especially important:

indices = numpy.tile(numpy.arange(0, frame_length), (num_frames, 1)) + numpy.tile(numpy.arange(0, num_frames * frame_step, frame_step), (frame_length, 1)).T
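
To see what this line produces, here is a minimal sketch (not part of the original post) using toy values. With the 8 kHz file above, the real values would be frame_length = 200 and frame_step = 80, but small numbers make the resulting matrix easy to read:

import numpy

# Toy values for illustration only; the script above uses 200 and 80 for an 8 kHz file
frame_length, frame_step, num_frames = 4, 2, 3

# Each row of `starts` holds one frame's starting sample index, repeated frame_length times;
# `offsets` holds 0..frame_length-1 for every frame. Their sum indexes every sample of every frame.
starts = numpy.tile(numpy.arange(0, num_frames * frame_step, frame_step), (frame_length, 1)).T
offsets = numpy.tile(numpy.arange(0, frame_length), (num_frames, 1))
indices = offsets + starts

print(indices)
# [[0 1 2 3]
#  [2 3 4 5]
#  [4 5 6 7]]

Row i lists the absolute sample indices that belong to frame i, so pad_signal[indices] extracts all frames in a single fancy-indexing step, with consecutive frames overlapping by frame_length - frame_step samples.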
