Speech Signal Processing (1)

When studying speech enhancement and speech synthesis in depth, the speech preprocessing stage turns out to be very important, so this post gives a brief, self-contained summary of it together with the code for each step.

Speech preprocessing

Before a speech signal is analyzed and processed, it must go through the preprocessing operations of pre-emphasis, framing, and windowing. The purpose of these operations is to remove the effect on speech quality of aliasing, higher-harmonic distortion, high-frequency components, and similar factors introduced by the human vocal organs themselves and by the equipment used to acquire the signal.

Pre-emphasis

The aim of pre-emphasis is to boost the high-frequency part of the speech, which removes the effect of lip radiation and improves the resolution of the high-frequency part of the spectrum. It is typically implemented with a first-order FIR high-pass digital filter whose transfer function is $H(z) = 1 - a z^{-1}$. If the speech sample at time n is $x(n)$, the pre-emphasized result is $y(n) = x(n) - a\,x(n-1)$, where $a$ is the pre-emphasis coefficient, typically between 0.9 and 1.0 and usually taken as 0.98.
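The high-pass character of this filter can be checked numerically. The sketch below evaluates the magnitude of the frequency response $1 - a e^{-j\omega}$ that follows from y(n) = x(n) - a·x(n-1); the function name `preemphasis_gain` is illustrative and not part of the original code.

```python
import numpy as np


def preemphasis_gain(omega, a=0.98):
    """Magnitude of 1 - a*exp(-j*omega) at angular frequency omega (radians per sample)."""
    return np.abs(1.0 - a * np.exp(-1j * omega))


gain_dc = preemphasis_gain(0.0)          # lowest frequency: 1 - a
gain_nyquist = preemphasis_gain(np.pi)   # highest frequency: 1 + a
print(gain_dc, gain_nyquist)
```

With a = 0.98 the gain is about 0.02 at zero frequency and about 1.98 at the Nyquist frequency, so low frequencies are strongly attenuated relative to high ones.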

A complete implementation of pre-emphasis:

import numpy as np
from scipy.signal import lfilter


def emphasis(signal_batch, emph_coeff=0.95, pre=True):
    """
    Pre-emphasis or de-emphasis of higher frequencies for a batch of signals.

    Args:
        signal_batch: batch of signals, a numpy array of shape (batch, channels, samples)
        emph_coeff: emphasis coefficient
        pre: True for pre-emphasis, False for de-emphasis

    Returns:
        result: pre-emphasized or de-emphasized signal batch
    """
    result = np.zeros(signal_batch.shape)  # zero-filled array with the same shape as the batch
    for sample_idx, sample in enumerate(signal_batch):  # enumerate() yields each index together with its item
        for ch, channel_data in enumerate(sample):
            if pre:
                # Pre-emphasis y(n) = x(n) - a*x(n-1): keep the first sample, then append the filtered remainder
                result[sample_idx][ch] = np.append(channel_data[0], channel_data[1:] - emph_coeff * channel_data[:-1])
            else:
                # De-emphasis inverts the filter recursively: x(n) = y(n) + a*x(n-1)
                result[sample_idx][ch] = lfilter([1.0], [1.0, -emph_coeff], channel_data)
    return result
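The exact inverse of y(n) = x(n) - a·x(n-1) is the recursion x(n) = y(n) + a·x(n-1), starting from x(0) = y(0). The following is a minimal single-channel sketch of that round trip; the helper names `pre_emphasize` and `de_emphasize` are hypothetical and exist only for this illustration.

```python
import numpy as np


def pre_emphasize(x, a=0.95):
    """y(n) = x(n) - a*x(n-1), with the first sample passed through unchanged."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - a * x[:-1])


def de_emphasize(y, a=0.95):
    """Undo pre_emphasize with the recursion x(n) = y(n) + a*x(n-1)."""
    y = np.asarray(y, dtype=float)
    x = np.empty_like(y)
    x[0] = y[0]
    for n in range(1, len(y)):
        x[n] = y[n] + a * x[n - 1]
    return x


rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)       # stand-in for one channel of speech
restored = de_emphasize(pre_emphasize(signal))
print(np.allclose(restored, signal))
```

The explicit loop is slow for long signals but makes the recursion easy to read; a recursive filter routine could replace it in practice.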

Framing

Speech analysis relies throughout on "short-time analysis". The characteristics of a speech signal vary over time, but within a short interval (generally taken to be 10 to 30 ms) they remain essentially unchanged, so the signal can be treated as a quasi-stationary process; in other words, speech has short-time stationarity. Any analysis or processing of speech must therefore be done on a short-time basis: the signal is divided into segments and the characteristic parameters of each segment are analyzed. Each segment is called a "frame", and the frame length is generally 10 to 30 ms. For the speech signal as a whole, the result of the analysis is a time series made up of the characteristic parameters of each frame.

import librosa


def slice_signal(file, window_size, stride, sample_rate):
    """
    Helper function for slicing the audio file into windows of window_size
    samples with (1 - stride) overlap, e.g. 50% overlap for stride=0.5.
    """
    # Load and resample to sample_rate (e.g. 16 kHz); sr=None would keep the file's native rate
    wav, sr = librosa.load(file, sr=sample_rate)
    hop = int(window_size * stride)  # frame shift in samples
    slices = []
    # Slide the window by hop samples; the +1 keeps a final window that ends exactly at len(wav)
    for end_idx in range(window_size, len(wav) + 1, hop):
        start_idx = end_idx - window_size
        slice_sig = wav[start_idx:end_idx]
        slices.append(slice_sig)
    return slices
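To connect the 10 to 30 ms frame length to sample counts, here is a small sketch that frames an in-memory array without loading a file. The function `frame_signal` and its defaults (25 ms frames, 10 ms hop, 16 kHz) are assumptions chosen for illustration, not values taken from the original code.

```python
import numpy as np


def frame_signal(wav, frame_ms=25.0, hop_ms=10.0, sample_rate=16000):
    """Split a 1-D signal into overlapping frames; returns an array of shape (n_frames, frame_len)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # samples per frame
    hop_len = int(round(sample_rate * hop_ms / 1000.0))      # samples between frame starts
    if len(wav) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(wav) - frame_len) // hop_len
    return np.stack([wav[i * hop_len:i * hop_len + frame_len] for i in range(n_frames)])


wav = np.arange(16000, dtype=float)  # one second of dummy samples at 16 kHz
frames = frame_signal(wav)
print(frames.shape)
```

At 16 kHz a 25 ms frame holds 400 samples and a 10 ms hop is 160 samples, so one second of audio yields 98 full frames; trailing samples that do not fill a frame are dropped in this sketch.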

Windowing

Because the speech signal is short-time stationary, it can be processed frame by frame, and each frame is then windowed. The purpose of the window is to emphasize the speech waveform in the vicinity of sample n and attenuate the rest of the waveform. Processing each short segment of speech amounts to applying some transformation or operation to that segment. The three most commonly used window functions are the rectangular window, the Hamming window, and the Hanning (Hann) window. For a window of length $N$, each is zero outside $0 \le n \le N-1$, and inside that range they are defined as:

$$w_{\text{rect}}(n) = 1$$

$$w_{\text{Hamming}}(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$$

$$w_{\text{Hanning}}(n) = 0.5\left[1 - \cos\left(\frac{2\pi n}{N-1}\right)\right]$$
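As a rough sketch of how windowing is applied to frames, the code below builds the Hamming and Hanning windows from their common textbook definitions, 0.54 - 0.46·cos(2πn/(N-1)) and 0.5·(1 - cos(2πn/(N-1))), and multiplies each frame by the window. The helper names are illustrative; NumPy's built-in `np.hamming` and `np.hanning` use the same definitions and are the usual choice in practice.

```python
import numpy as np


def hamming_window(N):
    """Hamming window of length N: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))


def hanning_window(N):
    """Hanning (Hann) window of length N: 0.5*(1 - cos(2*pi*n/(N-1)))."""
    n = np.arange(N)
    return 0.5 * (1.0 - np.cos(2 * np.pi * n / (N - 1)))


def apply_window(frames, window):
    """Multiply every frame (each row) by the window, sample by sample."""
    return frames * window


N = 400                              # assumed frame length: 25 ms at 16 kHz
frames = np.ones((3, N))             # dummy frames standing in for framed speech
windowed = apply_window(frames, hamming_window(N))
print(windowed.shape)
```

A rectangular window corresponds to leaving the frames unchanged, which is why it needs no helper here.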

Reference blog: https://blog.csdn.net/cwfjimogudan/article/details/71112171

Origin blog.csdn.net/weixin_43936357/article/details/103152325