Speech signal processing - basic concepts (1): audio length (s), sampling rate (Hz, e.g. 16000), frame length (e.g. 25 ms), number of frames, frame shift (e.g. 12.5 ms), and hop_size (the number of samples the window advances per frame, e.g. 16000 × 12.5 / 1000 = 200)

The key relation to understand: number of mel frames × frame shift ≈ audio length. (A number of samples converts to a duration by dividing by the sampling rate.)
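The relation above can be sketched in a few lines of Python. The function names here are made up for illustration, and the result is approximate: the exact number of frames a toolkit produces also depends on how it pads the start and end of the signal.

```python
def samples_to_seconds(num_samples: int, sample_rate: int) -> float:
    """Convert a number of samples to a duration in seconds."""
    return num_samples / sample_rate


def frames_to_seconds(num_frames: int, hop_size: int, sample_rate: int) -> float:
    """Approximate audio length covered by num_frames mel frames.

    Each frame advances the window by hop_size samples, so the frames
    together span roughly num_frames * hop_size samples.
    """
    return samples_to_seconds(num_frames * hop_size, sample_rate)


# 32000 samples at 16000 Hz
two_seconds = samples_to_seconds(32000, 16000)
# 160 frames with hop_size 200 at 16000 Hz cover the same span
same_span = frames_to_seconds(160, 200, 16000)
print(two_seconds, same_span)
```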

For example, at a 22050 Hz sampling rate with hop_size set to 256, each mel frame corresponds to 256 waveform samples, so a model that generates audio from the mel-spectrogram must upsample it by a factor of 256.

What about a 16000 Hz sampling rate? With a frame length of 50 ms and a frame shift of 12.5 ms, hop_size is 16000 × 12.5 / 1000 = 200, so the upsampling factor is 200. The frame length does not affect this factor; only the frame shift does.
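The arithmetic for both configurations can be checked with a small sketch. The helper names are hypothetical, and rounding to the nearest whole sample is an assumption; a given toolkit may truncate instead.

```python
def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate * duration_ms / 1000))


def samples_to_ms(num_samples: int, sample_rate: int) -> float:
    """Convert a number of samples to a duration in milliseconds."""
    return 1000 * num_samples / sample_rate


# 16000 Hz: 12.5 ms frame shift and 50 ms frame length
hop_16k = ms_to_samples(12.5, 16000)   # hop_size, also the upsampling factor
win_16k = ms_to_samples(50, 16000)     # frame length in samples

# 22050 Hz: hop_size is given directly as 256 samples
shift_22k_ms = samples_to_ms(256, 22050)

print(hop_16k, win_16k, round(shift_22k_ms, 2))
```

At 22050 Hz a hop_size of 256 corresponds to a frame shift of roughly 11.6 ms, which is close to the 12.5 ms used at 16000 Hz.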

1. Sampling rate (sampling frequency): the number of samples per second

The sampling rate is the number of samples taken per second. Its symbol is f_s and its unit is Hz. The higher the sampling rate, the closer the digital waveform is to the original analog waveform, and the more faithfully the sound is reproduced.

According to the Nyquist–Shannon sampling theorem, the original signal can be perfectly reconstructed only when the sampling frequency is more than twice the highest frequency in the original analog signal. Commonly used sampling rates include 8000 Hz (telephony), 16000 Hz (common for speech), 22050 Hz, 44100 Hz (CD audio), and 48000 Hz.
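To see why frequencies above half the sampling rate cannot be recovered, here is a minimal sketch using only the standard library. The frequencies 7000 Hz and 9000 Hz are arbitrary values chosen for illustration: sampled at 16000 Hz, a 9000 Hz cosine produces the same samples as a 7000 Hz cosine, so the two are indistinguishable after sampling.

```python
import math


def sample_cosine(freq_hz: float, sample_rate: int, num_samples: int) -> list:
    """Sample a unit-amplitude cosine of the given frequency."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)
    ]


sample_rate = 16000
nyquist = sample_rate / 2  # highest representable frequency: 8000 Hz

below = sample_cosine(7000, sample_rate, 32)  # 1000 Hz under the limit
above = sample_cosine(9000, sample_rate, 32)  # 1000 Hz over the limit

# The two sampled sequences agree up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(below, above))
print(nyquist, max_diff < 1e-9)
```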

2. Frame length

3. Frame shift

4. hop_size
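How frame length, frame shift, and hop_size fit together can be illustrated with a simple framing sketch. This is an assumption-laden illustration, not any library's method: the function name is made up, no padding is applied, and trailing samples that do not fill a whole frame are dropped.

```python
def split_into_frames(samples: list, frame_length: int, hop_size: int) -> list:
    """Cut a signal into overlapping frames.

    frame_length: samples per frame (frame length in ms * sample rate / 1000)
    hop_size:     samples between frame starts (frame shift in samples)
    """
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += hop_size
    return frames


# One second of dummy samples at 16000 Hz, 25 ms frames, 12.5 ms shift
signal = list(range(16000))
frames = split_into_frames(signal, frame_length=400, hop_size=200)
print(len(frames), len(frames[0]), frames[1][0])
```

With a 25 ms frame and a 12.5 ms shift, consecutive frames overlap by half, and the second frame starts exactly hop_size samples after the first.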

5. nb_samples

nb_samples is the number of samples in one frame of audio data.


Origin blog.csdn.net/u013250861/article/details/126594126