librosa's short-time Fourier transform implementation: librosa.stft()

Table of contents

1. API parameter analysis

2. Framing mechanism

2.1 C language implementation

2.1.1 Test code

2.2 Implementation of short-time Fourier transform of librosa

2.2.1 The framing mechanism of librosa.stft()

2.2.2 Test code

2.2.3 Analysis of running results

2.3 Implementation of short-time inverse Fourier transform of librosa

3. Appendix: librosa official website


1. API parameter analysis

Function prototype

librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='reflect')

Purpose

Splits the audio signal into frames, applies a window to each frame, and computes the FFT of each windowed frame. Together these steps form the short-time Fourier transform (STFT) of the signal.

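Parameters

(1) y: The input audio time series, usually obtained with librosa.load().

(2) n_fft: The number of FFT points. It may differ from the window length, but the two are usually set to the same value.

(3) hop_length: The frame shift, i.e. the number of samples between the starts of consecutive frames. If not specified, it defaults to win_length // 4.

(4) win_length: The length of the sliding (truncation) window. It must not exceed n_fft; a shorter window is zero-padded to n_fft. If not specified, it defaults to n_fft, and in practice the two are usually equal.

(5) window: The window function, given here as a string. The default is the Hann (Hanning) window.

(6) center: A bool, True by default, which is normally left unchanged. It means "center alignment": frame t is centered on sample t * hop_length. This parameter determines the framing mechanism and is the one to focus on.

(7) pad_mode: The padding mode applied at the signal edges when center=True. The value is passed to numpy.pad. With 'reflect', as in the prototype above, the padding mirrors the signal about its first and last samples; with 'constant' the padding is zeros. The analysis below assumes zero padding. The default value is not the same in every librosa release, so check the documentation of the installed version or pass pad_mode explicitly.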
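The difference between the two padding modes is easy to see with numpy.pad alone. The following is a minimal sketch on a made-up 6-sample signal; the signal and the pad width of 2 (standing in for n_fft // 2) are illustrative assumptions, not values from the experiment.

```python
import numpy as np

# Hypothetical 6-sample signal, chosen only to make the padding visible.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
pad = 2  # stands in for n_fft // 2

# 'constant' pads with zeros on both sides.
zero_padded = np.pad(y, pad, mode="constant")

# 'reflect' mirrors the signal about its first and last samples
# (the edge sample itself is not repeated).
reflected = np.pad(y, pad, mode="reflect")

print(zero_padded)  # [0. 0. 1. 2. 3. 4. 5. 6. 0. 0.]
print(reflected)    # [3. 2. 1. 2. 3. 4. 5. 6. 5. 4.]
```

Only the first and last frames of an STFT see this padding, so the choice of mode affects the edge frames and leaves interior frames unchanged.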

Return value

Returns a complex matrix of shape (1 + n_fft/2, number of frames). Each column holds the spectrum of one frame, i.e. the complex values a+bi obtained from the real input sequence after framing, windowing, and the Fourier transform.
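The shape of the returned matrix can be predicted from the parameters. The helper below is a sketch with a hypothetical name (stft_shape is not part of librosa); it assumes that center=True adds n_fft // 2 samples of padding on each side and that only complete frames are kept.

```python
def stft_shape(signal_length, n_fft, hop_length, center=True):
    """Predict (rows, columns) of the STFT matrix under the stated assumptions."""
    # A real-input FFT keeps only the non-redundant bins 0 .. n_fft/2.
    n_bins = 1 + n_fft // 2
    # Centering pads half a frame on each side before framing.
    padded = signal_length + 2 * (n_fft // 2) if center else signal_length
    # Count the complete frames that fit into the (padded) signal.
    n_frames = 1 + (padded - n_fft) // hop_length
    return n_bins, n_frames


print(stft_shape(256, 256, 128))                # (129, 3)
print(stft_shape(256, 256, 128, center=False))  # (129, 1)
```

For the 256-sample experiment in this article, this gives 129 rows and 3 columns with centering, and a single frame without it.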

2. Framing mechanism

librosa's short-time Fourier transform frames the signal in a slightly unconventional way. The following takes the Fourier transform of 256 sampling points as an example.

Experimental data: to keep the experiment simple and make the API parameters and the implementation easy to follow, 256 samples were read from an audio file in advance and stored in an array (C) or a list (Python). The Hann window can likewise be computed once and stored, then looked up from the table when needed, which avoids recomputing it for every frame. This is the common practice of trading space for time.
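A precomputed window table might look like the sketch below. It assumes the periodic form of the Hann window (denominator N rather than N-1), which is the form normally used for FFT analysis; the article does not show its own table, so HANN_TABLE and apply_window are hypothetical names.

```python
import numpy as np

FRAME_LEN = 256

# Computed once, then reused for every frame (space traded for time).
# Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / N).
HANN_TABLE = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(FRAME_LEN) / FRAME_LEN)


def apply_window(frame):
    """Multiply one frame by the stored window instead of recomputing it."""
    return frame * HANN_TABLE


windowed = apply_window(np.ones(FRAME_LEN))
print(windowed[0], windowed[FRAME_LEN // 2])  # 0.0 at the start, 1.0 at the center
```

Note that np.hanning(FRAME_LEN) returns the symmetric form (denominator N-1), which differs slightly; the periodic table above equals np.hanning(FRAME_LEN + 1)[:-1].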

2.1 C language implementation

Apply the Fourier transform directly to these 256 points. The FFT length is 256, so the order is 8 (2^8 = 256). A complex FFT algorithm is used: the real part of the input is the audio data and the imaginary part is all zeros.

2.1.1 Test code

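#include <stdio.h>
#include <math.h>

#define FRAME_LEN 256
#define FFT_ORDER ((int)log2(FRAME_LEN))

/* Precomputed Hann window lookup table; its definition is not shown in this
   article, and a float array of FRAME_LEN entries is assumed. */
extern float hanning[FRAME_LEN];
/* The author's complex FFT routine; its parameters are described after the results. */
void FFT(float dataR[], float dataI[], float dataA[], int N, int M);

float dat_i[FRAME_LEN], dat_a[FRAME_LEN];
float dat_r[FRAME_LEN]={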
	0.000214,
	0.000122,
	-0.000061,
	0.000275,
	0.000092,
	0.000336,
	0.000153,
	-0.000092,
	0.000031,
	0.000305,
	0.000641,
	0.000458,
	0.000305,
	0.000153,
	0.000397,
	0.000732,
	0.000885,
	0.000580,
	0.000427,
	0.000397,
	0.000641,
	0.001190,
	0.001282,
	0.000977,
	0.000336,
	0.000458,
	0.001007,
	0.001617,
	0.001465,
	0.000824,
	0.000275,
	0.000366,
	0.001007,
	0.001282,
	0.000885,
	0.000122,
	-0.000336,
	-0.000122,
	0.000336,
	0.000275,
	-0.000153,
	-0.000824,
	-0.000916,
	-0.000580,
	-0.000519,
	-0.000763,
	-0.001099,
	-0.001221,
	-0.001007,
	-0.000580,
	-0.000793,
	-0.000885,
	-0.001038,
	-0.000580,
	-0.000275,
	-0.000214,
	-0.000549,
	-0.000610,
	-0.000061,
	0.000702,
	0.001190,
	0.000763,
	0.000305,
	0.000275,
	0.001068,
	0.001923,
	0.001953,
	0.001190,
	0.000488,
	0.000519,
	0.001251,
	0.001740,
	0.001282,
	0.000397,
	-0.000183,
	0.000000,
	0.000214,
	0.000305,
	-0.000092,
	-0.000458,
	-0.000732,
	-0.000610,
	-0.000671,
	-0.000916,
	-0.000946,
	-0.000824,
	-0.000702,
	-0.001068,
	-0.000977,
	-0.001190,
	-0.000671,
	-0.000244,
	-0.000122,
	-0.000580,
	-0.000854,
	-0.000519,
	0.000153,
	0.000641,
	0.000671,
	0.000214,
	-0.000122,
	0.000336,
	0.000732,
	0.000702,
	0.000397,
	0.000122,
	0.000122,
	0.000275,
	0.000000,
	-0.000458,
	-0.000580,
	-0.000275,
	0.000092,
	-0.000244,
	-0.001373,
	-0.001892,
	-0.001465,
	-0.000153,
	0.000214,
	-0.000641,
	-0.002533,
	-0.002808,
	-0.001526,
	0.000549,
	0.000946,
	-0.000824,
	-0.002594,
	-0.002625,
	-0.000427,
	0.001678,
	0.001587,
	-0.000397,
	-0.002075,
	-0.001526,
	0.000549,
	0.002014,
	0.001740,
	-0.000122,
	-0.001678,
	-0.000916,
	0.000610,
	0.001556,
	0.000977,
	-0.000580,
	-0.001251,
	-0.000854,
	-0.000183,
	0.000214,
	-0.000031,
	-0.000671,
	-0.001007,
	-0.001221,
	-0.000916,
	-0.000824,
	-0.000427,
	-0.000549,
	-0.000732,
	-0.000610,
	-0.000854,
	-0.000671,
	-0.000458,
	-0.000183,
	0.000031,
	0.000275,
	0.000427,
	-0.000092,
	-0.000397,
	0.000000,
	0.000793,
	0.001526,
	0.001465,
	0.000122,
	-0.001068,
	-0.000854,
	0.000946,
	0.002472,
	0.002014,
	-0.000488,
	-0.002350,
	-0.002075,
	0.000702,
	0.002686,
	0.001801,
	-0.001526,
	-0.003662,
	-0.002472,
	0.000793,
	0.002747,
	0.001740,
	-0.001251,
	-0.002991,
	-0.001648,
	0.001160,
	0.003143,
	0.002289,
	0.000244,
	-0.000916,
	-0.000305,
	0.001526,
	0.002960,
	0.002808,
	0.002289,
	0.001221,
	0.000092,
	0.000427,
	0.001404,
	0.003021,
	0.003479,
	0.001709,
	-0.001068,
	-0.002502,
	-0.001007,
	0.001831,
	0.002960,
	0.000427,
	-0.003510,
	-0.005157,
	-0.002930,
	0.000580,
	0.001648,
	-0.001038,
	-0.004486,
	-0.005463,
	-0.002991,
	-0.000122,
	0.000885,
	-0.000488,
	-0.002045,
	-0.002289,
	-0.001129,
	0.000214,
	0.001343,
	0.002441,
	0.003448,
	0.002991,
	0.001282,
	0.000153,
	0.002014,
	0.006134,
	0.008820,
	0.006439,
	0.000793,
	-0.002625,
	0.000488,
	0.007111,
	0.010132,
	0.004669,
	-0.004791,
	-0.009033,
	-0.004028,
	0.004089,
	0.006104,
	-0.002228,
	-0.013214,
	-0.016327,
	-0.009583,
};

int main(void)
{
	/* Window the frame in place and clear the imaginary part. */
	for (int i=0; i<FRAME_LEN; i++) {
		dat_i[i]=0.0f;
		dat_r[i]=dat_r[i]*hanning[i];
	}

	/* In-place complex FFT: dat_r/dat_i hold the spectrum afterwards. */
	FFT(dat_r, dat_i, dat_a, FRAME_LEN, FFT_ORDER);
//	for (int i=0; i<FRAME_LEN; i++) {
//		printf("[%d] (%f+%fj)\n", i, dat_r[i], dat_i[i]);
//	}
#if 1
	/* Save the real parts as text for comparison with the librosa result. */
	FILE* fp_dbg = fopen("res_fft.txt", "w");
	if (fp_dbg == NULL) {
		perror("res_fft.txt");
		return 1;
	}
	for (int i=0; i<FRAME_LEN; i++) {
		fprintf(fp_dbg, "[%d]:%f\n", i, dat_r[i]);
	}
	fclose(fp_dbg);
	printf("data output done\n");
#endif
	return 0;
}

2.1.2 Analysis of running results

[0] (-0.016482+0.000000j) // DC component

[1] (0.023277+0.005782j)

[2] (-0.022870+-0.002839j)

[3] (0.008954+-0.001403j)

[4] (0.009356+-0.005058j)

[5] (0.003737+0.002048j)

[6] (-0.016804+0.000179j)

[7] (-0.005669+0.038893j)

[8] (0.003740+-0.042136j)

[9] (0.001611+0.004286j)

[10] (0.001527+0.002011j)

[11] (0.000186+0.000361j)

[12] (0.000990+-0.000215j)

[13] (0.000373+-0.000256j)

[14] (-0.001273+0.000464j)

[15] (0.001159+0.001058j)

[16] (0.000621+-0.001287j)

[17] (-0.000673+0.001001j)

[18] (0.000025+-0.000228j)

[19] (-0.000244+-0.000012j)

[20] (0.000698+0.000222j)

[21] (-0.000256+-0.000116j)

[22] (-0.000189+-0.000112j)

[23] (0.000449+0.000111j)

[24] (-0.000790+0.000218j)

[25] (0.000185+0.000019j)

[26] (0.001250+-0.000179j)

[27] (-0.000985+-0.000131j)

[28] (-0.000077+0.000429j)

[29] (0.000429+-0.000058j)

[30] (-0.000620+-0.000187j)

[31] (0.000907+0.000340j)

[32] (-0.000126+-0.000314j)

[33] (0.000024+-0.000142j)

[34] (-0.000503+0.000532j)

[35] (0.001116+0.000584j)

[36] (0.000258+-0.000051j)

[37] (0.000807+0.000773j)

[38] (0.003857+0.003912j)

[39] (-0.022648+0.023224j)

[40] (-0.000071+-0.056523j)

[41] (0.021825+0.018984j)

[42] (0.006530+0.003042j)

[43] (-0.019067+0.025972j)

[44] (0.002945+-0.045785j)

[45] (0.010391+0.040381j)

[46] (-0.018414+-0.019886j)

[47] (0.010679+0.007567j)

[48] (0.014649+-0.022715j)

[49] (-0.007632+0.028023j)

[50] (-0.004731+-0.005551j)

[51] (0.000074+-0.000718j)

[52] (-0.000791+-0.000904j)

[53] (0.000014+-0.000019j)

[54] (0.000344+0.000090j)

[55] (0.000134+-0.000584j)

[56] (-0.000264+0.000585j)

[57] (-0.000724+-0.000490j)

[58] (0.001307+-0.000154j)

[59] (-0.001079+0.000291j)

[60] (0.000548+-0.000334j)

[61] (-0.000573+0.000041j)

[62] (0.000840+-0.000249j)

[63] (-0.000619+0.000665j)

[64] (-0.000148+-0.000633j)

[65] (0.000158+0.000457j)

[66] (0.000138+-0.000554j)

[67] (0.000081+0.000580j)

[68] (-0.000484+-0.000325j)

[69] (0.000552+-0.000122j)

[70] (-0.000070+0.000219j)

[71] (-0.000362+-0.000100j)

[72] (0.000310+-0.000062j)

[73] (0.000635+0.000270j)

[74] (-0.001663+-0.000394j)

[75] (0.001318+0.000141j)

[76] (-0.000177+0.000009j)

[77] (-0.000646+-0.000014j)

[78] (0.000624+0.000356j)

[79] (-0.000049+-0.000515j)

[80] (-0.000366+-0.000084j)

[81] (0.000828+0.000682j)

[82] (-0.001138+-0.000252j)

[83] (0.000686+-0.000317j)

[84] (-0.000067+0.000474j)

[85] (-0.000097+-0.000849j)

[86] (0.000413+0.001005j)

[87] (-0.000761+-0.000557j)

[88] (0.000237+0.000295j)

[89] (0.000096+-0.000673j)

[90] (0.000482+0.000729j)

[91] (-0.000311+-0.000147j)

[92] (-0.000463+-0.000385j)

[93] (0.000049+0.000497j)

[94] (0.000472+-0.000785j)

[95] (0.000007+0.001097j)

[96] (0.000009+-0.000468j)

[97] (-0.000771+0.000036j)

[98] (0.000657+-0.000847j)

[99] (0.000456+0.001364j)

[100] (-0.001066+-0.000980j)

[101] (0.000792+0.000747j)

[102] (-0.000536+-0.000571j)

[103] (0.000696+-0.000112j)

[104] (-0.000424+0.000539j)

[105] (-0.000023+-0.000251j)

[106] (0.000105+-0.000071j)

[107] (-0.000341+0.000353j)

[108] (0.000462+-0.001015j)

[109] (-0.000282+0.001091j)

[110] (0.000073+-0.000221j)

[111] (0.000250+-0.000085j)

[112] (-0.000258+-0.000122j)

[113] (-0.000355+-0.000028j)

[114] (0.001097+0.000032j)

[115] (-0.001402+0.000154j)

[116] (0.001382+0.000325j)

[117] (-0.001009+-0.001084j)

[118] (0.000305+0.000141j)

[119] (-0.000007+0.001139j)

[120] (0.000040+-0.000427j)

[121] (-0.000253+-0.000659j)

[122] (0.000403+0.000886j)

[123] (-0.000065+-0.000793j)

[124] (-0.000628+0.000475j)

[125] (0.001202+-0.000259j)

[126] (-0.000712+0.000480j)

[127] (0.000087+-0.000369j)

[128] (-0.000109+0.000000j) // Nyquist component (bin N/2)

[129] (0.000087+0.000369j)

[130] (-0.000712+-0.000480j)

[131] (0.001202+0.000259j)

[132] (-0.000628+-0.000475j)

[133] (-0.000065+0.000793j)

[134] (0.000403+-0.000886j)

[135] (-0.000253+0.000659j)

[136] (0.000040+0.000427j)

[137] (-0.000007+-0.001139j)

[138] (0.000305+-0.000141j)

[139] (-0.001009+0.001084j)

[140] (0.001382+-0.000325j)

[141] (-0.001402+-0.000154j)

[142] (0.001097+-0.000032j)

[143] (-0.000355+0.000028j)

[144] (-0.000258+0.000122j)

[145] (0.000251+0.000085j)

[146] (0.000073+0.000221j)

[147] (-0.000282+-0.001091j)

[148] (0.000462+0.001015j)

[149] (-0.000341+-0.000353j)

[150] (0.000105+0.000071j)

[151] (-0.000023+0.000251j)

[152] (-0.000424+-0.000539j)

[153] (0.000696+0.000112j)

[154] (-0.000536+0.000571j)

[155] (0.000792+-0.000747j)

[156] (-0.001066+0.000980j)

[157] (0.000456+-0.001364j)

[158] (0.000657+0.000847j)

[159] (-0.000771+-0.000036j)

[160] (0.000009+0.000468j)

[161] (0.000007+-0.001097j)

[162] (0.000472+0.000785j)

[163] (0.000049+-0.000497j)

[164] (-0.000463+0.000385j)

[165] (-0.000311+0.000147j)

[166] (0.000482+-0.000729j)

[167] (0.000096+0.000673j)

[168] (0.000237+-0.000295j)

[169] (-0.000761+0.000557j)

[170] (0.000413+-0.001005j)

[171] (-0.000097+0.000849j)

[172] (-0.000067+-0.000474j)

[173] (0.000686+0.000317j)

[174] (-0.001138+0.000252j)

[175] (0.000828+-0.000682j)

[176] (-0.000366+0.000084j)

[177] (-0.000049+0.000515j)

[178] (0.000624+-0.000356j)

[179] (-0.000646+0.000014j)

[180] (-0.000177+-0.000009j)

[181] (0.001318+-0.000141j)

[182] (-0.001663+0.000394j)

[183] (0.000635+-0.000270j)

[184] (0.000310+0.000062j)

[185] (-0.000362+0.000100j)

[186] (-0.000070+-0.000219j)

[187] (0.000552+0.000122j)

[188] (-0.000484+0.000325j)

[189] (0.000081+-0.000580j)

[190] (0.000138+0.000554j)

[191] (0.000158+-0.000457j)

[192] (-0.000148+0.000633j)

[193] (-0.000619+-0.000665j)

[194] (0.000840+0.000249j)

[195] (-0.000573+-0.000041j)

[196] (0.000548+0.000334j)

[197] (-0.001079+-0.000291j)

[198] (0.001307+0.000154j)

[199] (-0.000724+0.000490j)

[200] (-0.000264+-0.000585j)

[201] (0.000134+0.000584j)

[202] (0.000344+-0.000090j)

[203] (0.000014+0.000019j)

[204] (-0.000791+0.000904j)

[205] (0.000074+0.000718j)

[206] (-0.004731+0.005551j)

[207] (-0.007631+-0.028023j)

[208] (0.014648+0.022715j)

[209] (0.010679+-0.007567j)

[210] (-0.018414+0.019886j)

[211] (0.010392+-0.040381j)

[212] (0.002945+0.045785j)

[213] (-0.019067+-0.025972j)

[214] (0.006530+-0.003042j)

[215] (0.021826+-0.018984j)

[216] (-0.000071+0.056523j)

[217] (-0.022648+-0.023224j)

[218] (0.003857+-0.003912j)

[219] (0.000807+-0.000773j)

[220] (0.000258+0.000051j)

[221] (0.001116+-0.000584j)

[222] (-0.000503+-0.000532j)

[223] (0.000024+0.000142j)

[224] (-0.000126+0.000314j)

[225] (0.000907+-0.000340j)

[226] (-0.000620+0.000187j)

[227] (0.000429+0.000058j)

[228] (-0.000077+-0.000429j)

[229] (-0.000985+0.000131j)

[230] (0.001250+0.000179j)

[231] (0.000185+-0.000019j)

[232] (-0.000790+-0.000218j)

[233] (0.000449+-0.000111j)

[234] (-0.000189+0.000112j)

[235] (-0.000256+0.000116j)

[236] (0.000698+-0.000222j)

[237] (-0.000244+0.000012j)

[238] (0.000025+0.000228j)

[239] (-0.000673+-0.001001j)

[240] (0.000621+0.001286j)

[241] (0.001159+-0.001058j)

[242] (-0.001273+-0.000464j)

[243] (0.000373+0.000256j)

[244] (0.000990+0.000215j)

[245] (0.000186+-0.000361j)

[246] (0.001527+-0.002011j)

[247] (0.001611+-0.004286j)

[248] (0.003740+0.042136j)

[249] (-0.005668+-0.038893j)

[250] (-0.016804+-0.000179j)

[251] (0.003737+-0.002048j)

[252] (0.009356+0.005058j)

[253] (0.008954+0.001403j)

[254] (-0.022870+0.002839j)

[255] (0.023277+-0.005782j)

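Notes:

(1) The FFT of a real sequence yields a complex sequence.

(2) Bin 0 is the DC component and bin FRAME_LEN/2 is the Nyquist component; both are purely real. The remaining results are conjugate symmetric, which is the conjugate symmetry of the DFT of a real sequence: X(m)=X^*(N-m)

(3) Note the first 129 values of the result; they are compared with the output of librosa.stft() below.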
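The conjugate symmetry in note (2) can be checked numerically. The sketch below uses a random 256-point real sequence as a stand-in for the audio data (an assumption made only to keep the example short) and compares the full complex FFT with NumPy's real-input FFT.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)   # stand-in for one 256-sample real frame
N = len(x)

X = np.fft.fft(x)              # full complex FFT: 256 bins
X_half = np.fft.rfft(x)        # real-input FFT: bins 0 .. N/2 only

# Conjugate symmetry: X(m) = X*(N - m) for m = 1 .. N-1.
m = np.arange(1, N)
symmetric = np.allclose(X[m], np.conj(X[N - m]))

print(symmetric)                         # True
print(X_half.shape)                      # (129,)
print(np.allclose(X[: N // 2 + 1], X_half))  # True
```

Bins 0 and N/2 are their own mirror images, which is why their imaginary parts are zero in the listing above.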

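Because the FFT of a real sequence is conjugate symmetric, the FFT algorithm can be optimized to greatly reduce the amount of computation and storage. Numerical libraries often provide a real-input FFT for this purpose, for example numpy.fft.rfft in Python: a 256-point FFT then only outputs 129 points, since there is no need to return the redundant data. For ease of understanding, the complex FFT algorithm is used here. The API implemented by the author for the test code is briefly described below: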

/*

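float dataR[]: on input, the real part of the time-domain sequence; on return, the real part of each frequency bin

float dataI[]: on input, the imaginary part of the time-domain sequence (all zeros); on return, the imaginary part of each frequency bin

float dataA[]: the magnitude of each value in the result sequence

int N: the length of the FFT sequence

int M: the order of the FFT (N = 2^M)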

*/

void FFT(float dataR[], float dataI[], float dataA[], int N, int M);
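The article does not show the body of FFT(). For readers who want to see what such a routine does, here is a Python sketch of a standard iterative radix-2 decimation-in-time FFT with the same argument roles (real part, imaginary part, magnitude, and an order derived from N). The function name fft_radix2 is hypothetical, and this is not the author's implementation.

```python
import math


def fft_radix2(data_r, data_i):
    """In-place radix-2 FFT on two float lists; returns the magnitudes."""
    n = len(data_r)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    order = n.bit_length() - 1  # M, with N = 2**M

    # Bit-reversal permutation of the input.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            data_r[i], data_r[j] = data_r[j], data_r[i]
            data_i[i], data_i[j] = data_i[j], data_i[i]

    # M butterfly stages; the butterfly span doubles at each stage.
    for stage in range(1, order + 1):
        size = 1 << stage
        half = size >> 1
        for start in range(0, n, size):
            for k in range(half):
                angle = -2.0 * math.pi * k / size
                wr, wi = math.cos(angle), math.sin(angle)
                a, b = start + k, start + k + half
                tr = wr * data_r[b] - wi * data_i[b]
                ti = wr * data_i[b] + wi * data_r[b]
                data_r[b], data_i[b] = data_r[a] - tr, data_i[a] - ti
                data_r[a], data_i[a] = data_r[a] + tr, data_i[a] + ti

    return [math.hypot(r, i) for r, i in zip(data_r, data_i)]


# Usage on a short made-up sequence: real input, zero imaginary part.
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
re, im = list(x), [0.0] * len(x)
mag = fft_radix2(re, im)
print(re[0], im[0])  # bin 0 is the sum of the inputs: 3.5 0.0
```

For N = 256 the outer loop runs 8 times, which is the "order 8" mentioned in section 2.1.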

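2.2 Implementation of short-time Fourier transform of librosa

The same data source is used. librosa.stft() applies a Hann window by default, so there is no need to window the data beforehand.

2.2.1 The framing mechanism of librosa.stft()

(1) With center=True, librosa.stft() outputs one more frame than conventional framing. Its framing is center-aligned: each frame is centered on its framing point (a multiple of hop_length), so the first and last frames extend half a frame beyond the signal, and that part is padding. The padding is zeros when pad_mode='constant'; with pad_mode='reflect' it is a mirrored copy of the signal edge instead.

(2) Number of output frames: signal_length // hop_length + 1 (integer division)

Example: a frame has 256 points and the frame shift is half of that, 128, so adjacent frames overlap by 256-128=128 points.

Conventional framing splits the data into two frames, with the second half of the second frame filled with zeros; alternatively, the incomplete last frame is often simply discarded.

librosa instead adds half a frame of padding at both the head and the tail of the data and then splits it into frames:

Input data: data=dat1+dat2 (dat1 and dat2 are the first and the last 128 points)

Frame 1: 128 zeros+dat1

Frame 2: dat1+dat2 (the whole input)

Frame 3: dat2+128 zeros

Every Fourier transform takes 256 points as input.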
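The three frames listed above can be reproduced with plain NumPy. This is a minimal sketch of center-aligned framing with zero padding; frame_centered is a hypothetical helper, and the ramp 1..256 replaces the audio samples so that each value reveals its original position.

```python
import numpy as np


def frame_centered(y, n_fft, hop_length):
    """Zero-pad half a frame on each side, then slice overlapping frames."""
    padded = np.pad(y, n_fft // 2, mode="constant")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    return np.stack(
        [padded[t * hop_length : t * hop_length + n_fft] for t in range(n_frames)]
    )


data = np.arange(1, 257, dtype=float)  # stand-in for the 256 samples
frames = frame_centered(data, n_fft=256, hop_length=128)

print(frames.shape)                    # (3, 256)
print(frames[0][:3], frames[0][128])   # leading zeros, then dat1 starts at 1.0
print(frames[2][127], frames[2][128])  # dat2 ends at 256.0, then zeros
```

The middle frame is exactly the input, which is why it can be compared directly with the C result from section 2.1.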

2.2.2 Test code

File: librosa_test.py

import librosa
import numpy as np

dat_r = np.array([
0.000214,
0.000122,
-0.000061,
0.000275,
0.000092,
0.000336,
0.000153,
-0.000092,
0.000031,
0.000305,
0.000641,
0.000458,
0.000305,
0.000153,
0.000397,
0.000732,
0.000885,
0.000580,
0.000427,
0.000397,
0.000641,
0.001190,
0.001282,
0.000977,
0.000336,
0.000458,
0.001007,
0.001617,
0.001465,
0.000824,
0.000275,
0.000366,
0.001007,
0.001282,
0.000885,
0.000122,
-0.000336,
-0.000122,
0.000336,
0.000275,
-0.000153,
-0.000824,
-0.000916,
-0.000580,
-0.000519,
-0.000763,
-0.001099,
-0.001221,
-0.001007,
-0.000580,
-0.000793,
-0.000885,
-0.001038,
-0.000580,
-0.000275,
-0.000214,
-0.000549,
-0.000610,
-0.000061,
0.000702,
0.001190,
0.000763,
0.000305,
0.000275,
0.001068,
0.001923,
0.001953,
0.001190,
0.000488,
0.000519,
0.001251,
0.001740,
0.001282,
0.000397,
-0.000183,
0.000000,
0.000214,
0.000305,
-0.000092,
-0.000458,
-0.000732,
-0.000610,
-0.000671,
-0.000916,
-0.000946,
-0.000824,
-0.000702,
-0.001068,
-0.000977,
-0.001190,
-0.000671,
-0.000244,
-0.000122,
-0.000580,
-0.000854,
-0.000519,
0.000153,
0.000641,
0.000671,
0.000214,
-0.000122,
0.000336,
0.000732,
0.000702,
0.000397,
0.000122,
0.000122,
0.000275,
0.000000,
-0.000458,
-0.000580,
-0.000275,
0.000092,
-0.000244,
-0.001373,
-0.001892,
-0.001465,
-0.000153,
0.000214,
-0.000641,
-0.002533,
-0.002808,
-0.001526,
0.000549,
0.000946,
-0.000824,
-0.002594,
-0.002625,
-0.000427,
0.001678,
0.001587,
-0.000397,
-0.002075,
-0.001526,
0.000549,
0.002014,
0.001740,
-0.000122,
-0.001678,
-0.000916,
0.000610,
0.001556,
0.000977,
-0.000580,
-0.001251,
-0.000854,
-0.000183,
0.000214,
-0.000031,
-0.000671,
-0.001007,
-0.001221,
-0.000916,
-0.000824,
-0.000427,
-0.000549,
-0.000732,
-0.000610,
-0.000854,
-0.000671,
-0.000458,
-0.000183,
0.000031,
0.000275,
0.000427,
-0.000092,
-0.000397,
0.000000,
0.000793,
0.001526,
0.001465,
0.000122,
-0.001068,
-0.000854,
0.000946,
0.002472,
0.002014,
-0.000488,
-0.002350,
-0.002075,
0.000702,
0.002686,
0.001801,
-0.001526,
-0.003662,
-0.002472,
0.000793,
0.002747,
0.001740,
-0.001251,
-0.002991,
-0.001648,
0.001160,
0.003143,
0.002289,
0.000244,
-0.000916,
-0.000305,
0.001526,
0.002960,
0.002808,
0.002289,
0.001221,
0.000092,
0.000427,
0.001404,
0.003021,
0.003479,
0.001709,
-0.001068,
-0.002502,
-0.001007,
0.001831,
0.002960,
0.000427,
-0.003510,
-0.005157,
-0.002930,
0.000580,
0.001648,
-0.001038,
-0.004486,
-0.005463,
-0.002991,
-0.000122,
0.000885,
-0.000488,
-0.002045,
-0.002289,
-0.001129,
0.000214,
0.001343,
0.002441,
0.003448,
0.002991,
0.001282,
0.000153,
0.002014,
0.006134,
0.008820,
0.006439,
0.000793,
-0.002625,
0.000488,
0.007111,
0.010132,
0.004669,
-0.004791,
-0.009033,
-0.004028,
0.004089,
0.006104,
-0.002228,
-0.013214,
-0.016327,
-0.009583,
])

# pad_mode='constant' zero-pads the edges, matching the framing described in section 2.2.1
res = librosa.stft(dat_r, n_fft=256, hop_length=128, win_length=256, pad_mode='constant')
print("shape-res:", res.shape)
print("row-res:", res.shape[0]) # number of rows (frequency bins)
print("col-res:", res.shape[1]) # number of columns (frames)
#np.savetxt('dat_r_fft.txt', res, fmt="%f") # save the result matrix (complex matrix, in the form a+bi)
#np.savetxt('dat_r_fft.txt', np.real(res), fmt="%f") # save the real part of every frame
#np.savetxt('dat_r_fft.txt', np.imag(res), fmt="%f") # save the imaginary part of every frame

# Each column of res is one frame; keep the real part of each frame
lis0 = np.real(res[:, 0]) # frame 1 (first column of the result matrix)
lis1 = np.real(res[:, 1]) # frame 2 (second column of the result matrix)
lis2 = np.real(res[:, 2]) # frame 3 (third column of the result matrix)

np.savetxt('res_fft_0.txt', lis0, fmt="%f")
np.savetxt('res_fft_1.txt', lis1, fmt="%f")
np.savetxt('res_fft_2.txt', lis2, fmt="%f")

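2.2.3 Analysis of running results

>>python librosa_test.py

shape-res: (129, 3)  # size of the returned matrix: 129 rows x 3 columns; each column is one frame

# The speech is split into three frames of 256 samples each, but only 129 values

# of each FFT output are kept and no redundant data is output (a real-FFT implementation)

row-res: 129

col-res: 3

To make the comparison easier, the real part of each frame's result is saved to a txt file.

Comparing the result of the second frame with the C output shows that the first 129 values are approximately equal (the author's own complex FFT differs slightly in numerical precision).

The FFT results of the first and third frames can be compared and verified in the same way, which confirms librosa's framing mechanism.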
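One way to carry out that verification is to build the three frames by hand and transform them independently. The sketch below does this with NumPy only, assuming zero padding and a periodic Hann window; expected_columns is a hypothetical helper, and random data stands in for the 256 audio samples.

```python
import numpy as np


def expected_columns(data):
    """Spectra of the three frames from section 2.2.1, built by hand."""
    n = len(data)                      # 256 in this article
    half = n // 2
    zeros = np.zeros(half)
    dat1, dat2 = data[:half], data[half:]
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)
    frames = [
        np.concatenate([zeros, dat1]),  # frame 1: 128 zeros + dat1
        data,                           # frame 2: dat1 + dat2
        np.concatenate([dat2, zeros]),  # frame 3: dat2 + 128 zeros
    ]
    # One column per frame, bins 0 .. n/2 only.
    return np.stack([np.fft.rfft(window * f) for f in frames], axis=1)


rng = np.random.default_rng(0)
data = rng.standard_normal(256)         # stand-in for the audio samples
expected = expected_columns(data)
print(expected.shape)                   # (129, 3)
```

If the framing analysis is right, the matrix returned by librosa.stft(data, n_fft=256, hop_length=128, pad_mode='constant') should agree with expected column by column, up to floating-point tolerance.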

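2.3 Implementation of short-time inverse Fourier transform of librosa

The corresponding API is librosa.istft.

Use the same parameters as for the forward transform. The inverse transform handles the overlapping parts according to the rules of the forward transform and restores the data. The test code is as follows: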

res = librosa.stft(dat_r, n_fft=256, hop_length=128, win_length=256)

……

ires = librosa.istft(res, n_fft=256, hop_length=128, win_length=256)

np.savetxt('ires.txt', ires, fmt="%f")

Comparing the results shows that the restored sequence matches the input sequence.
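Why the overlapping parts can be undone is easiest to see in a small overlap-add round trip. The following NumPy sketch is not librosa's code: stft_sketch and istft_sketch are hypothetical names, zero padding and a periodic Hann window are assumed, and the overlap is normalized by the summed squared window.

```python
import numpy as np


def hann(n_fft):
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)


def stft_sketch(y, n_fft, hop):
    w = hann(n_fft)
    padded = np.pad(y, n_fft // 2, mode="constant")
    n_frames = 1 + (len(padded) - n_fft) // hop
    cols = [np.fft.rfft(w * padded[t * hop : t * hop + n_fft]) for t in range(n_frames)]
    return np.stack(cols, axis=1)


def istft_sketch(spec, n_fft, hop):
    w = hann(n_fft)
    n_frames = spec.shape[1]
    total = n_fft + hop * (n_frames - 1)
    y = np.zeros(total)
    norm = np.zeros(total)
    for t in range(n_frames):
        frame = np.fft.irfft(spec[:, t], n=n_fft)  # back to the windowed frame
        y[t * hop : t * hop + n_fft] += w * frame  # overlap-add, windowed again
        norm[t * hop : t * hop + n_fft] += w ** 2  # accumulated squared window
    ok = norm > 1e-10
    y[ok] /= norm[ok]                              # undo the window weighting
    return y[n_fft // 2 : total - n_fft // 2]      # drop the centering padding


rng = np.random.default_rng(0)
x = rng.standard_normal(256)                       # stand-in for the audio samples
x_rec = istft_sketch(stft_sketch(x, 256, 128), 256, 128)
print(x_rec.shape, np.allclose(x, x_rec))          # (256,) True
```

Every sample of the original signal is covered by two overlapping frames, so the summed squared window never reaches zero inside the signal and the division recovers the input.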

3. Appendix: librosa official website

Core IO and DSP — librosa 0.9.2 documentation

Origin blog.csdn.net/qq_40088639/article/details/126611146