Basic audio processing with librosa, and an analysis of the three elements of music

Vector representation of sound

principle

  1. A vector x ∈ R^N represents the audio signal on a time interval; x_i represents the sound pressure at time t = h·i, i.e. x_i = α·p(h·i), i = 1, ..., N
  2. Each x_i is called a sample
  3. h (> 0) is the sampling interval
  4. 1/h is the sampling rate; typical sampling rates are 1/h = 44100/sec or 48000/sec
  5. α is called the scale factor

Use Python's librosa library to read an audio signal and display its waveform with matplotlib:

import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("MUSIC STEM.wav", sr=None)  # y is an audio vector of length sr * duration
plt.figure()
librosa.display.waveplot(y, sr=sr)  # draw the waveform (librosa >= 0.10 renames this to waveshow)
plt.show()  # display the waveform

result

(figure: waveform of the loaded audio)

analysis

Each element of the audio vector corresponds to one sampling point; the number of elements equals sampling rate × audio duration (in seconds), and each sample records the amplitude of the audio at that instant
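This relationship can be checked directly on any signal vector; a minimal sketch, with a synthetic zero signal standing in for the loaded file:

```python
import numpy as np

sr = 44100                         # sampling rate: samples per second
duration = 2.5                     # audio length in seconds
y = np.zeros(int(sr * duration))   # stand-in for a signal loaded with librosa.load

# number of samples = sampling rate * duration
n_samples = len(y)
print(n_samples)       # 110250
print(n_samples / sr)  # 2.5  (duration recovered from the vector)
```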

Scaling the audio signal

principle

The loudness of the audio is determined by the absolute value of each sample of the audio signal, so multiplying the audio signal by a scalar increases or decreases the loudness

NumPy's scalar multiplication can be used to raise or lower the loudness, and librosa can write the audio vector back to a file

y = 2 * y     # double the amplitude
y = 0.5 * y   # halve the amplitude
y = -y        # invert the amplitude
y = 10 * y    # greatly increase the amplitude
librosa.output.write_wav(dir, y, sr)  # write the audio vector back to a file
# (write_wav was removed in librosa >= 0.8; soundfile.write(dir, y, sr) is the usual replacement)

result

The waveforms of the four operation results are shown in the figure

(figure: waveforms of the four scaled signals)

  1. With y = 2*y, the result sounds slightly louder than the original audio
  2. With y = 0.5*y, the result sounds slightly quieter than the original audio
  3. With y = -y, the result sounds identical to the original audio
  4. With y = 10*y, the result sounds much louder than the original audio

analysis

  1. The value of each sample reflects the displacement of the vibration from the equilibrium point, i.e. the amplitude. The greater the amplitude, the greater the energy of the sound, and the louder it sounds.
  2. When the samples are multiplied by a factor with absolute value greater than 1, the loudness increases; with absolute value less than 1, it decreases.
  3. Since the displacement from the equilibrium point enters as an absolute value, a negative factor has the same effect as its positive opposite.
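Points 1–3 can be verified numerically on a toy signal; a small sketch, independent of any audio file:

```python
import numpy as np

y = np.array([0.1, -0.3, 0.25, -0.05])  # toy audio samples

# y and -y have identical absolute values, hence identical loudness
assert np.array_equal(np.abs(-y), np.abs(y))

# |factor| > 1 raises the peak amplitude, |factor| < 1 lowers it
print(np.max(np.abs(2 * y)))    # 0.6
print(np.max(np.abs(0.5 * y)))  # 0.15
```

Note that samples scaled outside the [-1, 1] range may clip on playback, which is why the 10× version sounds distorted as well as loud.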

Linear combination and mixing

principle

  1. A linear combination of multiple audio signals, y = a_1·x_1 + a_2·x_2 + ... + a_k·x_k, achieves mixing

  2. Each audio signal x_k is called a track

  3. The result y is called the mix

  4. Each coefficient a_k is the weight of its track in the mix

NumPy's element-wise arithmetic can be used to implement mixing

Try mixing vocals and accompaniment: take one vocal track x1 and one accompaniment track x2, and mix them linearly with weights (0.25, 0.75), (0.5, 0.5), and (0.6, 0.4).

y = 0.25*x1 + 0.75*x2  # accompaniment dominant
y = 0.5*x1 + 0.5*x2    # equal weights
y = 0.6*x1 + 0.4*x2    # vocal dominant

result

The waveform diagram is as follows:
(figure: waveforms of the three mixes)

The resulting audio sounds like the vocals layered over the accompaniment

analysis

  1. When audio signals are added linearly, the vocals and accompaniment cannot be distinguished visually in the resulting waveform. However, because the harmonic content of the human voice differs from that of the accompaniment at each frequency, the linear addition distributes their energy across the frequency domain, so the two can be clearly distinguished on a spectrogram.

  2. To obtain a sufficiently clear mix, the weight of each track must be adjusted repeatedly until the most suitable balance is found.
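These weight experiments are easier to repeat with a small helper. This is a sketch (the `mix` function is a name chosen here, not from the original post) that also pads shorter tracks with silence so tracks of unequal length can be combined:

```python
import numpy as np

def mix(tracks, weights):
    """Return the linear mix y = a1*x1 + ... + ak*xk, padding short tracks with silence."""
    n = max(len(t) for t in tracks)
    y = np.zeros(n)
    for t, a in zip(tracks, weights):
        y[:len(t)] += a * t  # shorter tracks contribute nothing past their end
    return y

vocal  = np.array([0.2,  0.4, -0.2])        # toy vocal track
accomp = np.array([0.1, -0.1,  0.3, 0.2])   # toy accompaniment track (longer)

y = mix([vocal, accomp], [0.25, 0.75])      # weights (0.25, 0.75)
print(y)
```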

extension

Although the accompaniment and the vocals have substantially different harmonic content, adding them at the same frequencies can easily cause frequency collisions that make the vocals and accompaniment indistinguishable in the mix. For two-channel audio we therefore often use panning: assign different weights to the vocals and the accompaniment in the two channels (typically the vocals are louder in one channel and the accompaniment louder in the other).

A NumPy array of shape (2, n) can hold a two-channel audio signal. Mixing the two channels with weights (0.4, 0.6) and (0.6, 0.4) respectively gives the waveforms below:
(figure: waveforms of the two channels)
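A sketch of this two-channel layout, on toy arrays, with the weights from the text:

```python
import numpy as np

vocal  = np.array([0.2, -0.4, 0.3])   # toy vocal track
accomp = np.array([0.1,  0.2, -0.2])  # toy accompaniment track

# shape (2, n): row 0 = left channel, row 1 = right channel
stereo = np.vstack([
    0.4 * vocal + 0.6 * accomp,   # left:  accompaniment slightly louder
    0.6 * vocal + 0.4 * accomp,   # right: vocals slightly louder
])
print(stereo.shape)  # (2, 3)
```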

Musical tones

  1. For a sound signal p(t), if p(t + T) ≈ p(t) with period T between 0.0005 seconds and 0.01 seconds, p is regarded as a musical tone
  2. The length of the period (which determines the frequency) determines the pitch
  3. The shape of the waveform within each period determines the timbre
  4. The energy of the tone, i.e. the sound pressure, determines the loudness

pitch

principle

  1. f = 440 Hz is concert A (A4)
  2. One octave up doubles the frequency
  3. In twelve-tone equal temperament, each semitone raises the frequency by a factor of 2^(1/12)
  4. A semitone is the distance between two adjacent keys (black or white) on the keyboard. From do to the do an octave above, the white-key steps follow the pattern whole-whole-half-whole-whole-whole-half, and the semitone inside each whole step falls on a black key
  5. The distance between any two notes is called an interval, measured in degrees. The interval between a note and itself is 1 degree (a unison); each additional scale step adds one degree
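Points 1–3 translate directly into a formula for equal-tempered pitches; a minimal sketch (`semitone_freq` is a name chosen here, not a library function):

```python
def semitone_freq(n, base=440.0):
    """Frequency of the pitch n semitones above (or below, for negative n) A4 = 440 Hz."""
    return base * 2 ** (n / 12)

print(semitone_freq(12))            # 880.0  -> one octave up doubles the frequency
print(semitone_freq(-12))           # 220.0  -> one octave down halves it
print(round(semitone_freq(1), 2))   # 466.16 -> one semitone up: factor 2**(1/12)
```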

Experimental steps:

  1. Set the fundamental frequency f to 440 Hz, and use numpy to generate a 440 Hz sine wave
  2. Place a sampling point every 1/44100 s, generating one second of the sine wave followed by one second of silence
  3. On the n-th iteration, linearly add a 1 s sine wave of frequency 440·2^((n-1)/12) Hz to the base sine wave to form a chord, thereby simulating every interval from unison up to the perfect octave
import numpy as np

x = np.empty((0,))  # empty array to accumulate the output
for i in range(0, 13):
    x1 = np.linspace(0, 1, num=44100, endpoint=True, dtype=float)  # sample times for 1 s
    x1 = 5*np.sin(2*np.pi*440*np.power(2, i/12)*x1) + 5*np.sin(2*np.pi*440*x1)  # chord: 440 Hz plus the pitch i semitones above
    x2 = np.zeros((44100,))  # 1 s of silence between chords
    x = np.concatenate((x, x1, x2), axis=0)  # append chord and silence

result

The generated 26 s of audio contains 13 chords of different intervals; some sound consonant and some sound dissonant.

analysis

  1. The minor second (one semitone) and the major seventh (five whole steps and a semitone) are sharply dissonant intervals
  2. The major second (one whole step), the minor seventh (five whole steps), and the tritone (three whole steps) are dissonant intervals
  3. The minor third (one whole step and a semitone), major third (two whole steps), minor sixth (four whole steps), and major sixth (four whole steps and a semitone) are imperfect consonant intervals
  4. The perfect fourth (two whole steps and a semitone) and the perfect fifth (three whole steps and a semitone) are perfect consonant intervals
  5. The perfect octave (six whole steps) is the most consonant interval of all
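One classical explanation for this consonance ordering (an aside, not from the original post) is that consonant intervals approximate simple integer frequency ratios. A quick check in twelve-tone equal temperament:

```python
def ratio(semitones):
    """Frequency ratio of an equal-tempered interval spanning the given number of semitones."""
    return 2 ** (semitones / 12)

# the perfect consonances sit very close to simple ratios ...
assert abs(ratio(12) - 2 / 1) < 1e-9    # perfect octave  ~ 2:1
assert abs(ratio(7)  - 3 / 2) < 0.002   # perfect fifth   ~ 3:2
assert abs(ratio(5)  - 4 / 3) < 0.002   # perfect fourth  ~ 4:3

# ... while the dissonant minor second is far from any simple ratio
print(round(ratio(1), 4))  # 1.0595
```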

Timbre

  1. A periodic signal can be written p(t) = Σ_{k=1}^{K} (a_k·cos(2π·f_k·t) + b_k·sin(2π·f_k·t)); each component of the decomposition is a harmonic, or overtone

  2. f_k is the frequency of the k-th harmonic

  3. a_k and b_k are the harmonic coefficients

  4. When K is large enough, any periodic signal can be approximated arbitrarily well in this form (its Fourier series, whose coefficients are obtained by integration)

  5. The ratio of the harmonic amplitudes determines the timbre. The k-th harmonic has amplitude c_k = sqrt(a_k² + b_k²), so the amplitudes form a vector of length K, c = (0.3, 0.4, ...); mixing enough harmonics with different amplitude values produces different timbres

Generate sine and cosine signals at 1, 5, 10, 20, and 50 frequencies between 220 and 11000 Hz, with randomly generated harmonic coefficients
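The original code for this experiment is not shown in the post; a plausible sketch, in which the exact frequency spacing and coefficient ranges are assumptions:

```python
import numpy as np

sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)  # 1 s of sample times
rng = np.random.default_rng(0)

clips = []
for K in (1, 5, 10, 20, 50):
    freqs = np.linspace(220, 11000, K)   # K frequencies between 220 and 11000 Hz
    a = rng.random(K)                    # random cosine coefficients
    b = rng.random(K)                    # random sine coefficients
    p = np.zeros(sr)
    for k in range(K):
        p += a[k] * np.cos(2*np.pi*freqs[k]*t) + b[k] * np.sin(2*np.pi*freqs[k]*t)
    clips.append(p / np.max(np.abs(p)))  # normalize each clip's amplitude

x = np.concatenate(clips)  # five 1 s clips, each with a different timbre
print(x.shape)  # (220500,)
```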

result

The mixed waveform diagram under five conditions is as follows:

(figure: mixed waveforms under the five conditions)
The result is an audio file that plays a 1 s clip with a different timbre every second

analysis

  1. Mixing enough different harmonics with different harmonic coefficients produces audio with different timbres
  2. In practice, the timbre features of real instruments can be extracted by spectrum analysis and reproduced on a computer
  3. Electronic synthesizers of all kinds are built on this harmonic principle, for composers to use
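Point 2 can be illustrated with NumPy's FFT: the harmonic amplitudes c_k of a signal can be read off its amplitude spectrum. A sketch on a synthetic "instrument" built from two known harmonics:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr  # exactly 1 s, so FFT bins fall on whole-Hz frequencies
# synthetic "instrument": 200 Hz fundamental plus one overtone at 400 Hz
y = 1.0 * np.sin(2*np.pi*200*t) + 0.5 * np.sin(2*np.pi*400*t)

amps  = np.abs(np.fft.rfft(y)) / (len(y) / 2)  # amplitude spectrum
freqs = np.fft.rfftfreq(len(y), d=1/sr)        # frequency of each bin

print(round(amps[freqs == 200][0], 3))  # 1.0  -> recovered amplitude of the fundamental
print(round(amps[freqs == 400][0], 3))  # 0.5  -> recovered amplitude of the overtone
```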


Origin blog.csdn.net/c2a2o2/article/details/111352750