Audio Processing with Python: Overview (continuously updated)

See also:
Python audio processing
Python pydub audio processing

1. Commonly used tools and libraries

Overview

wave is a standard module of Python. Beyond it, there are two commonly used libraries for processing audio data in Python:

  • librosa: strong at audio signal processing; stores data internally as numpy arrays; reading and writing files depends on the soundfile module (mp3 is not supported)
  • pydub: reads and writes files via the underlying ffmpeg; the code is simple, and it supports common operations such as cutting, format conversion, volume adjustment, and ID3 tags, so the barrier to entry is low. (ffmpeg is an extremely powerful open-source audio/video processing tool.)

Usage suggestion: pydub is sufficient for everyday use. More powerful signal processing requires librosa, which has a higher mathematical barrier: you need to understand the principles of signal processing and master basic algorithms such as the Fourier transform.

wave

wave is a standard module of Python; the post "Python audio processing: wave" records a brief introduction and some source code.
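As a minimal sketch of the standard-library wave module, the following writes one second of silence to a WAV file (the file name is arbitrary) and then reads its parameters back:

```python
import wave

# Write a 1-second, mono, 16-bit, 8000 Hz silent WAV file.
with wave.open("demo.wav", "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 16-bit samples = 2 bytes
    wf.setframerate(8000)   # 8000 frames per second
    wf.writeframes(b"\x00\x00" * 8000)  # 8000 silent frames

# Read the parameters back.
with wave.open("demo.wav", "rb") as wf:
    params = wf.getparams()
    print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
    # → 1 2 8000 8000
```

Note that wave only handles uncompressed PCM WAV files; anything else needs one of the libraries above.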

pydub

Pydub lets you do stuff to audio in a way that isn’t stupid

pydub allows us to process audio files, but natively it only supports the wav format. If you want to process audio in other formats, or audio inside media files, you need FFmpeg installed locally.

Its official documentation records many simple, practical code samples: API Documentation

AudioSegment in pydub is an immutable object that offers a wealth of operations, such as increasing or decreasing the volume, merging audio, reading the duration, and extracting segments. Note that when processing multiple audio segments together, first make sure they have the same number of channels, frame rate, sample rate, bit depth, and so on.

The following is a simple record of all basic usage:

  • AudioSegment.from_file(), open an audio file and return an AudioSegment instance
  • AudioSegment(…).export(), write an AudioSegment to a file
  • AudioSegment.empty(), create an empty (zero-duration) AudioSegment
  • AudioSegment.silent(), create a silent audio segment
  • AudioSegment.from_mono_audiosegments(), combine several mono segments into one multi-channel segment; note that every mono segment must have the same length (frame count)
  • AudioSegment(…).dBFS, return the loudness in dBFS (dB relative to the maximum possible loudness)
  • AudioSegment(…).rms, return the loudness as an RMS value
  • AudioSegment(…).channels, number of channels
  • AudioSegment(…).sample_width, bytes per sample
  • AudioSegment(…).frame_rate, sample rate in Hz
  • AudioSegment(…).frame_width, bytes per frame
  • AudioSegment(…).max, return the maximum amplitude in the audio
  • AudioSegment(…).max_dBFS, return the maximum amplitude in the audio in dBFS
  • AudioSegment(…).duration_seconds, return the audio duration in seconds (len() gives milliseconds)
  • AudioSegment(…).raw_data, return the raw audio data
  • AudioSegment(…).frame_count(), return the number of frames
  • AudioSegment(…).append(), concatenate audio, e.g. combined = sound1.append(sound2); crossfade defaults to 100 ms
  • AudioSegment(…).overlay(), overlay another segment on this one; if the overlaid segment is longer, the excess is truncated
  • AudioSegment(…).apply_gain(gain), change the amplitude (loosely, the loudness); gain is in dB
  • AudioSegment(…).fade(), a more general (more flexible) fade method; parameters can specify start and end points, or a start point and a duration
  • AudioSegment(…).fade_out(), fade out
  • AudioSegment(…).fade_in(), fade in
  • AudioSegment(…).reverse(), return a copy that plays backwards
  • AudioSegment(…).set_frame_rate(), set the frame rate, in Hz
  • AudioSegment(…).set_channels(), set the number of channels
  • AudioSegment(…).split_to_mono(), split into mono channels
  • AudioSegment(…).apply_gain_stereo(), apply separate gains to the left and right channels of a stereo segment
  • AudioSegment(…).pan(), set the pan, from -1.0 (100% left) to +1.0 (100% right)
  • AudioSegment(…).get_array_of_samples(), return the raw audio data as an array of (numeric) samples
  • AudioSegment(…).get_dc_offset(), return a value between -1.0 and 1.0 representing the DC offset of a channel
  • AudioSegment(…).remove_dc_offset(), remove the DC offset from a channel
  • AudioSegment(…).invert_phase(), return a copy of this segment with the phase of the signal inverted

The official documentation lists a few more methods that are not covered here.


Some usage examples summarized by others are recorded below for reference.

# -*- coding: utf-8 -*-
# @Author  : FELIX
# @Date    : 2018/5/18 15:13

from pydub import AudioSegment

sound=AudioSegment.from_file("aaa.mp3","mp3")
sound2=AudioSegment.from_file('bbb.mp3','mp3')
# Split a multi-channel audio into two mono channels
# index [0] is the left channel
# index [1] is the right channel
# sounds = sound.split_to_mono()
# print(sounds)


# Merge two mono channels into one multi-channel audio
# stereo_sound = AudioSegment.from_mono_audiosegments(sounds[0], sounds[1])



# # Get the loudness in decibels
# loudness = sound.dBFS
# print(loudness)
# # Get the volume (RMS); this value is often used to compute decibels (dB = 20 × lg X)
# loudness = sound.rms
# print(loudness)
# # Get the number of channels
# channel_count = sound.channels
# print(channel_count)
# # Get the sample width (bytes per sample)
# bytes_per_sample = sound.sample_width
# print(bytes_per_sample)
#
# # Get the sample rate (frames per second)
# frames_per_second = sound.frame_rate
# print(frames_per_second)
# # Get the frame width (bytes per frame)
# bytes_per_frame = sound.frame_width
# print(bytes_per_frame)
#
# # Normalize using the maximum amplitude (subtract max_dBFS as gain)
# normalized_sound = sound.apply_gain(-sound.max_dBFS)
# print(normalized_sound)
# # Get the duration: duration_seconds is in seconds, len() is in milliseconds
# print(sound.duration_seconds)
# print((len(sound) / 1000.0))
# # Get the raw audio data
# raw_audio_data = sound.raw_data
# # print(raw_audio_data)
# # Get the number of frames
# number_of_frames_in_sound = sound.frame_count()
# number_of_frames_in_200ms_of_sound = sound.frame_count(ms=200)
# print(number_of_frames_in_sound)
# print(number_of_frames_in_200ms_of_sound)

# Concatenate sound1 and sound2, returning a new AudioSegment instance
# crossfade: crossfade duration in ms
# no_crossfade1 = sound.append(sound2, crossfade=5000)
# print(no_crossfade1)
# no_crossfade1.export(r'cc.wav', format='wav')  # write to file

# Overlay sound2 on top of sound1; the two audio streams are mixed, and if
# sound2 is longer, the excess is truncated.
# Parameters:
# position: start position of the overlay (ms)
# loop: whether to loop the overlay (True/False)
# times: number of times to repeat the overlay (default 1)
# gain_during_overlay: gain applied to the underlying audio during the overlay (e.g. -6.0)
# played_together = sound.overlay(sound2)
# # sound2_starts_after_delay = sound.overlay(sound2, position=5000)
# # volume_of_sound1_reduced_during_overlay = sound.overlay(sound2, gain_during_overlay=-8)
# # sound2_repeats_until_sound1_ends = sound.overlay(sound2, loop=True)
# # sound2_plays_twice = sound.overlay(sound2, times=2)
# played_together.export(r'dd.wav', format='wav')  # write to file



# Adjust the volume
# louder_via_method = sound.apply_gain(+3.5)   # louder
# quieter_via_method = sound.apply_gain(-5.7)  # quieter


# Fade
# Parameters:
# to_gain: gain (in dB) the audio reaches by the end of the fade
# from_gain: gain applied to the audio before the fade starts
# start: start position of the fade
# end: end position of the fade
# duration: duration of the fade
# fade_in_the_hard_way = sound.fade(from_gain=-120.0, start=0, duration=5000)
# fade_out_the_hard_way = sound.fade(to_gain=-120.0, end=0, duration=5000)

# Reversed playback
# sound.reverse().export(r'ee.wav', format='wav')  # write to file

# Adjust the left and right channel volumes of a multi-channel audio
# If called on a mono segment, it is first converted to stereo
# stereo_balance_adjusted = sound.apply_gain_stereo(-6, +2)
#
# # Pan: boost one channel and reduce the other by a percentage
# # pan the sound 15% to the right
# panned_right = sound.pan(+0.15)
# # pan the sound 50% to the left
# panned_left = sound.pan(-0.50)
#
#
# # DSP-based rendering
# # Produce a phase-inverted copy, e.g. to cancel an out-of-phase wave or reduce noise
# sound.invert_phase()

books

ffmpeg

pyaudio (a placeholder, to be filled in later)

2. Other common processing summary

Reference: (link in the original post)

Origin blog.csdn.net/Robin_Pi/article/details/109607764