This article collects script files and examples for processing speech data. Every script can be run as-is once you change the path of the file to process, the output path, and similar settings.
Table of contents
Method 1: Cut a whole piece of audio into one audio piece by time in batches
Method 2: Batch a whole piece of audio into audio pieces based on sentence pauses
Method 3: Batch several entire audio clips in a folder into audio files one by one
Batch process pcm files into wav files
How to query the number of files in a folder under Linux
Detailed explanation of WAV format files
Required environment
Environment for this article: Linux
pydub (installation: pip3 install pydub)
ffmpeg (installation: apt install ffmpeg)
Method 1: Cut a whole piece of audio into one audio piece by time in batches
Data format: a single audio file three minutes and fifty seconds long
# split_wav_time.py
from pydub import AudioSegment
from pydub.utils import make_chunks

audio = AudioSegment.from_file("his_one/1.wav", "wav")
# size = 10000  # chunk length in milliseconds (10 s = 10000)
size = 60000  # chunk length in milliseconds (60 s = 60000)
chunks = make_chunks(audio, size)  # split the file into 60 s pieces
for i, chunk in enumerate(chunks):
    chunk_name = "new-{0}.wav".format(i)
    print(chunk_name)
    chunk.export(chunk_name, format="wav")
Run command:
python split_wav_time.py
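To make the chunk boundaries concrete, the sketch below works out where each piece starts and ends for a given total length and chunk size. It is plain Python arithmetic rather than pydub itself, and chunk_bounds is a hypothetical helper name introduced only for illustration. It assumes the last piece simply holds whatever audio is left over, so the three-minute-fifty-second example comes out as three 60 s pieces plus one 50 s piece.

```python
import math


def chunk_bounds(total_ms, size_ms):
    """Return (start_ms, end_ms) pairs for fixed-length chunks of a clip."""
    count = math.ceil(total_ms / size_ms)
    return [(i * size_ms, min((i + 1) * size_ms, total_ms)) for i in range(count)]


# 3 min 50 s = 230000 ms, cut into 60 s pieces
bounds = chunk_bounds(230000, 60000)
for start, end in bounds:
    print(start, end, (end - start) / 1000)
```

Knowing the boundaries in advance is handy for checking that the number of exported files matches what you expect.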
Method 2: Batch a whole piece of audio into audio pieces based on sentence pauses
Data format: an audio file that is several minutes long
Use the split_on_silence(sound, min_silence_len, silence_thresh, keep_silence) function from pydub.silence.
The first parameter is the audio to split. The second, min_silence_len, is how long in milliseconds (not seconds) a quiet stretch must last to count as silence. The third, silence_thresh, is the level in dBFS below which the audio counts as silent. The fourth, keep_silence, is the number of milliseconds of silence kept at the edges of each extracted chunk; the script below sets it to 400.
from pydub import AudioSegment
from pydub.silence import split_on_silence

# The input is a wav file, so load it as wav rather than with from_mp3
sound = AudioSegment.from_file("his_one/1.wav", "wav")
loudness = sound.dBFS  # average loudness of the file, useful when tuning silence_thresh
# print(loudness)
chunks = split_on_silence(
    sound,
    # a pause must last at least 430 ms to count as silence
    min_silence_len=430,
    # treat anything quieter than -45 dBFS as silence
    silence_thresh=-45,
    # keep up to 400 ms of silence at the edges of each chunk
    keep_silence=400,
)
print('Len:', len(chunks))
# Discard chunks that are 2 s or shorter, or 10 s or longer
chunks = [chunk for chunk in chunks if 2000 < len(chunk) < 10000]
print('Valid chunks (longer than 2 s, shorter than 10 s):', len(chunks))
# Optional debugging aid: print the peak level of each one-second slice
# for x in range(0, int(len(sound) / 1000)):
#     print(x, sound[x * 1000:(x + 1) * 1000].max_dBFS)
for i, chunk in enumerate(chunks):
    chunk.export("cutwav_{0}.wav".format(i), format="wav")
    # print(i)
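The roles of min_silence_len and silence_thresh can be illustrated with a simplified model that needs no audio file. The sketch below is not pydub's implementation: find_silent_runs is a hypothetical helper, and it assumes the loudness has already been measured once per fixed window and handed in as a list of dBFS values. It only shows the idea that a stretch counts as a pause when it stays below the threshold for long enough.

```python
def find_silent_runs(levels_dbfs, window_ms, min_silence_len, silence_thresh):
    """Return (start_ms, end_ms) spans that stay below silence_thresh
    for at least min_silence_len milliseconds.

    levels_dbfs holds one loudness value per consecutive window of window_ms.
    """
    runs = []
    run_start = None
    # A trailing sentinel of +infinity closes any run that reaches the end.
    for i, level in enumerate(list(levels_dbfs) + [float("inf")]):
        if level < silence_thresh:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * window_ms >= min_silence_len:
                runs.append((run_start * window_ms, i * window_ms))
            run_start = None
    return runs


# Made-up levels, one value per 100 ms window
levels = [-20, -50, -50, -50, -50, -50, -20, -50, -20]
runs = find_silent_runs(levels, window_ms=100, min_silence_len=430, silence_thresh=-45)
print(runs)
```

In this made-up input the 500 ms quiet stretch qualifies as a pause, while the single quiet 100 ms window is too short and is ignored.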
Method 3: Batch several entire audio clips in a folder into audio files one by one
3.1. Data format: a folder of wav files, each several minutes long, cut into pieces of a fixed number of seconds.
from pydub import AudioSegment
from pydub.utils import make_chunks
import os

# Loop over every file in the directory
for each in os.listdir("/workspace/tts/PolyLangVITS/history"):
    if not each.endswith(".wav"):  # only process files with the .wav suffix
        continue
    print(each)
    audio = AudioSegment.from_file('/workspace/tts/PolyLangVITS/history/{}'.format(each), "wav")  # open the wav file
    size = 15000  # chunk length in milliseconds (15 s = 15000)
    chunks = make_chunks(audio, size)  # split the file into 15 s pieces
    for i, chunk in enumerate(chunks):
        chunk_name = "{}-{}.wav".format(os.path.splitext(each)[0], i)
        print(chunk_name)
        chunk.export('/workspace/tts/PolyLangVITS/preprodata/his_out/{}'.format(chunk_name), format="wav")
3.2. Data format: a folder of mp3 files, each several minutes long, cut into pieces of a fixed number of seconds.
from pydub import AudioSegment
from pydub.utils import make_chunks
import os

# Loop over every file in the directory
for each in os.listdir("D:/纯音乐"):
    if not each.endswith(".mp3"):  # only process files with the .mp3 suffix
        continue
    print(each)
    mp3 = AudioSegment.from_file('D:/纯音乐/{}'.format(each), "mp3")  # open the mp3 file
    size = 15000  # chunk length in milliseconds (15 s = 15000)
    chunks = make_chunks(mp3, size)  # split the file into 15 s pieces
    for i, chunk in enumerate(chunks):
        chunk_name = "{}-{}.mp3".format(os.path.splitext(each)[0], i)
        print(chunk_name)
        chunk.export('D:/纯音乐分解/{}'.format(chunk_name), format="mp3")
3.3. Data format: a folder of wav files, each several minutes long, cut at sentence pauses.
# @ Elena
# @ Date : 23.9.4
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Loop over every file in the directory
for each in os.listdir("/workspace/tts/PolyLangVITS/history"):
    if not each.endswith(".wav"):  # only process files with the .wav suffix
        continue
    print(each)
    sound = AudioSegment.from_file('/workspace/tts/PolyLangVITS/history/{}'.format(each), "wav")
    loudness = sound.dBFS  # average loudness of the file, useful when tuning silence_thresh
    # print(loudness)
    chunks = split_on_silence(
        sound,
        # a pause must last at least 430 ms to count as silence
        min_silence_len=430,
        # treat anything quieter than -45 dBFS as silence
        silence_thresh=-45,
        # keep up to 400 ms of silence at the edges of each chunk
        keep_silence=400,
    )
    print('Len:', len(chunks))
    # Discard chunks that are 1 s or shorter, or 10 s or longer
    chunks = [chunk for chunk in chunks if 1000 < len(chunk) < 10000]
    print('Len (1s~10s wav file):', len(chunks))
    # Optional debugging aid: print the peak level of each one-second slice
    # for x in range(0, int(len(sound) / 1000)):
    #     print(x, sound[x * 1000:(x + 1) * 1000].max_dBFS)
    for i, chunk in enumerate(chunks):
        chunk_name = "{}-{}.wav".format(os.path.splitext(each)[0], i)
        chunk.export("/workspace/tts/PolyLangVITS/preprodata/his_out/{}".format(chunk_name), format="wav")
        # print(i)
Use the file command to check the format of a wav file, e.g. file cutwav_0.wav
Batch process pcm files into wav files
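import wave
import os

filepath = "data/"  # directory that holds the pcm files
filenames = os.listdir(filepath)  # names of all files in the directory
# print(filenames)
for name in filenames:
    if not name.endswith(".pcm"):  # skip non-pcm files, including wav files already written
        continue
    with open(os.path.join(filepath, name), 'rb') as pcmfile:
        pcmdata = pcmfile.read()
    wav_path = os.path.join(filepath, os.path.splitext(name)[0] + '.wav')
    with wave.open(wav_path, 'wb') as wavfile:
        # (channels, bytes per sample, sample rate, frame count, compression type, compression name)
        # These values assume mono, 16-bit, 16000 Hz PCM; change them to match your data
        wavfile.setparams((1, 2, 16000, 0, 'NONE', 'NONE'))
        wavfile.writeframes(pcmdata)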
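Raw PCM carries no header, so the channel count, sample width and sample rate have to be supplied by hand, and a wrong value produces audio that plays at the wrong speed or as noise. As a quick sanity check, the sketch below wraps the conversion in a function and reads the result back. pcm_to_wav and the demo file names are hypothetical, and the defaults (mono, 2 bytes per sample, 16000 Hz) are only assumed to match the values used in the script above.

```python
import os
import tempfile
import wave


def pcm_to_wav(pcm_path, wav_path, channels=1, sample_width=2, sample_rate=16000):
    """Wrap raw PCM bytes in a WAV header using the given audio parameters."""
    with open(pcm_path, "rb") as pcmfile:
        pcmdata = pcmfile.read()
    with wave.open(wav_path, "wb") as wavfile:
        wavfile.setnchannels(channels)
        wavfile.setsampwidth(sample_width)  # bytes per sample
        wavfile.setframerate(sample_rate)
        wavfile.writeframes(pcmdata)


# Round trip on one second of silence: 16000 frames of 2 bytes each
workdir = tempfile.mkdtemp()
pcm_path = os.path.join(workdir, "demo.pcm")
wav_path = os.path.join(workdir, "demo.wav")
with open(pcm_path, "wb") as f:
    f.write(b"\x00\x00" * 16000)

pcm_to_wav(pcm_path, wav_path)
with wave.open(wav_path, "rb") as check:
    params = check.getparams()
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
```

If the frame count divided by the frame rate does not match the recording length you expect, the parameters are probably wrong for your data.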
How to query the number of files in a folder under Linux
Use the ls and wc commands
Use the ls command's -l option together with the pipe operator | and the wc command to count the number of files:
Count the files in the current folder whose names contain "wav":
ls -l | grep "wav" | wc -l
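The same count can be taken from Python, which is convenient inside the processing scripts. The sketch below is an alternative to the shell pipeline, not a translation of it: count_files_with_suffix is a hypothetical helper, and it matches only names that end in .wav, whereas grep "wav" counts any listing line that contains the text wav.

```python
import os
import tempfile


def count_files_with_suffix(folder, suffix=".wav"):
    """Count regular files in folder whose names end with suffix."""
    return sum(
        1
        for name in os.listdir(folder)
        if name.endswith(suffix) and os.path.isfile(os.path.join(folder, name))
    )


# Demo on a throwaway folder: two wav files and one unrelated file
demo_dir = tempfile.mkdtemp()
for name in ("a.wav", "b.wav", "wav_notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

wav_count = count_files_with_suffix(demo_dir)
print(wav_count)
```

Here wav_notes.txt is left out of the count, although the grep pipeline would have included it.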
Detailed explanation of WAV format files
The WAV file format is a subset of Microsoft's RIFF specification and is used for storing multimedia files. A WAV (RIFF) file is composed of several chunks: the RIFF WAVE chunk, the Format chunk, the optional Fact chunk, and the Data chunk.
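The chunk layout can be inspected directly. The sketch below is a minimal reader, assuming a well-formed file: it checks the 12-byte RIFF WAVE header and then walks the chunks that follow, each of which starts with a 4-byte id and a 4-byte little-endian size. list_riff_chunks is a hypothetical helper name, and the test file is generated in memory with Python's wave module, which writes only a Format chunk and a Data chunk.

```python
import io
import struct
import wave


def list_riff_chunks(data):
    """Return (chunk_id, size) pairs for the chunks inside RIFF/WAVE bytes."""
    riff_id, _riff_size, form = struct.unpack("<4sI4s", data[:12])
    if riff_id != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id, size))
        pos += 8 + size + (size % 2)  # chunk bodies are padded to an even length
    return chunks


# Build a tiny wav in memory: mono, 16-bit, 16000 Hz, 100 frames of silence
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setparams((1, 2, 16000, 0, "NONE", "NONE"))
    w.writeframes(b"\x00\x00" * 100)

chunks = list_riff_chunks(buf.getvalue())
print(chunks)
```

Files produced by other tools may carry extra chunks such as Fact or LIST, which this loop lists in the same way.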
Introduction to audio file parameters
An audio file is usually described by parameters such as 44100 Hz 16-bit stereo or 22050 Hz 8-bit mono. These parameters include:
Sampling rate: the number of times the sound signal is sampled per unit time during the "analog → digital" conversion.
Sampling value (sampling accuracy): the integral value of the analog sound signal over each sampling period.
Each sample records an amplitude, and the sampling accuracy depends on how much storage each sample uses:
1 byte (8 bit) can record only 256 values, so the amplitude is divided into just 256 levels;
2 bytes (16 bit) can distinguish 65536 levels, which is the CD standard;
4 bytes (32 bit) can subdivide the amplitude into 4294967296 levels, which is rarely necessary.
For mono files with 8-bit samples, each sample is a single 8-bit integer. For two-channel (stereo) files with 8-bit samples, each sample frame is 16 bits: one 8-bit sample per channel, stored left channel first and then right channel. A stereo file is therefore twice the size of the corresponding mono file.
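The figures above follow from simple arithmetic, shown here as a short sketch. Both function names are hypothetical, and sample widths are given in bytes.

```python
def amplitude_levels(sample_width_bytes):
    """Number of distinct amplitude values one sample can take."""
    return 2 ** (8 * sample_width_bytes)


def frame_size_bytes(sample_width_bytes, channels):
    """Bytes used by one sample frame, i.e. one sample for every channel."""
    return sample_width_bytes * channels


for width in (1, 2, 4):
    print(width, "byte(s):", amplitude_levels(width), "levels")

print("8-bit stereo frame:", frame_size_bytes(1, 2), "bytes")
print("16-bit stereo frame:", frame_size_bytes(2, 2), "bytes")
```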
Because a wav file normally stores uncompressed PCM audio, the playback length of the file can be estimated from the file size, the sampling frequency, the sample size and the number of channels.
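As a rough sketch of that estimate, the duration is the number of audio bytes divided by the bytes consumed per second. estimate_duration_seconds is a hypothetical helper, and the 44-byte header size is an assumption that holds only for a plain PCM file with no extra chunks; files with additional chunks have larger headers.

```python
def estimate_duration_seconds(data_bytes, sample_rate, sample_width_bytes, channels):
    """Estimate playback length of uncompressed PCM audio from its size."""
    bytes_per_second = sample_rate * sample_width_bytes * channels
    return data_bytes / bytes_per_second


HEADER_BYTES = 44  # assumption: canonical PCM wav header with no extra chunks

# Example: a 1920044-byte file holding mono, 16-bit, 16000 Hz audio
file_size = 1920044
duration = estimate_duration_seconds(file_size - HEADER_BYTES, 16000, 2, 1)
print(duration)
```

For an exact value, reading the frame count and frame rate with Python's wave module avoids guessing the header size.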
For more information, see Microsoft WAVE soundfile format (sapp.org)