Speech | Speech processing: segmenting audio with Python

This article collects scripts and examples for processing speech data. To run any of them, you only need to change the input file path, the output path, and similar settings.

Table of contents

Required environment

Method 1: Cut one long audio file into fixed-length pieces

Method 2: Cut one long audio file into pieces at sentence pauses

Method 3: Batch-cut all the long audio files in a folder

3.1. Data format: a folder of WAV files, each several minutes long, cut into pieces of a fixed number of seconds.

3.2. Data format: a folder of MP3 files, each several minutes long, cut into pieces of a fixed number of seconds.

3.3. Data format: a folder of WAV files, each several minutes long, cut at sentence pauses.

Extras

Batch-convert PCM files to WAV files

How to count the files in a folder on Linux

Use the ls and wc commands

WAV file format in detail


Required environment

Environment for this article: Linux

pydub (installation: pip3 install pydub)

ffmpeg (installation: apt install ffmpeg). pydub uses ffmpeg to decode and encode formats other than WAV, such as MP3.
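Before running the scripts, a quick sanity check can confirm that both dependencies are visible to Python. This is a small sketch (check_environment is a hypothetical helper name); it only tests that pydub can be imported and that an ffmpeg executable is on PATH, not that the versions are compatible.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether pydub is importable and an ffmpeg binary is on PATH."""
    return {
        "pydub": importlib.util.find_spec("pydub") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for tool, found in check_environment().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```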

Method 1: Cut one long audio file into fixed-length pieces

Data format: a single audio file, 3 minutes 50 seconds long

# split_wav_time.py
from pydub import AudioSegment
from pydub.utils import make_chunks

audio = AudioSegment.from_file("his_one/1.wav", "wav")

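#size = 10000  # chunk length in milliseconds (10 s = 10000)
size = 60000  # chunk length in milliseconds (60 s = 60000)

chunks = make_chunks(audio, size)  # cut the audio into 60 s pieces; the last piece may be shorter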

for i, chunk in enumerate(chunks):
    chunk_name = "new-{0}.wav".format(i)
    print(chunk_name)
    chunk.export(chunk_name, format="wav")

Run command:

python split_wav_time.py
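make_chunks cuts the audio into consecutive slices of the given length, so the last slice holds whatever is left over. The arithmetic can be sketched in plain Python (chunk_bounds is a hypothetical helper, not part of pydub): for the 3 min 50 s file above, 60 s chunks give four files, new-0.wav to new-3.wav, and the last one is 50 s long.

```python
import math


def chunk_bounds(total_ms, size_ms):
    """Return (start_ms, end_ms) for each fixed-length piece; the last may be shorter."""
    count = math.ceil(total_ms / size_ms)
    return [(i * size_ms, min((i + 1) * size_ms, total_ms)) for i in range(count)]


# A 3 min 50 s file (230 000 ms) cut into 60 s pieces:
print(chunk_bounds(230_000, 60_000))
# [(0, 60000), (60000, 120000), (120000, 180000), (180000, 230000)]
```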


Method 2: Cut one long audio file into pieces at sentence pauses

Data format: an audio file that is several minutes long

Use the split_on_silence(sound, min_silence_len, silence_thresh, keep_silence) function from pydub.silence.

Its parameters are: the audio to split; min_silence_len, the minimum length in milliseconds of a quiet stretch that counts as a pause; silence_thresh, the level in dBFS below which audio counts as silence; and keep_silence, the number of milliseconds of silence to keep at the start and end of each chunk (this script uses 400).
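To make the two thresholds concrete, here is a simplified sketch of silence detection on raw signed 16-bit samples. It is not pydub's own implementation: it measures the level of each 1 ms block as dBFS = 20 * log10(RMS / 32768) and reports runs of quiet blocks that last at least min_silence_len ms. The function names dbfs and find_silences are hypothetical.

```python
import math


def dbfs(samples):
    """Level of a block of signed 16-bit samples in dBFS (0 dBFS = full scale)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / 32768)


def find_silences(samples, sample_rate, min_silence_len=430, silence_thresh=-45.0):
    """Return [start_ms, end_ms] runs of 1 ms blocks quieter than silence_thresh
    that last at least min_silence_len ms."""
    per_ms = sample_rate // 1000  # samples per 1 ms block (approximate for 44.1 kHz)
    n_ms = len(samples) // per_ms
    ranges, run_start = [], None
    for ms in range(n_ms):
        silent = dbfs(samples[ms * per_ms:(ms + 1) * per_ms]) < silence_thresh
        if silent and run_start is None:
            run_start = ms  # a quiet run begins
        elif not silent and run_start is not None:
            if ms - run_start >= min_silence_len:
                ranges.append([run_start, ms])
            run_start = None  # the quiet run ended
    if run_start is not None and n_ms - run_start >= min_silence_len:
        ranges.append([run_start, n_ms])  # the audio ends in silence
    return ranges
```

A pause shorter than min_silence_len is ignored, which is why a lower min_silence_len produces more, shorter chunks. To try it on a real 16-bit mono file on a little-endian machine, read the frames with the wave module and convert them with array.array("h", frames).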

from pydub import AudioSegment
from pydub.silence import split_on_silence

sound = AudioSegment.from_wav("his_one/1.wav")  # the input is a WAV file, so load it with from_wav
loudness = sound.dBFS  # average loudness, a reference point for silence_thresh
#print(loudness)

chunks = split_on_silence(sound,
    # a pause must last at least 430 ms to count as silence
    min_silence_len=430,

    # treat anything quieter than -45 dBFS as silence
    silence_thresh=-45,
    # keep 400 ms of the surrounding silence at each end of a chunk
    keep_silence=400
)
print('Len:', len(chunks))

# Drop chunks of 2 s or less and of 10 s or more
chunks = [c for c in chunks if 2000 < len(c) < 10000]
print('Valid chunks (longer than 2 s, shorter than 10 s):', len(chunks))

# To help choose silence_thresh, print the peak level of every second:
# for x in range(0, int(len(sound) / 1000)):
#     print(x, sound[x*1000:(x+1)*1000].max_dBFS)

for i, chunk in enumerate(chunks):
    chunk.export("cutwav_{0}.wav".format(i), format="wav")
    #print(i)
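The filtering step above works on chunk lengths in milliseconds. A simplified sketch of how silence ranges, keep_silence padding, and the 2 s / 10 s bounds combine into the final list of chunks; speech_ranges is a hypothetical helper, it assumes the silences are sorted and non-overlapping, and it does not try to resolve overlap between neighbouring padded chunks.

```python
def speech_ranges(silences, total_ms, keep_silence=400, min_len=2000, max_len=10000):
    """Turn sorted, non-overlapping silent [start, end] ranges (ms) into padded
    speech ranges, keeping only chunks longer than min_len and shorter than max_len."""
    # The non-silent parts are the gaps between consecutive silences.
    edges = [0] + [t for rng in silences for t in rng] + [total_ms]
    kept = []
    for start, end in zip(edges[0::2], edges[1::2]):
        if start >= end:
            continue  # no audio between two adjacent silences
        # Pad each side with keep_silence ms, clamped to the file bounds.
        start = max(start - keep_silence, 0)
        end = min(end + keep_silence, total_ms)
        if min_len < end - start < max_len:
            kept.append([start, end])
    return kept


print(speech_ranges([[1000, 2000], [5000, 6000]], 9000))
# [[1600, 5400], [5600, 9000]]
```

The first speech stretch (0 to 1000 ms) is only 1400 ms long even after padding, so it is dropped, just as the script drops chunks of 2 s or less.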


Method 3: Batch-cut all the long audio files in a folder

3.1. Data format: a folder of WAV files, each several minutes long, cut into pieces of a fixed number of seconds.

from pydub import AudioSegment
from pydub.utils import make_chunks
import os

in_dir = "/workspace/tts/PolyLangVITS/history"
out_dir = "/workspace/tts/PolyLangVITS/preprodata/his_out"
os.makedirs(out_dir, exist_ok=True)  # export() fails if the output folder is missing

# Loop over every file in the input folder
for each in os.listdir(in_dir):
    stem, ext = os.path.splitext(each)
    if ext.lower() != ".wav":  # only process .wav files
        continue
    print(each)

    audio = AudioSegment.from_file(os.path.join(in_dir, each), "wav")  # open the wav file
    size = 15000  # chunk length in milliseconds (15 s = 15000)

    chunks = make_chunks(audio, size)  # cut the file into 15 s pieces

    for i, chunk in enumerate(chunks):
        chunk_name = "{}-{}.wav".format(stem, i)
        print(chunk_name)
        chunk.export(os.path.join(out_dir, chunk_name), format="wav")
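For plain PCM WAV input, the same fixed-length cut can also be done without pydub or ffmpeg, using only Python's built-in wave module. A sketch under that assumption (split_wav_folder is a hypothetical name; compressed WAV variants are not handled): it copies the header parameters from each source file and writes size_ms worth of frames per piece, using the same "<name>-<index>.wav" naming as the script above.

```python
import os
import wave


def split_wav_folder(in_dir, out_dir, size_ms=15000):
    """Cut every .wav file in in_dir into size_ms pieces using only the wave
    module (PCM WAV only). Returns the list of written file paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(in_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".wav":
            continue  # skip non-wav files instead of failing on them
        with wave.open(os.path.join(in_dir, name), "rb") as src:
            params = src.getparams()
            frames_per_chunk = params.framerate * size_ms // 1000
            i = 0
            while True:
                frames = src.readframes(frames_per_chunk)
                if not frames:
                    break  # end of the source file
                out_path = os.path.join(out_dir, f"{stem}-{i}.wav")
                with wave.open(out_path, "wb") as dst:
                    dst.setparams(params)  # frame count in the header is fixed on close
                    dst.writeframes(frames)
                written.append(out_path)
                i += 1
    return written
```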



 


3.2. Data format: a folder of MP3 files, each several minutes long, cut into pieces of a fixed number of seconds.

from pydub import AudioSegment
from pydub.utils import make_chunks
import os

in_dir = "D:/纯音乐"        # input folder of MP3 files ("纯音乐" = instrumental music)
out_dir = "D:/纯音乐分解"    # output folder for the pieces
os.makedirs(out_dir, exist_ok=True)  # export() fails if the output folder is missing

# Loop over every file in the input folder
for each in os.listdir(in_dir):
    stem, ext = os.path.splitext(each)
    if ext.lower() != ".mp3":  # only process .mp3 files
        continue
    print(each)

    mp3 = AudioSegment.from_file(os.path.join(in_dir, each), "mp3")  # open the mp3 file (needs ffmpeg)
    size = 15000  # chunk length in milliseconds (15 s = 15000)

    chunks = make_chunks(mp3, size)  # cut the file into 15 s pieces

    for i, chunk in enumerate(chunks):
        chunk_name = "{}-{}.mp3".format(stem, i)
        print(chunk_name)
        chunk.export(os.path.join(out_dir, chunk_name), format="mp3")



 

3.3. Data format: a folder of WAV files, each several minutes long, cut at sentence pauses.

 

# @ Elena
# @ Date : 23.9.4


import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

in_dir = "/workspace/tts/PolyLangVITS/history"
out_dir = "/workspace/tts/PolyLangVITS/preprodata/his_out"
os.makedirs(out_dir, exist_ok=True)  # export() fails if the output folder is missing

# Loop over every file in the input folder
for each in os.listdir(in_dir):
    stem, ext = os.path.splitext(each)
    if ext.lower() != ".wav":  # only process .wav files
        continue
    print(each)
    sound = AudioSegment.from_file(os.path.join(in_dir, each), "wav")
    #print(sound.dBFS)  # average loudness, a reference point for silence_thresh

    chunks = split_on_silence(sound,
        # a pause must last at least 430 ms to count as silence
        min_silence_len=430,
        # treat anything quieter than -45 dBFS as silence
        silence_thresh=-45,
        # keep 400 ms of the surrounding silence at each end of a chunk
        keep_silence=400
    )
    print('Len:', len(chunks))

    # Drop chunks of 1 s or less and of 10 s or more
    chunks = [c for c in chunks if 1000 < len(c) < 10000]
    print('Len (1s~10s wav file):', len(chunks))

    # To help choose silence_thresh, print the peak level of every second:
    # for x in range(0, int(len(sound) / 1000)):
    #     print(x, sound[x*1000:(x+1)*1000].max_dBFS)

    for i, chunk in enumerate(chunks):
        chunk_name = "{}-{}.wav".format(stem, i)
        chunk.export(os.path.join(out_dir, chunk_name), format="wav")
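Once chunk boundaries are known in milliseconds (for example from a silence detector), writing them out is a matter of converting milliseconds to frame positions. A stdlib-only sketch, assuming PCM WAV input (export_ranges is a hypothetical helper), that uses the same "<name>-<index>.wav" naming as the script above:

```python
import os
import wave


def export_ranges(wav_path, ranges_ms, out_dir):
    """Write each [start_ms, end_ms] range of a PCM WAV file to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(wav_path))[0]
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        for i, (start_ms, end_ms) in enumerate(ranges_ms):
            first = params.framerate * start_ms // 1000  # ms -> frame index
            last = params.framerate * end_ms // 1000
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = os.path.join(out_dir, f"{stem}-{i}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed on close
                dst.writeframes(frames)
            written.append(out_path)
    return written
```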


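Use the file command to inspect the output wav files; it reports details such as the encoding, bit depth, channel count, and sample rate of each file.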
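A rough Python equivalent using the built-in wave module (describe_wav is a hypothetical helper; it only handles PCM WAV) also reports the duration, which is handy for checking that the chunks have the expected length.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bits_per_sample": w.getsampwidth() * 8,
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }
```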


Extras

Batch-convert PCM files to WAV files

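import wave
import os

filepath = "data/"  # folder containing the .pcm files
for name in os.listdir(filepath):  # every file name in the folder
    stem, ext = os.path.splitext(name)
    if ext.lower() != ".pcm":  # only convert .pcm files
        continue
    with open(os.path.join(filepath, name), 'rb') as pcmfile:
        pcmdata = pcmfile.read()
    with wave.open(os.path.join(filepath, stem + '.wav'), 'wb') as wavfile:
        # (nchannels, sampwidth, framerate, nframes, comptype, compname):
        # mono, 2-byte (16-bit) samples, 16 kHz; nframes is filled in when the file is closed
        wavfile.setparams((1, 2, 16000, 0, 'NONE', 'NONE'))
        wavfile.writeframes(pcmdata)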
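Raw .pcm files carry no header, so the channel count, sample width, and sample rate given to setparams must match how the audio was recorded; the tuple above means mono, 16-bit, 16 kHz. The same conversion for a single file, written with the individual setters so each value is explicit, can be sketched as follows (pcm_to_wav is a hypothetical helper whose defaults mirror the script above). It also returns the duration, computed as bytes divided by (channels x bytes per sample x sample rate).

```python
import wave


def pcm_to_wav(pcm_path, wav_path, channels=1, sampwidth=2, rate=16000):
    """Wrap headerless PCM data in a WAV header; return the duration in seconds."""
    with open(pcm_path, "rb") as f:
        data = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)   # 1 = mono
        w.setsampwidth(sampwidth)  # bytes per sample: 2 = 16-bit
        w.setframerate(rate)       # samples per second
        w.writeframes(data)
    return len(data) / (channels * sampwidth * rate)
```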

How to count the files in a folder on Linux

Use the ls and wc commands

Combine ls -l, the pipe operator |, grep, and wc -l to count files:

Count the entries in the current folder whose names contain "wav":

ls -l | grep "wav" | wc -l
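Note that grep "wav" matches any line containing "wav" anywhere, so names such as wavelet.txt or a folder named wavs are counted too. A Python sketch that counts only regular files with a given extension (count_files is a hypothetical helper):

```python
import os


def count_files(folder=".", suffix=".wav"):
    """Count regular files in folder whose names end with suffix (case-insensitive)."""
    with os.scandir(folder) as entries:
        return sum(
            1
            for e in entries
            if e.is_file() and e.name.lower().endswith(suffix.lower())
        )
```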

 

WAV file format in detail

The WAV file format is a subset of Microsoft's RIFF specification for storing multimedia files. A WAV (RIFF) file is made up of several chunks: the RIFF WAVE chunk, the Format chunk, an optional Fact chunk, and the Data chunk.
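The chunk layout can be inspected directly: the file starts with "RIFF", a 4-byte little-endian size, and "WAVE", and each chunk after that starts with a 4-byte ASCII ID and a 4-byte little-endian size, with odd-sized bodies padded to an even length. A minimal walker using struct (list_riff_chunks is a hypothetical name) that lists the top-level chunks; a file written by Python's wave module typically contains just "fmt " and "data".

```python
import struct


def list_riff_chunks(path):
    """Return (id, size) for each top-level chunk in a RIFF/WAVE file."""
    chunks = []
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # no more chunks
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            chunks.append((chunk_id.decode("latin-1"), chunk_size))
            # Skip the chunk body plus its pad byte when the size is odd.
            f.seek(chunk_size + (chunk_size & 1), 1)
    return chunks
```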

Audio file parameters
An audio file described as, for example, "44100 Hz 16-bit stereo" or "22050 Hz 8-bit mono" is characterized by the following parameters:

Sampling rate: the number of times per second the sound signal is sampled during analog-to-digital conversion.
Sample value (sample precision): the quantized amplitude of the analog signal at each sampling instant.
Each sample records an amplitude, and its precision depends on how many bits are used to store it.
For example, an 8-bit mono file stores each sample in one byte (00H to FFH). The sample width determines how many amplitude levels are available:

1 byte (8 bits) can record only 256 values, so the amplitude is divided into 256 levels;
2 bytes (16 bits) can record 65,536 values, which is the CD standard;
4 bytes (32 bits) can subdivide the amplitude into 4,294,967,296 levels, which is usually unnecessary.
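The level counts above are simply 2 raised to the number of bits. A small sketch that computes them and shows how a normalized amplitude is rounded onto one of those levels; quantize is a hypothetical helper and assumes signed samples, as in 16-bit WAV (8-bit WAV samples are unsigned).

```python
def levels(bits):
    """Number of distinct amplitude values a sample of this width can hold."""
    return 2 ** bits


def quantize(x, bits=16):
    """Map an amplitude in [-1.0, 1.0] to a signed integer sample, clipping overflow."""
    max_code = 2 ** (bits - 1) - 1
    return max(-max_code - 1, min(max_code, round(x * max_code)))


for b in (8, 16, 32):
    print(b, "bits ->", levels(b), "levels")
# 8 bits -> 256 levels
# 16 bits -> 65536 levels
# 32 bits -> 4294967296 levels
```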
For two-channel (stereo) files, the samples of the two channels are interleaved: each frame stores the left-channel sample followed by the right-channel sample, so a stereo file is twice the size of a mono file with the same sampling rate and sample width. In 8-bit stereo, for example, each frame is 16 bits: one byte for the left channel and one byte for the right channel.

Because a PCM WAV file stores uncompressed samples, its playback length can be estimated from the file size, sampling rate, sample size, and number of channels.
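The estimate divides the size of the audio data by the byte rate (sampling rate x channels x bytes per sample). A sketch assuming the canonical 44-byte header with no extra chunks (estimate_duration is a hypothetical helper); files that carry extra chunks, such as metadata, will come out slightly too long.

```python
def estimate_duration(file_size, sample_rate, bits_per_sample, channels, header_bytes=44):
    """Estimate the playback length in seconds of an uncompressed PCM WAV file."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return (file_size - header_bytes) / bytes_per_second


# 44100 Hz 16-bit stereo is 176 400 bytes per second, so a file of
# 10 584 044 bytes (60 s of audio plus a 44-byte header) lasts 60 s.
print(estimate_duration(10_584_044, 44100, 16, 2))  # 60.0
```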
For more information, see Microsoft WAVE soundfile format (sapp.org)


Source: blog.csdn.net/weixin_44649780/article/details/132672659