TensorFlow Voiceprint Recognition (Speaker Recognition)

# Preface
This chapter shows how to use TensorFlow to implement a simple voiceprint recognition model. It helps to be familiar with audio classification first. We train a voiceprint recognition model that identifies who the speaker is, which is useful in applications that require voice verification. The difference from an ordinary audio classification project is that this project uses ArcFace Loss, also known as Additive Angular Margin Loss. ArcFace normalizes the feature vectors and the weights, then adds an angular margin m to the angle θ. Compared with a cosine margin, an angular margin acts on the angle more directly.
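To make the ArcFace idea concrete, here is a minimal pure-Python sketch of the arithmetic for a single sample. It is not the project's training code: the function names are hypothetical, and the margin of 0.5 and scale of 64 are commonly used ArcFace defaults rather than values taken from this project. The usual safeguard for the case θ + m > π is left out for brevity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def arcface_logits(feature, class_weights, label, margin=0.5, scale=64.0):
    """Return scaled logits with the angular margin added to the true class.

    margin and scale are assumed defaults, not values from this project.
    """
    f = l2_normalize(feature)
    logits = []
    for idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        # After normalization the dot product equals cos(theta).
        cos_theta = sum(a * b for a, b in zip(f, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if idx == label:
            # Add the margin m to the angle of the true class only.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy for one sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


# Toy example: a 2-D feature halfway between two class centers.
weights = [[1.0, 0.0], [0.0, 1.0]]
with_margin = arcface_logits([1.0, 1.0], weights, label=0)
without_margin = arcface_logits([1.0, 1.0], weights, label=0, margin=0.0)
print(softmax_cross_entropy(with_margin, 0), softmax_cross_entropy(without_margin, 0))
```

The margin makes the true class harder to satisfy, so the loss with the margin is larger than without it for the same feature. This pushes the model to pull embeddings of the same speaker closer together.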

Environment:

 - Python 3.7
 - TensorFlow 2.3.0

The script below builds the data lists. It converts each MP3 clip listed in `infodata.json` to WAV, assigns an integer label to each speaker, writes `train_list.txt` and `test_list.txt`, and then removes entries whose audio cannot be loaded.

```python
import json
import os

from pydub import AudioSegment
from tqdm import tqdm

from utils.reader import load_audio


# Generate the training and test data lists
def get_data_list(infodata_path, list_path, zhvoice_path):
    with open(infodata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    train_list = os.path.join(list_path, 'train_list.txt')
    test_list = os.path.join(list_path, 'test_list.txt')

    sound_sum = 0
    speakers_dict = {}  # speaker name -> integer label
    # Write as UTF-8 so the lists can be read back with the same encoding
    with open(train_list, 'w', encoding='utf-8') as f_train, \
            open(test_list, 'w', encoding='utf-8') as f_test:
        for line in tqdm(lines):
            line = json.loads(line.strip())
            # Skip clips shorter than 1.3 seconds
            if line['duration_ms'] < 1300:
                continue
            speaker = line['speaker']
            if speaker not in speakers_dict:
                speakers_dict[speaker] = len(speakers_dict)
            label = speakers_dict[speaker]
            sound_path = os.path.join(zhvoice_path, line['index'])
            save_path = os.path.splitext(sound_path)[0] + '.wav'
            # Convert MP3 to WAV once, then delete the original MP3
            if not os.path.exists(save_path):
                try:
                    wav = AudioSegment.from_mp3(sound_path)
                    wav.export(save_path, format="wav")
                    os.remove(sound_path)
                except Exception as e:
                    print('Data error: %s, message: %s' % (sound_path, e))
                    continue
            # Every 200th clip goes to the test list, the rest to the training list
            if sound_sum % 200 == 0:
                f_test.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
            else:
                f_train.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
            sound_sum += 1


# Remove entries whose audio cannot be loaded
def remove_error_audio(data_list_path):
    with open(data_list_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    valid_lines = []
    for line in tqdm(lines):
        audio_path, _ = line.split('\t')
        try:
            # Only checks that the file loads; the result is not needed here
            load_audio(audio_path)
            valid_lines.append(line)
        except Exception as e:
            print(audio_path)
            print(e)
    with open(data_list_path, 'w', encoding='utf-8') as f:
        for line in valid_lines:
            f.write(line)


if __name__ == '__main__':
    get_data_list('dataset/zhvoice/text/infodata.json', 'dataset', 'dataset/zhvoice')
    remove_error_audio('dataset/train_list.txt')
    remove_error_audio('dataset/test_list.txt')
```
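Each line of the generated lists holds an audio path and an integer speaker label separated by a tab, and every 200th accepted clip is written to the test list. The sketch below illustrates both points without touching the file system. The helper names and the sample paths are hypothetical and are not part of the project.

```python
def parse_list_lines(lines):
    """Turn 'path<TAB>label' lines into (path, label) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        path, label = line.split('\t')
        samples.append((path, int(label)))
    return samples


def goes_to_test(sample_index, test_every=200):
    """Mirror the split rule: every 200th accepted clip is a test sample."""
    return sample_index % test_every == 0


# Hypothetical list content, in the same format the script writes.
example_lines = [
    "dataset/zhvoice/speaker_a/0001.wav\t0\n",
    "dataset/zhvoice/speaker_b/0002.wav\t1\n",
]
print(parse_list_lines(example_lines))
print(sum(goes_to_test(i) for i in range(1000)), "of 1000 clips go to the test list")
```

With this rule roughly 0.5% of the clips end up in the test list, so the test set stays small compared with the training set.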

The output of the voiceprint recognition program looks similar to the following:
```
-----------  Configuration Arguments -----------
audio_db: audio_db
input_shape: (257, 257, 1)
model_path: models/infer_model.h5
threshold: 0.7
------------------------------------------------
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
resnet50v2_input (InputLayer [(None, 257, 257, 1)]     0         
_________________________________________________________________
resnet50v2 (Functional)      (None, 2048)              23558528  
_________________________________________________________________
batch_normalization (BatchNo (None, 2048)              8192      
=================================================================
Total params: 23,566,720
Trainable params: 23,517,184
Non-trainable params: 49,536
_________________________________________________________________

Loaded Li Dakang audio.
Loaded Sha Ruijin audio.
Please select a function, 0 is to register audio to the voiceprint library, 1 is to perform voiceprint recognition: 0
Press the Enter key to start recording, recording lasts 3 seconds:
Start recording......
Recording is over!
Please enter the name of the audio user: Ye Yu Piao Ling
Please select a function, 0 is to register audio to the voiceprint library, 1 is to perform voiceprint recognition: 1
Press the Enter key to start recording, recording lasts 3 seconds:
Start recording......
Recording is over!
The recognized speaker is: Ye Yu Piao Ling, similarity: 0.920434
```
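The recognition code itself is not shown here, so the following is only a sketch of how a lookup against the voiceprint library could work. It assumes that each registered speaker is stored as an embedding vector, that similarity is measured with cosine similarity, and that a match must reach the `threshold` of 0.7 shown in the configuration above. The function names and the toy 3-dimensional vectors are hypothetical; the real model outputs 2048-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognize(embedding, voiceprint_db, threshold=0.7):
    """Return (name, score) of the best match, or (None, score) below threshold."""
    best_name, best_score = None, -1.0
    for name, reference in voiceprint_db.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # nobody in the library is similar enough
    return best_name, best_score


# Toy library with made-up embeddings (assumption: one vector per speaker).
voiceprint_db = {
    "Li Dakang": [1.0, 0.0, 0.0],
    "Sha Ruijin": [0.0, 1.0, 0.0],
}
print(recognize([0.9, 0.1, 0.0], voiceprint_db))
print(recognize([0.0, 0.0, 1.0], voiceprint_db))
```

Raising the threshold reduces false accepts at the cost of rejecting more genuine speakers, so it is worth tuning on your own recordings.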

Download link:

https://download.csdn.net/download/babyai996/85090063

Origin blog.csdn.net/babyai996/article/details/124022410