Speaker Recognition: Getting Started with Alize (Part 2): GMM-UBM

Preparation

The Alize website (https://alize.univ-avignon.fr) provides four demos:

1. GMM/UBM System

2. I-vector System

3. JFA System

4. Top-down Speaker Segmenting and Clustering System

Download the first demo, the GMM-UBM example: 01_GMM-UBM_system_with_ALIZE3.0.tar.gz

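After extraction, the archive contains the following files and folders:

bin: executables

cfg: configuration files

data: the audio data and the extracted feature files

gmm: the generated GMM models

log: training logs

lst/ndx: list files used to prepare the data

res: the generated results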
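
As a quick sanity check before running anything, the layout above can be verified with a few lines of Python. This is only a sketch: the helper name `missing_dirs` is made up for illustration, and lst and ndx are assumed to be two separate folders, as the paths used later in this post suggest.

```python
import os

# Directory names taken from the listing above; lst and ndx are assumed
# to be two separate folders.
EXPECTED_DIRS = ["bin", "cfg", "data", "gmm", "log", "lst", "ndx", "res"]

def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]

if __name__ == "__main__":
    # Run from the extracted demo folder; an empty list means all folders exist.
    print("missing:", missing_dirs("."))
```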

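Running on Linux

On Linux, simply follow the readme: 01_RUN_feature_extraction.sh => 02b_RUN_htk_front-end.sh/02a_RUN_spro_front-end.sh => 03_RUN_gmm-ubm.sh

01_RUN_feature_extraction.sh: extracts the features and does some preprocessing

02b_RUN_htk_front-end.sh/02a_RUN_spro_front-end.sh: the front ends for features extracted with HTK and SPro respectively; pick one of the two

03_RUN_gmm-ubm.sh: trains the models, computes the scores, and applies some score normalization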

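Running on Windows

The process on Windows is the same as on Linux, and if a shell environment is available the scripts can be run directly. For convenience, this post instead uses Python scripts that call the executables on Windows to carry out the same steps.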
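
The scripts in this post call os.system and never look at the value it returns, so a failed step can go unnoticed. One possible alternative is sketched here with subprocess.run; the helper name `run_step` is hypothetical and is not part of the original code.

```python
import subprocess
import sys

def run_step(name, args):
    """Run one pipeline step and raise if the executable fails.

    `name` is only used for the log banner; `args` is the command as a list,
    e.g. ["bin\\TrainWorld.exe", "--config", "cfg/TrainWorld.cfg"].
    """
    print("*** %s start ***" % name)
    result = subprocess.run(args)
    if result.returncode != 0:
        raise RuntimeError("%s failed with exit code %d" % (name, result.returncode))
    print("*** %s end ***" % name)
    return result.returncode

if __name__ == "__main__":
    # Harmless demonstration: run the current Python interpreter.
    run_step("demo", [sys.executable, "-c", "print('hello')"])
```

Passing the command as a list also avoids quoting problems when a path contains spaces.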

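The bundled example uses audio in sph format. Here we use wav audio instead, extract MFCC features with SPro, and train models on our own data.

To change model parameters and other settings, edit the configuration files under cfg/.

The data/ directory holds the subdirectories used by the scripts in this post: ubm/, train/, test/, prm/, and lbl/. Generating the lbl files fails if they already exist, so it is best to empty lbl/ and prm/ before each training run.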
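
Clearing the two folders by hand is easy to forget, so it can be scripted. The sketch below uses a hypothetical helper, `clear_dir`, that deletes the files directly inside a folder and leaves the folder itself (and any subfolders) in place. The paths match the ones used by the scripts in this post.

```python
import os

def clear_dir(path):
    """Delete every regular file directly inside path; keep the folder itself."""
    removed = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            os.remove(full)
            removed += 1
    return removed

if __name__ == "__main__":
    for folder in ("./data/lbl", "./data/prm"):
        if os.path.isdir(folder):
            print(folder, "removed", clear_dir(folder), "files")
```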

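1. Train the UBM (01_ubm.py)

Extract MFCC features from the audio files under data/ubm and save them in data/prm/, generating lst/UBM.lst along the way. Then run energy detection (VAD), feature normalization, and UBM training.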

import time
import os
import utils

# Generate the UBM.lst file
utils.gen_ubm_lst('./data/ubm', './lst/UBM.lst')

print('*************./bin/sfbcep start***************')
time.sleep(5)

# Extract MFCC features with SPro's sfbcep for every file listed in UBM.lst.
# Flags: -m mel scale, -k pre-emphasis, -p cepstral coefficients, -n filters,
# -r liftering, -e energy, -D/-A delta and acceleration, -F input format.
with open("./lst/UBM.lst") as lst_file:
    for line in lst_file:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        COMMAND_LINE = 'bin\\sfbcep.exe -m -k 0.97 -p19 -n 24 -r 22 -e -D -A -F wave ./data/ubm/%s.wav ./data/prm/%s.tmp.prm' % (name, name)
        rc = os.system(COMMAND_LINE)
        print(COMMAND_LINE)

print('*************./bin/sfbcep end***************')
time.sleep(5)

print('*************Normalise energy start***************')
time.sleep(5)

CMD_NORM_E="bin\\NormFeat.exe --config cfg/NormFeat_energy_SPro.cfg --inputFeatureFilename ./lst/UBM.lst --featureFilesPath data/prm/"
print(CMD_NORM_E)
rc = os.system(CMD_NORM_E)

print('*************Normalise energy end***************')
time.sleep(5)

print('*************Energy Detector start***************')
time.sleep(5)

CMD_ENERGY="bin\\EnergyDetector.exe --config cfg/EnergyDetector_SPro.cfg --inputFeatureFilename ./lst/UBM.lst --featureFilesPath data/prm/ --labelFilesPath data/lbl/"
print (CMD_ENERGY)
rc = os.system(CMD_ENERGY)

print('*************Energy Detector end***************')
time.sleep(5)

print('*************Normalise Features start***************')
time.sleep(5)

CMD_NORM="bin\\NormFeat.exe --config cfg/NormFeat_SPro.cfg --inputFeatureFilename ./lst/UBM.lst --featureFilesPath data/prm/ --labelFilesPath data/lbl/"
print (CMD_NORM)
rc = os.system(CMD_NORM)

print('*************Normalise Features end***************')
time.sleep(5)

print('*************Train UBM start***************')
time.sleep(5)

rc = os.system('bin\\TrainWorld.exe --config cfg/TrainWorld.cfg')

print('*************Train UBM end***************')
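
The utils module is not listed in this post. As a rough idea of what utils.gen_ubm_lst might do, the sketch below writes one base name (without extension) per .wav file, which is the format the feature-extraction loop above relies on when it appends '.wav' to each line. The function body is a guess made for illustration, not the author's code.

```python
import os

def gen_ubm_lst(wav_dir, lst_path):
    """Write the base name of every .wav file in wav_dir to lst_path, one per line.

    Hypothetical stand-in for utils.gen_ubm_lst, whose source is not shown.
    """
    names = sorted(
        os.path.splitext(f)[0]
        for f in os.listdir(wav_dir)
        if f.lower().endswith(".wav")
    )
    with open(lst_path, "w") as out:
        for name in names:
            out.write(name + "\n")
    return names
```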

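2. Train the speaker models (02_model.py)

Extract MFCC features from the audio files under data/train and save them in data/prm/, generating lst/model.lst and ./ndx/trainModel.ndx along the way. Then run energy detection (VAD), feature normalization, and speaker model training.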

import time
import os

import utils

# Generate the model.lst file
utils.gen_model_lst('./data/train', './lst/model.lst')
# Generate the trainModel.ndx file
utils.gen_model_ndx('./data/train', './ndx/trainModel.ndx')

print('*************./bin/sfbcep start***************')
time.sleep(5)

# Walk data/train: one subdirectory per target speaker, each holding wav files
srcFilePath = './data/train'
dirList = os.listdir(srcFilePath)
dirList.sort()
for targetName in dirList:
    dirName = os.path.join(srcFilePath, targetName)
    if not os.path.isdir(dirName):
        continue
    fileList = os.listdir(dirName)
    fileList.sort()
    for fileName in fileList:
        # splitext also handles file names that contain more than one dot
        tmpName, ext = os.path.splitext(fileName)
        if ext != '.wav':
            print('not .wav file')
        else:
            COMMAND_LINE = 'bin\\sfbcep.exe -m -k 0.97 -p19 -n 24 -r 22 -e -D -A -F wave %s ./data/prm/%s.tmp.prm' % (os.path.join(dirName, fileName), tmpName)
            rc = os.system(COMMAND_LINE)
            print(COMMAND_LINE)

print('*************./bin/sfbcep end***************')
time.sleep(5)

print('*************Normalise energy start***************')
time.sleep(5)

CMD_NORM_E="bin\\NormFeat.exe --config cfg/NormFeat_energy_SPro.cfg --inputFeatureFilename ./lst/model.lst --featureFilesPath data/prm/"
print(CMD_NORM_E)
rc = os.system(CMD_NORM_E)

print('*************Normalise energy end***************')
time.sleep(5)


print('*************Energy Detector start***************')
time.sleep(5)

CMD_ENERGY="bin\\EnergyDetector.exe --config cfg/EnergyDetector_SPro.cfg --inputFeatureFilename ./lst/model.lst --featureFilesPath data/prm/ --labelFilesPath data/lbl/"
print(CMD_ENERGY)
rc = os.system(CMD_ENERGY)

print('*************Energy Detector end***************')
time.sleep(5)

print('*************Normalise Features start***************')
time.sleep(5)

CMD_NORM="bin\\NormFeat.exe --config cfg/NormFeat_SPro.cfg --inputFeatureFilename ./lst/model.lst --featureFilesPath data/prm/ --labelFilesPath data/lbl/"
print (CMD_NORM)
rc = os.system(CMD_NORM)

print('*************Normalise Features end***************')
time.sleep(5)

print('*************Train Target start***************')
time.sleep(5)

rc = os.system('bin\\TrainTarget.exe --config cfg/TrainTarget.cfg')

print('*************Train Target end***************')
time.sleep(5)
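
The two list files generated at the top of this script come from utils, which is not shown. The sketch below illustrates one plausible layout, under two assumptions: model.lst holds one feature-file base name per line, and each line of trainModel.ndx is a model name followed by the segments used to train it. All three function names are hypothetical.

```python
import os

def list_speakers(train_dir):
    """Map each speaker sub-directory to the sorted base names of its .wav files."""
    speakers = {}
    for spk in sorted(os.listdir(train_dir)):
        spk_dir = os.path.join(train_dir, spk)
        if not os.path.isdir(spk_dir):
            continue
        segs = sorted(os.path.splitext(f)[0] for f in os.listdir(spk_dir)
                      if f.lower().endswith(".wav"))
        if segs:
            speakers[spk] = segs
    return speakers

def model_lst_lines(speakers):
    """One feature-file base name per line (assumed model.lst layout)."""
    return [seg for segs in speakers.values() for seg in segs]

def model_ndx_lines(speakers):
    """One line per speaker: the model name followed by its training
    segments (assumed trainModel.ndx layout)."""
    return [" ".join([spk] + segs) for spk, segs in speakers.items()]
```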

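3. Test (03_test.py)

Extract MFCC features from the audio files under data/test and save them in data/prm/, generating lst/test.lst and ./ndx/computetest_gmm_target-seg.ndx along the way. Then run energy detection (VAD) and feature normalization, and score every test file against every speaker model. The scores are written to res/target-seg_gmm.res.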

import time
import os
import utils

# Generate the test list file test.lst
train_path = './data/train'
test_path = './data/test'
test_listFilePath = './lst/test.lst'
utils.gen_ubm_lst(test_path, test_listFilePath)

# Generate ./ndx/computetest_gmm_target-seg.ndx
utils.gen_target_seg_ndx(test_path, train_path, './ndx/computetest_gmm_target-seg.ndx')

print('*************./bin/sfbcep start***************')
time.sleep(5)

# Extract MFCC features for every file listed in test.lst
with open("./lst/test.lst") as lst_file:
    for line in lst_file:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        COMMAND_LINE = 'bin\\sfbcep.exe -m -k 0.97 -p19 -n 24 -r 22 -e -D -A -F wave ./data/test/%s.wav ./data/prm/%s.tmp.prm' % (name, name)
        rc = os.system(COMMAND_LINE)
        print(COMMAND_LINE)

print('*************./bin/sfbcep end***************')
time.sleep(5)

print('*************Normalise energy start***************')
time.sleep(5)

CMD_NORM_E="bin\\NormFeat.exe --config cfg/NormFeat_energy_SPro.cfg --inputFeatureFilename ./lst/test.lst --featureFilesPath data/prm/"
print(CMD_NORM_E)
rc = os.system(CMD_NORM_E)

print('*************Normalise energy end***************')
time.sleep(5)

print ('*************Energy Detector start***************')
time.sleep(5)

CMD_ENERGY="bin\\EnergyDetector.exe --config cfg/EnergyDetector_SPro.cfg --inputFeatureFilename ./lst/test.lst --featureFilesPath data/prm/ --labelFilesPath data/lbl/"
print(CMD_ENERGY)
rc = os.system(CMD_ENERGY)

print('*************Energy Detector end***************')
time.sleep(5)

print('*************Normalise Features start***************')
time.sleep(5)

CMD_NORM="bin\\NormFeat.exe --config cfg/NormFeat_SPro.cfg --inputFeatureFilename ./lst/test.lst --featureFilesPath data/prm/ --labelFilesPath data/lbl/"
print (CMD_NORM)
rc = os.system(CMD_NORM)

print('*************Normalise Features end***************')
time.sleep(5)

print('*************Test start***************')
time.sleep(5)

rc = os.system('bin\\ComputeTest.exe --config cfg/ComputeTest_GMM.cfg')

print('*************Test end***************')
time.sleep(5)
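
Since every test file ends up scored against every enrolled model, the trial list passed to ComputeTest presumably pairs each segment with all models. The sketch below shows that idea with a hypothetical helper, `target_seg_ndx_lines`, assuming each line of the ndx file is a test segment name followed by the models to score it against. The real utils.gen_target_seg_ndx is not shown and may differ.

```python
def target_seg_ndx_lines(test_segments, model_names):
    """One line per test segment: the segment name followed by every model
    it should be scored against (assumed ndx layout)."""
    return [" ".join([seg] + list(model_names)) for seg in test_segments]

if __name__ == "__main__":
    for line in target_seg_ndx_lines(
            ["BAC009S0004W0123", "BAC009S0004W0124"],
            ["S0002", "S0003", "S0004"]):
        print(line)
```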

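The final score file, res/target-seg_gmm.res, is listed after the following column descriptions:

The first column, M, denotes gender and can be ignored. The second column is the trained speaker model; there are three here (S0002, S0003, S0004), which amounts to three enrolled speakers.

The third column is 0 when the score in the last column is below 0 and 1 when it is above 0.

The fourth column is the name of the test file, and the fifth column is the score.

For example, the first line, M S0002 0 BAC009S0004W0123 -0.268589, means that the file BAC009S0004W0123 scored -0.268589 against the model of speaker S0002.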

M S0002 0 BAC009S0004W0123 -0.268589
M S0003 1 BAC009S0004W0123 0.149758
M S0004 1 BAC009S0004W0123 1.05531
M S0002 0 BAC009S0004W0124 -0.552765
M S0003 0 BAC009S0004W0124 -0.164467
M S0004 1 BAC009S0004W0124 1.09851
M S0002 0 BAC009S0004W0125 -0.372259
M S0003 1 BAC009S0004W0125 0.0132198
M S0004 1 BAC009S0004W0125 1.09078
M S0002 0 BAC009S0004W0126 -0.326163
M S0003 1 BAC009S0004W0126 0.219501
M S0004 1 BAC009S0004W0126 0.880061
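
To turn these scores into an identification decision, one simple option is to pick the highest-scoring model for each test file. The sketch below parses the five-column format described above; the names `Score`, `parse_res`, and `best_model_per_segment` are made up for this example, and only the first test file is included in the sample.

```python
from collections import namedtuple

Score = namedtuple("Score", "gender model decision segment score")

def parse_res(text):
    """Parse the five whitespace-separated columns of a .res file."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        gender, model, decision, segment, score = parts
        rows.append(Score(gender, model, int(decision), segment, float(score)))
    return rows

def best_model_per_segment(rows):
    """Return {segment: model with the highest score for that segment}."""
    best = {}
    for row in rows:
        if row.segment not in best or row.score > best[row.segment].score:
            best[row.segment] = row
    return {seg: row.model for seg, row in best.items()}

SAMPLE = """\
M S0002 0 BAC009S0004W0123 -0.268589
M S0003 1 BAC009S0004W0123 0.149758
M S0004 1 BAC009S0004W0123 1.05531
"""

print(best_model_per_segment(parse_res(SAMPLE)))
# prints {'BAC009S0004W0123': 'S0004'}
```

In the listing above, S0004 has the highest score for all four test files, which is consistent with the S0004 in their names if that part of the name identifies the speaker.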

To train on Linux, replace the .exe files with the corresponding Linux executables and change the path style, replacing \\ with /.
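
Rather than editing every command by hand, the executable path can be chosen at run time. This is a minimal sketch with a hypothetical helper, `exe_path`; it assumes the Linux binaries sit in bin/ under the same base names without the .exe suffix.

```python
import os

def exe_path(name, windows=None):
    """Return the path of an executable in bin/ for the given platform.

    Assumes the Linux binary has the same base name without '.exe'.
    """
    if windows is None:
        windows = (os.name == "nt")  # 'nt' is what Python reports on Windows
    return "bin\\%s.exe" % name if windows else "./bin/%s" % name

if __name__ == "__main__":
    print(exe_path("TrainWorld") + " --config cfg/TrainWorld.cfg")
```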

The theory behind GMM-UBM is fairly simple; the key is to understand the GMM itself. See the paper "Speaker verification using adapted Gaussian mixture models".
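
For orientation, the core formulas of that paper can be summarized as follows. The notation is the usual one and is introduced here, not in this post: X = {x_1, ..., x_T} are the feature vectors of an utterance, \lambda_{hyp} is the speaker model, \lambda_{ubm} is the UBM, and r is the relevance factor. The first line is the GMM density, the second the average log-likelihood ratio used as the verification score, and the last two the MAP adaptation of the UBM means toward the speaker's data.

```latex
\begin{align*}
p(x \mid \lambda) &= \sum_{i=1}^{M} w_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i) \\
\Lambda(X) &= \frac{1}{T} \sum_{t=1}^{T}
  \left[ \log p(x_t \mid \lambda_{\mathrm{hyp}}) - \log p(x_t \mid \lambda_{\mathrm{ubm}}) \right] \\
n_i &= \sum_{t=1}^{T} \Pr(i \mid x_t), \qquad
E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t)\, x_t \\
\hat{\mu}_i &= \alpha_i E_i(x) + (1 - \alpha_i)\, \mu_i, \qquad
\alpha_i = \frac{n_i}{n_i + r}
\end{align*}
```

A score of this kind is positive when the speaker model explains the utterance better than the UBM does, which is presumably why the third column of the .res file switches at 0.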

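For score normalization techniques, see the paper 《基于TZ Normalization规整的话者确认阈值选取》 (on choosing speaker verification thresholds with TZ normalization).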
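
The author's scripts skip this step, but the basic idea is short enough to sketch. Z-norm rescales a raw score with the mean and standard deviation of scores that the same speaker model gives to impostor utterances; T-norm uses the same formula with statistics obtained by scoring the test utterance against a cohort of other models. The function name `z_norm` and the numbers in the demonstration are illustrative only.

```python
import statistics

def z_norm(raw_score, impostor_scores):
    """Z-norm: shift and scale a raw score by impostor score statistics
    collected for the same speaker model."""
    mu = statistics.mean(impostor_scores)
    sigma = statistics.pstdev(impostor_scores)
    if sigma == 0:
        raise ValueError("impostor scores have zero variance")
    return (raw_score - mu) / sigma

if __name__ == "__main__":
    # Made-up impostor scores, for demonstration only.
    print(z_norm(1.0, [-1.0, 0.0, 1.0]))
```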

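In my experiments, the various score normalization methods brought little improvement, so the program omits the score normalization step and keeps only the necessary files.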

The complete code can be downloaded from https://download.csdn.net/download/u012594175/11100604

Speaker recognition discussion QQ group: 875705987