This article covers the main toolkits for extracting audio features from speech datasets, and how to use them, including installation and run commands.
Toolkits for extracting audio features from speech datasets
1. openSMILE
Installation on Linux. Environment: Ubuntu 20.04 (Docker container).
openSMILE installation method 1 and usage
# Install (this installs version 2.4.2)
pip install opensmile

# Use on a single wav file (opensmile v2.4.2)
import opensmile
path = '/workspace/dataset/mer1/audio/KETI_MULTIMODAL_0000000012_00.wav'
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
y = smile.process_file(path)
print(y.shape)
print(y)
The available feature sets (the names after opensmile.FeatureSet.) are listed below.
Currently available (not complete):
Name | #features (LLD / LLD deltas / functionals)
---|---
ComParE_2016 | 65 / 65 / 6373
GeMAPSv01a | 5 / 13 / 62
GeMAPSv01b | 5 / 13 / 62
eGeMAPSv01a | 10 / 13 / 88
eGeMAPSv01b | 10 / 13 / 88
Full list: FeatureSet — opensmile-python documentation (audeering.github.io)
Result (here eGeMAPSv02 yields a 1×88-dimensional feature vector)
openSMILE installation method 2 and usage
git clone https://github.com/audeering/opensmile
cd opensmile
sh build.sh
Add the SMILExtract path to the system PATH. Open the profile file:
vim /etc/profile
Depending on the version, the path may differ; find the smilextract path on your machine. Mine is:
export PATH="/workspace/tts/opensmile/build/progsrc/smilextract:$PATH"
Save and exit (in vim: Esc, then :wq), then run:
source /etc/profile
View the version information:
SMILExtract -h
If the build succeeded, the version and other relevant information will be displayed.
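If the command is not found, you can also check from Python whether the binary is visible on PATH; `shutil.which` returns the full path, or `None` when the command is not reachable:

```python
import shutil

# shutil.which resolves a command name against PATH, like `which` in the shell.
found = shutil.which("SMILExtract")
print(found if found else "SMILExtract is not on PATH")
```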
Run openSMILE from the command line:
SMILExtract -C <configuration file> -I <input audio file> -O <output file>
Single-audio example:
SMILExtract -C ./config/is09-13/IS09_emotion.conf -I /workspace/dataset/emoko/audio/000-001.wav -O /workspace/dataset/emoko/opensmile-file/1.txt
Change the .conf file name (the configuration file) in the command above according to your own needs. Note that the audio file must be in lossless wav format.
The output file 1.txt looks as follows; the last line contains the actual feature data.
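The same command-line call can also be issued from Python. A minimal sketch using subprocess (the config and file paths are the same assumed example paths as above; substitute your own):

```python
import subprocess

def build_smile_cmd(config, wav_in, feat_out):
    # Build the SMILExtract argument list; passing a list avoids shell-quoting issues.
    return ["SMILExtract", "-C", config, "-I", wav_in, "-O", feat_out]

cmd = build_smile_cmd(
    "./config/is09-13/IS09_emotion.conf",
    "/workspace/dataset/emoko/audio/000-001.wav",
    "/workspace/dataset/emoko/opensmile-file/1.txt",
)
# subprocess.run(cmd, check=True)  # uncomment once SMILExtract is on your PATH
print(" ".join(cmd))
```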
Calling the command line in batches from Python
If many audio files need feature extraction, typing commands one by one in the terminal is tedious. The following Python code calls the command line in batches (choose only one of the two methods):
opensmile-feature01.py
# Method 1
import os

path = '/workspace/dataset/emoko'
conf = '/workspace/tts/opensmile/config/is09-13/IS09_emotion.conf'
for root, dirs, files in os.walk(path + '/audio'):
    for i in files:
        # adjust the output directory and extension to your own needs
        os.system('SMILExtract -C ' + conf + ' -I ' + os.path.join(root, i) + ' -O ' + path + '/' + i[:-4] + '.csv')
# Method 2
import os

audio_path = '/workspace/dataset/emoko/audio'           # .wav file folder
output_path = '/workspace/dataset/emoko/opensmile-pro'  # feature file folder
audio_list = os.listdir(audio_path)
for audio in audio_list:  # iterate over all files in the folder
    if audio[-4:] == '.wav':
        this_path_input = os.path.join(audio_path, audio)                  # full path of the input file
        this_path_output = os.path.join(output_path, audio[:-4] + '.csv')  # .txt/.csv
        # run SMILExtract: -C config file, -I input audio, -O output feature file
        os.system('SMILExtract -C /workspace/tts/opensmile/config/is09-13/IS09_emotion.conf -I ' + this_path_input + ' -O ' + this_path_output)
print('over~')
*Note for Method 2: three paths are required:
① the .wav folder path
② the folder where the extracted feature files (.csv or .txt) are stored
③ openSMILE's emotion feature configuration file
Run this .py file from inside the opensmile folder (unless SMILExtract is on your PATH).
The extracted features can be saved as .txt or .csv files.
Output: one feature file per .wav file, with the same base name as the wav file.
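As an alternative to os.system, the same batch loop can be written with pathlib and subprocess, which handles paths with spaces safely. A sketch under the same assumed directory layout:

```python
from pathlib import Path
import subprocess

def batch_commands(audio_dir, output_dir, config):
    # One SMILExtract command per .wav file; each output keeps the wav's base name.
    cmds = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        out = Path(output_dir) / (wav.stem + ".csv")
        cmds.append(["SMILExtract", "-C", config, "-I", str(wav), "-O", str(out)])
    return cmds

# Example (assumed paths):
# for cmd in batch_commands("/workspace/dataset/emoko/audio",
#                           "/workspace/dataset/emoko/opensmile-pro",
#                           "/workspace/tts/opensmile/config/is09-13/IS09_emotion.conf"):
#     subprocess.run(cmd, check=True)
```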
After extraction, process the csv files to pull out the feature-vector part of the data: batch-process the generated feature text files, then extract and combine them into a matrix file usable for learning. The code is shown below.
opensmile-pro-csv02.py
Feature analysis can be performed through the following Python code:
import numpy as np

def feature_file_reader(feature_fp):
    """
    Read the feature values from a generated ARFF-format csv feature file.
    :param feature_fp: path to the csv feature file
    :return: np.array
    """
    with open(feature_fp) as f:
        last_line = f.readlines()[-1]  # the last line of an ARFF-format csv file holds the feature data
    features = last_line.split(",")
    features = np.array(features[1:-1], dtype="float64")  # entries from the 2nd to the second-to-last are features
    return features
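A quick way to sanity-check this parsing is with a synthetic ARFF-style file; the file contents and values below are made up for illustration:

```python
import os
import tempfile
import numpy as np

# Write a synthetic ARFF-style csv whose last line mimics SMILExtract output:
# an instance name, the feature values, then a class label.
values = [0.1 * k for k in range(5)]
last = "'unknown'," + ",".join(str(v) for v in values) + ",?"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("@relation demo\n@data\n" + last + "\n")
    path = f.name

with open(path) as f:
    tail = f.readlines()[-1]
feats = np.array(tail.split(",")[1:-1], dtype="float64")  # drop name and label
os.unlink(path)
print(feats.shape)  # (5,)
```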
Save as an npy file:
import os
import numpy as np

txt_path = 'output folder path'  # folder holding the extracted feature files
txt_list = os.listdir(txt_path)
features_list = []
for txt in txt_list:
    if txt[-4:] == '.txt':
        this_path = os.path.join(txt_path, txt)
        with open(this_path) as f:
            last_line = f.readlines()[-1]
        features = last_line.split(',')
        features = features[1:-1]
        features_list.append(features)
features_array = np.array(features_list, dtype='float64')  # convert the string values to numbers
np.save('path to save the file/opensmile_features.npy', features_array)
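A round-trip check confirms that the saved .npy file reloads with the expected shape and dtype (toy values and a temp path, for illustration only):

```python
import os
import tempfile
import numpy as np

# Simulate two feature rows parsed as strings (as in the loop above);
# convert to float64 before saving so the .npy holds numbers, not text.
features_list = [["0.1", "0.2", "0.3"], ["0.4", "0.5", "0.6"]]
features_array = np.array(features_list, dtype="float64")

out = os.path.join(tempfile.mkdtemp(), "opensmile_features.npy")
np.save(out, features_array)
loaded = np.load(out)
print(loaded.shape, loaded.dtype)  # (2, 3) float64
```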
Create your own configuration file
See: audio - How to create custom config files in OpenSMILE - Stack Overflow
Detailed explanation of openSMILE parameters
01. Run command parameters
More can be found in the Reference section of the openSMILE documentation.
The first concern is data output during processing: the sink components 'lldsink', 'lldhtksink', 'lldarffsink', 'csvsink', 'htksink', 'arffsink' (cArffSink) can be found in opensmile/standard_data_output.conf.inc in the naxingyu/opensmile GitHub repository.
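For orientation, a cCsvSink section in a configuration file looks roughly like the following. This is a sketch only: the instance name and the output default are illustrative, not copied from the shipped include file.

```
[componentInstances:cComponentManager]
instance[csvsink].type = cCsvSink

[csvsink:cCsvSink]
// read functionals and write them to a csv file
reader.dmLevel = func
filename = \cm[outputfile(O){output.csv}:file for output]
append = 0
```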
The configuration files that ship with openSMILE live under:
config/
For common tasks in music information retrieval and speech processing, example configuration files are provided in this directory for the following commonly used feature sets. These include the baseline acoustic feature sets of the INTERSPEECH 2009-2013 emotion and paralinguistics challenges:
- Chroma features for key and chord recognition
- MFCC for speech recognition
- PLP for speech recognition
- Prosody (pitch and loudness)
- INTERSPEECH 2009 Emotion Challenge feature set
- INTERSPEECH 2010 Paralinguistic Challenge feature set
- INTERSPEECH 2011 Speaker State Challenge feature set
- INTERSPEECH 2012 Speaker Trait Challenge feature set
- INTERSPEECH 2013 ComParE feature set
- MediaEval 2012 TUM feature set for violent scene detection
- Three reference feature sets for emotion recognition (older sets, superseded by the newer INTERSPEECH challenge sets)
- Audio-visual features based on the audio features of the INTERSPEECH 2010 Paralinguistic Challenge
2. ESPnet
Linux (Ubuntu 20.04) installation
apt-get install cmake
apt-get install sox
apt-get install flac
git clone https://github.com/espnet/espnet
cd espnet/tools
# in your Python environment
bash setup_python.sh $(command -v python3)
make
# copy the TH_VERSION and CUDA_VERSION values reported by make
make TH_VERSION=1.13.1 CUDA_VERSION=11.4
For other setups, refer to Installation — ESPnet 202304 documentation.
Check the installation:
bash -c ". ./activate_python.sh; . ./extra_path.sh; python3 check_install.py"
[PS1] bash: SMILExtract: command not found
apt install automake
apt install autoconf
apt install libtool
apt install m4
apt install gcc
apt update
Try 1:
./SMILExtract -h
This returns: bash: ./SMILExtract: No such file or directory
Try 2:
strace ./SMILExtract
This returns: strace: Can't stat './SMILExtract': No such file or directory