Speech | A collection of tools for extracting speech features from datasets

This article introduces some of the main tools for extracting features from speech datasets and explains how to use them, including installation and run commands.

Toolkits for extracting audio features from speech datasets

openSMILE
COVAREP
ESPnet
JAFFE

1. openSMILE

Installation on Linux. Environment: Ubuntu 20.04 (Docker container).

openSMILE installation method 1 (pip) and usage

# Install
pip install opensmile

# Installed version: 2.4.2

# Usage with a single wav file

# opensmile v2.4.2
import opensmile

path  = '/workspace/dataset/mer1/audio/KETI_MULTIMODAL_0000000012_00.wav'

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

y = smile.process_file(path)
print(y.shape)
print(y)

The feature sets that can follow opensmile.FeatureSet. are listed in the FeatureSet documentation.

Currently available feature sets (list not complete):

Name            #features
ComParE_2016    65 / 65 / 6373
GeMAPSv01a      5 / 13 / 62
GeMAPSv01b      5 / 13 / 62
eGeMAPSv01a     10 / 13 / 88
eGeMAPSv01b     10 / 13 / 88

FeatureSet — Documentation (audeering.github.io) 

Result (for eGeMAPSv02 at the Functionals level, the output is a 1×88 DataFrame)

 

openSMILE installation method 2 (build from source) and usage

git clone https://github.com/audeering/opensmile
cd opensmile
sh build.sh

Add the SMILExtract directory to the system PATH

Open the system profile in an editor:

vim /etc/profile

The path depends on the version and on where you built openSMILE, so find the SMILExtract directory on your own machine. Mine is the following, added as a new line at the end of the file:

export PATH="/workspace/tts/opensmile/build/progsrc/smilextract:$PATH"

Save and exit (in vim: press Esc, then type :wq), then reload the profile:

source /etc/profile

 View version information

SMILExtract -h

If the installation succeeded, the version and other relevant information are displayed.
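The same check can be done from Python before starting a batch job. The sketch below is only an illustration: `find_smilextract` and `smilextract_help` are hypothetical helper names, and it assumes the executable is called SMILExtract as in the commands above.

```python
import shutil
import subprocess


def find_smilextract(name="SMILExtract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def smilextract_help(name="SMILExtract"):
    """Run `<name> -h` and return its combined output, or None if not found."""
    exe = find_smilextract(name)
    if exe is None:
        return None
    result = subprocess.run([exe, "-h"], capture_output=True, text=True)
    return result.stdout + result.stderr


location = find_smilextract()
if location is None:
    print("SMILExtract is not on PATH; check the export line in /etc/profile")
else:
    print("SMILExtract found at", location)
```

If the function returns None in a new shell, the profile has probably not been reloaded in that shell yet.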

Running openSMILE from the command line

SMILExtract -C <configuration file> -I <input audio file> -O <output file>

Example (single audio file):

SMILExtract -C ./config/is09-13/IS09_emotion.conf -I /workspace/dataset/emoko/audio/000-001.wav -O /workspace/dataset/emoko/opensmile-file/1.txt
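When the command is assembled in Python, passing the arguments as a list avoids quoting problems with paths that contain spaces. This is a minimal sketch under that assumption; `build_command` and `extract` are hypothetical helper names, and only the -C, -I and -O options shown above are used.

```python
import shlex
import subprocess


def build_command(config, wav_in, out_file, exe="SMILExtract"):
    """Build the SMILExtract argument list for one audio file."""
    return [exe, "-C", str(config), "-I", str(wav_in), "-O", str(out_file)]


def extract(config, wav_in, out_file):
    """Run SMILExtract and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(config, wav_in, out_file), check=True)


# Same paths as the single-file example; nothing is executed here.
cmd = build_command(
    "./config/is09-13/IS09_emotion.conf",
    "/workspace/dataset/emoko/audio/000-001.wav",
    "/workspace/dataset/emoko/opensmile-file/1.txt",
)
print(shlex.join(cmd))  # the equivalent shell command line
```

Calling `extract(...)` with the same three arguments would actually run the tool; `check=True` makes a failed extraction visible instead of silently continuing.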

 Change the configuration file (.conf) in the command above to suit your own needs. Note that the input audio must be a wav file (uncompressed PCM).
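One way to check an input file before extraction is Python's standard `wave` module, which only reads uncompressed PCM wav files and raises `wave.Error` for anything else. The helper name `describe_wav` is hypothetical; this is a sketch, not part of openSMILE.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate, bits_per_sample, seconds) for a wav file.

    wave.open raises wave.Error if the file is not a wav file it can read
    (the module supports uncompressed PCM only).
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        seconds = wav.getnframes() / rate
    return channels, rate, bits, seconds
```

For example, `describe_wav('/workspace/dataset/emoko/audio/000-001.wav')` would report the channel count, sample rate, bit depth and duration of the file used in the command above.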

 The output file 1.txt is in ARFF format: a header that declares the attributes, followed by the data.

The last line holds the actual feature values:
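To see which value belongs to which feature, the attribute names in the header can be paired with the values on the last line. The sketch below assumes the usual ARFF layout (`@attribute` lines, then `@data`), with the first column holding the instance name and the last column the class label. The sample text is a shortened, hypothetical file, and `parse_arff` and `named_features` are illustrative names.

```python
def parse_arff(text):
    """Return (attribute_names, last_data_row) from ARFF text."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            rows.append(line.split(","))
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows[-1]


def named_features(text):
    """Map each feature name to its float value, dropping the first
    (instance name) and last (class label) columns."""
    names, row = parse_arff(text)
    return {n: float(v) for n, v in zip(names[1:-1], row[1:-1])}


# Hypothetical, shortened sample; a real output file has many more attributes.
sample = """@relation openSMILE_features

@attribute name string
@attribute pcm_RMSenergy_sma_max numeric
@attribute pcm_RMSenergy_sma_min numeric
@attribute class numeric

@data

'unknown',1.5e-02,2.0e-04,?
"""

print(named_features(sample))
```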

Calling the command line in batches with Python

    If features have to be extracted from many audio files, typing the command for each file in a terminal is tedious. The following Python source code calls the command line in batches (choose only one of the two methods):

opensmile-feature01.py

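# Method 1
import os
import subprocess

path = '/workspace/dataset/emoko'
out_dir = os.path.join(path, 'opensmile-file')  # existing output folder (adjust as needed)
config = '/workspace/tts/opensmile/config/is09-13/IS09_emotion.conf'
for root, dirs, files in os.walk(os.path.join(path, 'audio')):
    for name in files:
        if name.endswith('.wav'):  # one feature file per wav file, with the same base name
            out_file = os.path.join(out_dir, name[:-4] + '.csv')
            subprocess.run(['SMILExtract', '-C', config, '-I', os.path.join(root, name), '-O', out_file])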





# Method 2
import os

audio_path = '/workspace/dataset/emoko/audio'  # folder containing the .wav files
output_path = '/workspace/dataset/emoko/opensmile-pro'  # folder for the feature files
config_path = '/workspace/tts/opensmile/config/is09-13/IS09_emotion.conf'  # openSMILE configuration file
os.makedirs(output_path, exist_ok=True)  # create the output folder if it is missing
for audio in os.listdir(audio_path):  # iterate over every file in the audio folder
    if audio.endswith('.wav'):
        this_path_input = os.path.join(audio_path, audio)  # full path of the input wav file
        this_path_output = os.path.join(output_path, audio[:-4] + '.csv')  # .txt or .csv
        # Command: SMILExtract -C <configuration file> -I <audio file> -O <output file>
        os.system('SMILExtract -C ' + config_path + ' -I ' + this_path_input + ' -O ' + this_path_output)
print('over~')


*Note for Method 2: three paths are required:

① the folder containing the .wav files

② the folder where the extracted feature files (.csv or .txt) are stored

③ openSMILE's emotion feature configuration file

Run this .py file from within the opensmile folder.

The extracted features can be saved as .txt or .csv files.

Output: one feature file per .wav file, with the same base name as the wav file.
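Because each output file shares its base name with its wav file, a long batch run can be resumed by skipping the files that already have a feature file. The following is a minimal sketch of that bookkeeping; `plan_jobs` and `pending_jobs` are hypothetical names, and the actual SMILExtract call is left to whichever batch method is used.

```python
from pathlib import Path


def plan_jobs(audio_dir, output_dir, suffix=".csv"):
    """Return sorted (wav_path, feature_path) pairs for every .wav in audio_dir."""
    audio_dir, output_dir = Path(audio_dir), Path(output_dir)
    jobs = []
    for wav in sorted(audio_dir.glob("*.wav")):
        # The feature file keeps the base name of the wav file.
        jobs.append((wav, output_dir / (wav.stem + suffix)))
    return jobs


def pending_jobs(jobs):
    """Keep only the jobs whose feature file does not exist yet."""
    return [(wav, out) for wav, out in jobs if not out.exists()]
```

With the folders used in this article, `pending_jobs(plan_jobs('/workspace/dataset/emoko/audio', '/workspace/dataset/emoko/opensmile-pro'))` would list the wav files that still need to be processed.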

After extraction, process the feature files and pull out the feature-vector part of the data

Batch-process the generated feature files, then extract and combine the feature vectors into a matrix file that can be used for learning and further processing. The code is as follows:

opensmile-pro-csv02.py


The feature values can be read with the following Python code:

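import numpy as np

def feature_file_reader(feature_fp):
    """
    Read the feature values from an ARFF-format feature file generated by openSMILE.
    :param feature_fp: path to the feature file
    :return: np.array
    """
    with open(feature_fp) as f:
        last_line = f.readlines()[-1]  # the last line of the ARFF file holds the feature data
    features = last_line.split(",")
    features = np.array(features[1:-1], dtype="float64")  # the 2nd through second-to-last items are the features
    return features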

Save as an npy file

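import os
import numpy as np

txt_path = 'path/to/output/folder'  # folder containing the feature files
features_list = []
for txt in sorted(os.listdir(txt_path)):
    if txt.endswith('.txt'):  # use '.csv' if the features were saved with that extension
        this_path = os.path.join(txt_path, txt)
        with open(this_path) as f:
            last_line = f.readlines()[-1]  # the last line holds the feature data
        features = last_line.split(',')[1:-1]  # drop the first and last columns
        features_list.append(features)
features_array = np.array(features_list, dtype='float64')  # shape: (number of files, number of features)
np.save('path/to/save/folder/opensmile_features.npy', features_array)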
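The .npy matrix does not record which row came from which audio file. If that mapping is needed later, for example to join the features with labels, one option is to write a CSV table with the file name in the first column. This sketch uses only the standard library; `last_line_features` and `write_feature_table` are hypothetical names, and the last-line layout is assumed to be the one described above.

```python
import csv
from pathlib import Path


def last_line_features(feature_file):
    """Return the numeric columns of the last non-empty line as floats."""
    lines = [l for l in Path(feature_file).read_text().splitlines() if l.strip()]
    return [float(v) for v in lines[-1].split(",")[1:-1]]


def write_feature_table(feature_dir, table_path, pattern="*.txt"):
    """Write one row per feature file: file stem first, then its features."""
    files = sorted(Path(feature_dir).glob(pattern))
    with open(table_path, "w", newline="") as f:
        writer = csv.writer(f)
        for feature_file in files:
            writer.writerow([feature_file.stem] + last_line_features(feature_file))
    return len(files)  # number of rows written
```

Sorting the file list keeps the row order reproducible between runs.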

Create your own configuration file

See: audio - How to create custom config files in OpenSMILE - Stack Overflow

 openSMILE 3.0 - audEERING 

Detailed explanation of opensmile parameters

01. Run command parameters

More can be found in the Reference section — openSMILE Documentation

The first group of parameters applies when processing data.

The output components 'lldsink', 'lldhtksink', 'lldarffsink', 'csvsink', 'htksink' and 'arffsink' (a cArffSink) can be found in opensmile/standard_data_output.conf.inc (master branch, naxingyu/opensmile on GitHub).

The configuration file that comes with opensmile:

For common tasks in the fields of music information retrieval and speech processing, openSMILE provides example configuration files in the config/ directory for the following commonly used feature sets. These also include the baseline acoustic feature sets of the 2009-2013 INTERSPEECH challenges on affect and paralinguistics:

  • Chroma features for key and chord recognition

  • MFCC for speech recognition

  • PLP for speech recognition

  • Prosody (pitch and loudness)

  • INTERSPEECH 2009 Emotion Challenge feature set

  • INTERSPEECH 2010 Paralinguistic Challenge feature set

  • INTERSPEECH 2011 Speaker State Challenge feature set

  • INTERSPEECH 2012 Speaker Trait Challenge feature set

  • INTERSPEECH 2013 ComParE feature set

  • MediaEval 2012 TUM feature set for violent scene detection

  • Three reference feature sets for emotion recognition (older sets, superseded by the newer INTERSPEECH challenge sets)

  • Audio-visual features based on the audio features of the INTERSPEECH 2010 Paralinguistic Challenge
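To see which of these configuration files a particular checkout actually contains, the config folder can be scanned for .conf files. A small sketch, assuming the files live below a single config folder such as /workspace/tts/opensmile/config; `list_configs` is a hypothetical helper name.

```python
from pathlib import Path


def list_configs(config_root):
    """Return the .conf files below config_root, grouped by sub-folder name."""
    groups = {}
    for conf in sorted(Path(config_root).rglob("*.conf")):
        groups.setdefault(conf.parent.name, []).append(conf.name)
    return groups
```

Calling `list_configs('/workspace/tts/opensmile/config')` returns a dictionary from sub-folder name to the configuration file names found in it; the names depend on the openSMILE version.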

ESPnet

Installation on Linux (Ubuntu 20.04)

apt-get install cmake
apt-get install sox
apt-get install flac

git clone https://github.com/espnet/espnet
cd espnet/tools

# Set up the Python environment
bash setup_python.sh $(command -v python3)

make

# Copy the TH_VERSION and CUDA_VERSION values reported by make

make TH_VERSION=1.13.1 CUDA_VERSION=11.4

For other installation options, see the Installation page of the ESPnet 202304 documentation.

Check the installation

 bash -c ". ./activate_python.sh; . ./extra_path.sh; python3 check_install.py"
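A lighter check from inside Python is to ask whether the packages can be found by the interpreter. This sketch assumes it runs in the Python environment that the ESPnet setup created and that the packages are importable as `espnet` and `torch`; `is_installed` is a hypothetical helper, and it does not replace check_install.py.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


for name in ("espnet", "torch"):
    print(name, "found" if is_installed(name) else "NOT found")
```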

[PS1] bash: SMILExtract: command not found

https://fxburk.medium.com/machine-classification-of-emotional-speech-with-emodb-and-python-25a67753210e

apt update
apt install automake
apt install autoconf
apt install libtool
apt install m4
apt install gcc

 Try 1

./SMILExtract -h

Result: bash: ./SMILExtract: No such file or directory

  Try 2

strace ./SMILExtract

Result: strace: Can't stat './SMILExtract': No such file or directory
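Both messages suggest that there is no SMILExtract file in the current directory, so the next step is to find where the build actually placed it (or whether it was built at all). A minimal sketch that searches a folder tree for the executable follows. `find_executables` is a hypothetical helper, and /workspace/tts/opensmile is the checkout location used earlier in this article.

```python
import os


def find_executables(root, name="SMILExtract"):
    """Return every path under root whose file name is `name` and is executable."""
    matches = []
    for folder, _dirs, files in os.walk(root):
        if name in files:
            candidate = os.path.join(folder, name)
            if os.access(candidate, os.X_OK):
                matches.append(candidate)
    return matches


# Prints an empty list if the folder does not exist or the build did not finish.
print(find_executables("/workspace/tts/opensmile"))
```

If a path is printed, its folder is the one to put on PATH; if the list is empty, the build step has to be repeated first.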



Origin blog.csdn.net/weixin_44649780/article/details/131089378