[MFA] Using Montreal Forced Aligner to train and align audio on Windows

1. Install MFA

Official installation link

1. Install Anaconda

2. Create and enter the virtual environment

conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
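Once the environment is active, a quick sanity check confirms the install succeeded (`mfa version` is part of MFA's command-line interface):

```shell
# verify that the mfa command is on PATH and print its version
mfa version
```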

3. Install PyTorch

CPU environment:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
GPU environment:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
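To confirm which build of PyTorch ended up in the environment, a one-liner like the following can help (a sketch; adjust `python` if your interpreter is named differently):

```shell
# print the installed torch version and whether CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

In the CPU-only environment the second value will be `False`, which is expected.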

2. Training a new acoustic model

1. Make sure the dataset is in the correct format

mfa validate ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt

~/mfa_data/my_corpus : dataset
~/mfa_data/my_dictionary.txt : dictionary
This command will look at the corpus and make sure MFA parses everything correctly. MFA supports several different types of corpus formats and structures, but generally the core requirement is that you should have a sound file and transcription file pair with the same name (except extension). Review the validator output to ensure that the number of speakers and the number of files and sentences are as expected, and that the number of out-of-vocabulary (OOV) items is not too high.
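For reference, a minimal corpus and dictionary skeleton that satisfies this naming convention might look like the following (all paths, file names, and words here are made up for illustration, not taken from the original post):

```shell
# one sub-directory per speaker; each recording is a .wav/.lab pair
# whose file names differ only in extension
mkdir -p my_corpus/speaker1
touch my_corpus/speaker1/utt1.wav                   # the audio file
printf 'hello world' > my_corpus/speaker1/utt1.lab  # its transcript

# the pronunciation dictionary maps each word to a phone sequence,
# one word per line
printf 'hello HH AH L OW\nworld W ER L D\n' > my_dictionary.txt

ls my_corpus/speaker1
```

Any word in a transcript that is missing from the dictionary will be reported by the validator as an OOV item.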


2. Train the acoustic model and export the model and alignment files

mfa train ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt ~/mfa_data/new_acoustic_model.zip  # export only the acoustic model
mfa train ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt ~/mfa_data/my_corpus_aligned  # export only the alignment files
mfa train ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt ~/mfa_data/new_acoustic_model.zip ~/mfa_data/my_corpus_aligned  # export both the acoustic model and the alignment files

If your data is large, you may need to increase the number of jobs used by MFA.
If the training was successful, you will see TextGrids in the output directory. TextGrid export is the same as when running with a trained acoustic model.
If you choose to export an acoustic model, you can now use this model for other utilities and use cases, such as optimizing pronunciation dictionaries for new data by adding probabilities to the dictionary (mfa train_dictionary) or transcribing audio files (mfa transcribe).
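The job count can be raised on the command line. A sketch, reusing the paths from the examples above (`--num_jobs` is MFA's flag for the number of parallel workers; pick a value no larger than your CPU core count):

```shell
# train with 8 parallel jobs instead of the default
mfa train ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt ~/mfa_data/my_corpus_aligned --num_jobs 8
```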

Wait for the training to finish~

3. Error handling

1. Encountered an error similar to: Command '['createdb', '--host=' '', 'Librispeech']' returned non-zero exit status 1

The reason is that the database service MFA depends on has not been started. (createdb is a PostgreSQL utility, so this error means MFA could not reach its PostgreSQL server.)
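On MFA 2.1 and later, the bundled PostgreSQL server can be started explicitly with the `mfa server` subcommand (an assumption about your MFA version; on older releases you would start the PostgreSQL service yourself instead):

```shell
# start the PostgreSQL database server that MFA uses, then retry the failed command
mfa server start
```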

Origin blog.csdn.net/qq_46319397/article/details/129431602