1. Install MFA
1. Install Anaconda
2. Create and enter the virtual environment
conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
3. Install PyTorch
CPU environment:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
GPU environment:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
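After installing, you can sanity-check the environment from Python. This is a minimal sketch, not part of the MFA or PyTorch docs; it only assumes the standard `torch` package name and guards the import so it also runs where PyTorch is absent.

```python
# Quick check: is PyTorch importable, and (for GPU installs) is a CUDA
# device visible? The import is guarded so the script never crashes.
try:
    import torch
    torch_available = True
    cuda_available = torch.cuda.is_available()
except ImportError:
    torch_available = False
    cuda_available = False

print("PyTorch installed:", torch_available)
print("CUDA available:", cuda_available)
```

On a CPU-only install, `CUDA available: False` is expected and not an error.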
2. Training a new acoustic model
1. Make sure the dataset is in the correct format
mfa validate ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt
~/mfa_data/my_corpus : path to the corpus (dataset) directory
~/mfa_data/my_dictionary.txt : path to the pronunciation dictionary
This command will look at the corpus and make sure MFA parses everything correctly. MFA supports several different types of corpus formats and structures, but generally the core requirement is that you should have a sound file and transcription file pair with the same name (except extension). Review the validator output to ensure that the number of speakers and the number of files and sentences are as expected, and that the number of out-of-vocabulary (OOV) items is not too high.
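The two core checks described above can be sketched in plain Python: pair sound files with same-named transcripts, and count out-of-vocabulary words against the dictionary. This is an illustrative sketch, not MFA's actual validator; it assumes a flat `.wav`/`.lab` corpus and a plain "WORD phone phone ..." lexicon, and the function name `check_corpus` is hypothetical.

```python
# Hedged sketch of what `mfa validate` looks for, under two assumptions:
# (1) each .wav has a same-named .lab transcript next to it, and
# (2) the dictionary is a "WORD phone phone ..." text lexicon.
from pathlib import Path

def check_corpus(corpus_dir, dictionary_path):
    """Return (unpaired file stems, out-of-vocabulary word types)."""
    corpus = Path(corpus_dir)
    wavs = {p.stem for p in corpus.rglob("*.wav")}
    labs = {p.stem for p in corpus.rglob("*.lab")}
    unpaired = wavs ^ labs  # files missing their counterpart

    # The dictionary's vocabulary is the first column of each line.
    vocab = set()
    for line in Path(dictionary_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            vocab.add(line.split()[0].lower())

    # Collect word types in the transcripts that the dictionary lacks.
    oov = set()
    for lab in corpus.rglob("*.lab"):
        for word in lab.read_text(encoding="utf-8").split():
            if word.lower() not in vocab:
                oov.add(word.lower())
    return unpaired, oov
```

If `unpaired` is non-empty or `oov` is large, fix the corpus or extend the dictionary before training.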
2. Train the acoustic model - export the model and/or alignment files
mfa train ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt ~/mfa_data/new_acoustic_model.zip # export only the acoustic model
mfa train ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt ~/mfa_data/my_corpus_aligned # export only the alignment files
mfa train ~/mfa_data/my_corpus ~/mfa_data/my_dictionary.txt ~/mfa_data/new_acoustic_model.zip ~/mfa_data/my_corpus_aligned # export both the acoustic model and the alignment files
If your corpus is large, you may need to increase the number of parallel jobs MFA uses (the --num_jobs flag).
If training succeeds, you will find TextGrids in the output directory; the TextGrid export works the same as when aligning with a pre-trained acoustic model.
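To spot-check the exported alignments, you can read the interval lines out of a TextGrid directly. This is a minimal sketch that only assumes the standard long TextGrid layout; for real work, a dedicated library such as praatio is a better fit.

```python
# Tiny reader for the interval entries of a Praat TextGrid, enough to
# eyeball the word/phone alignments MFA exports. Not a full parser.
import re

def read_intervals(textgrid_text):
    """Return (xmin, xmax, label) tuples in file order."""
    intervals = []
    xmin = xmax = None
    for line in textgrid_text.splitlines():
        line = line.strip()
        if m := re.match(r'xmin = ([\d.]+)', line):
            xmin = float(m.group(1))
        elif m := re.match(r'xmax = ([\d.]+)', line):
            xmax = float(m.group(1))
        elif m := re.match(r'text = "(.*)"', line):
            # A text line always follows its interval's own xmin/xmax,
            # so the most recent pair belongs to this interval.
            intervals.append((xmin, xmax, m.group(1)))
    return intervals
```

Each tuple gives the start time, end time, and label of one aligned interval.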
If you choose to export an acoustic model, you can now use this model for other utilities and use cases, such as optimizing pronunciation dictionaries for new data by adding probabilities to the dictionary (mfa train_dictionary) or transcribing audio files (mfa transcribe).
Wait for the training to finish~