Kaldi Speech Recognition Technology (8) ----- Integrating HCLG

HCLG overview

HCLG = min(det(H o min(det(C o min(det(L o G))))))

Composing the four layers one by one gives the final graph. Here o denotes composition, det denotes determinization, and min denotes minimization.

The WFSTs are generally composed from the larger graph to the smaller one: G is composed with L first, then C and finally H are composed on top. Each composition is followed by determinization and minimization. Minimization converts a WFST into an equivalent WFST with fewer states and arcs, which improves search efficiency. The full HCLG construction can be found in kaldi/egs/wsj/s5/utils/mkgraph.sh.
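For orientation, here is a condensed sketch of the pipeline that the rest of this post walks through step by step (monophone case; file names are shortened relative to the full paths used below, and the commands mirror what mkgraph.sh does):

# LG = min(det(L o G))
fsttablecompose L_disambig.fst G.fst | fstdeterminizestar --use-log=true | fstminimizeencoded | fstpushspecial | fstarcsort --sort_type=ilabel > LG.fst
# CLG = C o LG, with C generated on the fly
fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=disambig.int --write-disambig-syms=disambig_ilabels.int disambig_ilabels < LG.fst > CLG.fst
# Ha = H without self-loops
make-h-transducer --disambig-syms-out=disambig_tid.int disambig_ilabels tree final.mdl > Ha.fst
# HCLG: compose Ha with CLG, clean up, then add self-loops
fsttablecompose Ha.fst CLG.fst | fstdeterminizestar --use-log=true | fstrmsymbols disambig_tid.int | fstrmepslocal | fstminimizeencoded | fstpushspecial > HaCLG.fst
add-self-loops --self-loop-scale=0.1 --reorder=true final.mdl < HaCLG.fst | fstconvert --fst_type=const > HCLG.fst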


Combine LG.fst

  • fsttablecompose

usage:

fsttablecompose 
Composition algorithm [between two FSTs of standard type, in tropical
semiring] that is more efficient for certain cases-- in particular,
where one of the FSTs (the left one, if --match-side=left) has large
out-degree

Usage:  fsttablecompose (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)]

Example usage:

cd ~/kaldi/data && mkdir HCLG
fsttablecompose ~/kaldi/data/L/lang/L_disambig.fst ~/kaldi/data/G/normal/G.fst | fstdeterminizestar --use-log=true | fstminimizeencoded | fstpushspecial | fstarcsort --sort_type=ilabel > ~/kaldi/data/HCLG/LG.fst


fstisstochastic is a diagnostic step: it prints two numbers, the minimum and maximum weights, which indicate how far the FST is from being stochastic.

fsttablecompose composes the two FSTs (L.fst and G.fst) into one FST (LG.fst): the output symbols of the first FST are matched against the input symbols of the second, so the composed FST takes the first FST's input symbols as its inputs and the second FST's output symbols as its outputs;

fstdeterminizestar performs determinization (from any state, a given input symbol leads to at most one next state), removes epsilon transitions, and reduces redundancy in the graph;

fstminimizeencoded minimizes the FST; pushing the weights forward as early as possible lets the language-model information be used sooner, so that important paths are less likely to be pruned away;

fstisstochastic checks that the result is normalized, i.e. that the probabilities of the arcs leaving each state sum to 1.
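To run the stochasticity check yourself (a small sketch, assuming LG.fst was written to ~/kaldi/data/HCLG as above):

fstisstochastic ~/kaldi/data/HCLG/LG.fst
# prints two numbers (min and max deviation); values near 0 mean the outgoing weights of every state are well normalized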

Visualize LG.fst

  • fstprint
cd ~/kaldi/data
fstprint --isymbols=./G/normal/phones.txt --osymbols=./G/normal/words.txt ./HCLG/LG.fst > ./HCLG/LG.txt


  • fstdraw
fstdraw --isymbols=./G/normal/phones.txt --osymbols=./G/normal/words.txt ./HCLG/LG.fst > ./HCLG/LG.dot  # generate the dot file
dot -Tsvg ./HCLG/LG.dot > LG.svg # convert to an SVG vector image (no distortion when zoomed in)

I have not tried this step myself; it takes quite a long time to run.

Combine CLG.fst

fstcomposecontext

usage:

fstcomposecontext 
Composes on the left with a dynamically created context FST

Usage:  fstcomposecontext <ilabels-output-file>  [<in.fst> [<out.fst>] ]
E.g:  fstcomposecontext ilabels.sym < LG.fst > CLG.fst

Example usage:

cd ~/kaldi/data/HCLG
fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=/root/kaldi/data/G/normal/phones/disambig.int --write-disambig-syms=disambig_ilabels.int disambig_ilabels < LG.fst > CLG.fst

Parameter details (a triphone variant is sketched after this list):
--context-size=1 : context width 1, i.e. a monophone model
--central-position=0 : the central phone is at position 0
--read-disambig-syms : disambig.int comes from the phones folder generated when building L or G; the input file LG.fst is the graph composed in the previous step
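For comparison, a triphone model would use a context width of 3 with the central phone at position 1 (the defaults used by mkgraph.sh). This is only a sketch assuming the same file layout as above:

fstcomposecontext --context-size=3 --central-position=1 --read-disambig-syms=/root/kaldi/data/G/normal/phones/disambig.int --write-disambig-syms=disambig_ilabels.int disambig_ilabels < LG.fst > CLG.fst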


In Kaldi, a separate C.fst is generally not created explicitly and then composed with LG using fsttablecompose. Instead, the fstcomposecontext tool dynamically generates CLG.fst on top of LG.fst. You could also build C.fst first and then compose it with fsttablecompose, but that is considerably more time-consuming. This step also writes two files: disambig_ilabels.int (the disambiguation symbols in the new ilabel space) and disambig_ilabels (the ilabel-info file), the latter of which is used below when generating Ha.fst.
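As a quick sanity check (fstinfo is a standard OpenFst tool; the paths assume the ~/kaldi/data/HCLG directory used above), you can compare the sizes of the two graphs:

fstinfo LG.fst | grep "# of"
fstinfo CLG.fst | grep "# of"

With a monophone context CLG.fst stays close to LG.fst in size; with a wider (e.g. triphone) context it is typically much larger.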

Visualize CLG.fst

  • fstprint
fstprint --isymbols=../G/normal/phones.txt --osymbols=../G/normal/words.txt ./CLG.fst > CLG.txt


  • fstdraw
fstdraw --isymbols=../G/normal/phones.txt --osymbols=../G/normal/words.txt ./CLG.fst > CLG.dot # then convert to an image with the dot tool

Generate H.fst

make-h-transducer

make-h-transducer builds Ha.fst, the acoustic-model transducer without self-loops, based on the HMM topology.

usage:

make-h-transducer 
Make H transducer from transition-ids to context-dependent phones, 
 without self-loops [use add-self-loops to add them]
Usage:   make-h-transducer <ilabel-info-file> <tree-file> <transition-gmm/acoustic-model> [<H-fst-out>]
e.g.: 
 make-h-transducer ilabel_info  1.tree 1.mdl > H.fst

Example usage:

make-h-transducer --disambig-syms-out=disambig_tid.int disambig_ilabels /root/kaldi/data/H/mono/tree /root/kaldi/data/H/mono/final.mdl > Ha.fst

Parameter details:
The first input parameter (disambig_ilabels) was generated when CLG.fst was composed.
The second input parameter is the decision tree (tree) produced by GMM training.
The third input parameter is the final model (final.mdl) produced by GMM training. (The "a" in Ha.fst means "no self-loops".)
--disambig-syms-out writes disambig_tid.int, the transition-ids of the disambiguation symbols, which is needed when composing HCLG below.
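To confirm what the model provides before building H (the input labels of Ha.fst are transition-ids), gmm-info can be used as a quick check; a small sketch using the model path above:

gmm-info /root/kaldi/data/H/mono/final.mdl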

Combine HCLG.fst

Generate HaCLG.fst

fsttablecompose

fstrmsymbols: removes the transitions labelled with disambiguation symbols from HaCLG.fst. disambig_tid.int, the list of their transition-ids, was written by make-h-transducer (via --disambig-syms-out) when Ha.fst was generated.

usage:

fsttablecompose 
Composition algorithm [between two FSTs of standard type, in tropical
semiring] that is more efficient for certain cases-- in particular,
where one of the FSTs (the left one, if --match-side=left) has large
out-degree
Usage:  fsttablecompose (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)]

Example usage:

fsttablecompose Ha.fst CLG.fst | fstdeterminizestar --use-log=true | fstrmsymbols disambig_tid.int | fstrmepslocal | fstminimizeencoded | fstpushspecial > HaCLG.fst


1. Add self-loops to the HaCLG.fst model (the output is written to an intermediate file here; in mkgraph.sh the two commands below are piped together):

add-self-loops --self-loop-scale=0.1 --reorder=true /root/kaldi/data/H/mono/final.mdl < HaCLG.fst > HCLGa_selfloop.fst

Generate HCLG.fst

2. Convert the self-looped graph to a const-type FST to obtain the final HCLG.fst:

fstconvert --fst_type=const HCLGa_selfloop.fst > HCLG.fst
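To verify the conversion (fstinfo is a standard OpenFst tool), the header of the final graph should report the const fst type:

fstinfo HCLG.fst | head -n 5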



At this point HCLG.fst has been generated; the core of the whole Kaldi speech recognition system has been built, and all that remains is to put it to use!
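As a rough illustration of how the finished graph is used (a minimal sketch, not part of the original post: feats.ark is a hypothetical feature archive that would first be produced with compute-mfcc-feats, and decoding options are left at their defaults):

gmm-latgen-faster --word-symbol-table=/root/kaldi/data/G/normal/words.txt /root/kaldi/data/H/mono/final.mdl /root/kaldi/data/HCLG/HCLG.fst ark:feats.ark "ark:|gzip -c > lat.1.gz"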

If you have any questions, feel free to send me a private message or leave a comment to discuss. The complete virtual machine clone will be posted in the comment area. Thank you for your support!

Recommended article: Kaldi's HCLG composition process visualization

Origin blog.csdn.net/yxn4065/article/details/129151323