Kaldi Speech Recognition Technology (8) ----- Integrating HCLG
HCLG overview
HCLG = min(det(H o min(det(C o min(det(L o G))))))
The four layers are composed one at a time to produce the final graph. Here o denotes composition, det denotes determinization, and min denotes minimization.
WFSTs are generally composed from the largest units down to the smallest: G and L are composed first, then C, then H. Each composition is followed by determinization and minimization. Minimization converts a WFST into an equivalent WFST with fewer states and arcs, which improves search efficiency. For the full HCLG recipe, see kaldi/egs/wsj/s5/utils/mkgraph.sh.
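Since the steps below chain many separate binaries, it can help to confirm first that all of them are on the PATH. This is a minimal sketch, not part of Kaldi: the function name missing_tools and the variable HCLG_TOOLS are made up for illustration, and the list is simply the set of commands used in this article.

```shell
# Hypothetical helper (not part of Kaldi): print each named command that
# cannot be found on PATH, one per line. Prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The commands used in this article, in the order they appear.
HCLG_TOOLS="fsttablecompose fstdeterminizestar fstminimizeencoded fstpushspecial fstarcsort fstisstochastic fstcomposecontext make-h-transducer fstrmsymbols fstrmepslocal add-self-loops fstconvert"

# Usage: an empty result means every tool was found.
#   missing_tools $HCLG_TOOLS
```

If anything is printed, the Kaldi and OpenFst binary directories probably still need to be added to PATH.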
Combine LG.fst
- fsttablecompose
usage:
fsttablecompose
Composition algorithm [between two FSTs of standard type, in tropical
semiring] that is more efficient for certain cases-- in particular,
where one of the FSTs (the left one, if --match-side=left) has large
out-degree
Usage: fsttablecompose (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)]
Example:
cd ~/kaldi/data && mkdir -p HCLG
fsttablecompose ~/kaldi/data/L/lang/L_disambig.fst ~/kaldi/data/G/normal/G.fst | fstdeterminizestar --use-log=true | fstminimizeencoded | fstpushspecial | fstarcsort --sort_type=ilabel > ~/kaldi/data/HCLG/LG.fst
fsttablecompose composes two FSTs (L.fst and G.fst) into one (LG.fst). The output symbols of the first FST are matched against the input symbols of the second; the composed FST takes its input symbols from the first FST and its output symbols from the second.
fstdeterminizestar determinizes the FST (from any state, a given input symbol leads to only one next state) and removes epsilon transitions, reducing redundancy in the graph.
fstminimizeencoded minimizes the FST. fstpushspecial then pushes the weights so that the language model information is used as early as possible, which helps keep important paths from being pruned.
fstisstochastic can be run on the result as a diagnostic step. It does not normalize anything; it checks whether the outgoing probabilities at each state sum to 1 and prints two numbers, the minimum weight and the maximum weight.
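Because fstisstochastic is only a diagnostic, a script usually should not abort when it reports a non-stochastic FST. The sketch below wraps the call accordingly. It assumes that fstisstochastic exits with a non-zero status when the FST is not stochastic; the function name report_stochastic is hypothetical.

```shell
# Hypothetical wrapper (not part of Kaldi): run the fstisstochastic
# diagnostic on one FST and report the result without failing.
# Assumption: fstisstochastic prints its two numbers on stdout and
# returns non-zero when the FST is not stochastic.
report_stochastic() {
    fst="$1"
    if out=$(fstisstochastic "$fst"); then
        echo "stochastic: $out"
    else
        echo "not stochastic: $out"
    fi
}

# Usage (requires Kaldi's fstisstochastic on PATH):
#   report_stochastic ~/kaldi/data/HCLG/LG.fst
```

A "not stochastic" report is informational and does not by itself mean that the composition failed.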
Visualize LG.fst
- fstprint
cd ~/kaldi/data
fstprint --isymbols=./G/normal/phones.txt --osymbols=./G/normal/words.txt ./HCLG/LG.fst > ./HCLG/LG.txt
- fstdraw
fstdraw --isymbols=./G/normal/phones.txt --osymbols=./G/normal/words.txt ./HCLG/LG.fst > ./HCLG/LG.dot # generate the dot file
dot -Tsvg ./HCLG/LG.dot > LG.svg # convert to an SVG vector image (stays sharp when zoomed in)
Rendering a graph of this size takes a long time, so I have not tried it.
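When drawing is too slow, the text dump written by fstprint can still give a quick impression of the graph size. The sketch below counts states, arcs, and final states in such a dump. It assumes the usual OpenFst text layout, where arc lines have four or five fields (source, destination, input label, output label, optional weight) and final-state lines have one or two; the function name fst_text_summary is made up for illustration.

```shell
# Hypothetical helper: summarize an fstprint text dump.
# Assumed layout: arc lines   = src dst ilabel olabel [weight]
#                 final lines = state [weight]
fst_text_summary() {
    awk '
        NF >= 4            { arcs++;   seen[$1] = 1; seen[$2] = 1 }
        NF >= 1 && NF <= 2 { finals++; seen[$1] = 1 }
        END {
            n = 0
            for (s in seen) n++
            printf "states=%d arcs=%d finals=%d\n", n, arcs, finals
        }
    ' "$1"
}

# Usage with the file written by the fstprint command above:
#   fst_text_summary ./HCLG/LG.txt
```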
Combine CLG.fst
fstcomposecontext
usage:
fstcomposecontext
Composes on the left with a dynamically created context FST
Usage: fstcomposecontext <ilabels-output-file> [<in.fst> [<out.fst>] ]
E.g: fstcomposecontext ilabels.sym < LG.fst > CLG.fst
Example:
cd ~/kaldi/data/HCLG
fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=/root/kaldi/data/G/normal/phones/disambig.int --write-disambig-syms=disambig_ilabels.int disambig_ilabels < LG.fst > CLG.fst
Parameter details:
--context-size=1: monophone model
--central-position=0: the central phone is at position 0
--read-disambig-syms: disambig.int comes from the phones folder generated while building L or G. The input file, LG.fst, is the model composed in the previous step.
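The two context options above are hard-coded for a monophone model. As a sketch of how they could be derived from the tree instead, the helper below turns the output of Kaldi's tree-info tool into the matching options. It assumes that tree-info prints lines of the form "context-width N" and "central-position P"; the function name context_opts is hypothetical.

```shell
# Hypothetical helper: read tree-info output on stdin and print the
# matching fstcomposecontext options.
# Assumption: the input contains lines "context-width N" and
# "central-position P". Fails (exit 1) if either line is missing.
context_opts() {
    awk '
        $1 == "context-width"    { n = $2 }
        $1 == "central-position" { p = $2 }
        END {
            if (n == "" || p == "") exit 1
            printf "--context-size=%s --central-position=%s\n", n, p
        }
    '
}

# Usage (requires Kaldi's tree-info on PATH):
#   opts=$(tree-info /root/kaldi/data/H/mono/tree | context_opts)
```

For the monophone tree used here this should reproduce --context-size=1 --central-position=0; a triphone tree would typically give 3 and 1.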
In Kaldi, a separate C.fst is generally not created explicitly and then composed with LG. Instead of using fsttablecompose, the fstcomposecontext tool generates CLG.fst dynamically from LG.fst. You can also create C.fst first and then compose it with fsttablecompose, but that approach is quite time-consuming. This step also writes two files, disambig_ilabels.int and disambig_ilabels, which are used to generate Ha.fst.
Visualize CLG.fst
- fstprint
fstprint --isymbols=../G/normal/phones.txt --osymbols=../G/normal/words.txt ./CLG.fst > CLG.txt
- fstdraw
fstdraw --isymbols=../G/normal/phones.txt --osymbols=../G/normal/words.txt ./CLG.fst > CLG.dot # then convert it to an image with the dot tool
Generate H.fst
make-h-transducer
make-h-transducer builds Ha.fst, the H transducer without self-loops, from the HMM topology.
usage:
make-h-transducer
Make H transducer from transition-ids to context-dependent phones,
without self-loops [use add-self-loops to add them]
Usage: make-h-transducer <ilabel-info-file> <tree-file> <transition-gmm/acoustic-model> [<H-fst-out>]
e.g.:
make-h-transducer ilabel_info 1.tree 1.mdl > H.fst
Example:
make-h-transducer --disambig-syms-out=disambig_tid.int disambig_ilabels /root/kaldi/data/H/mono/tree /root/kaldi/data/H/mono/final.mdl > Ha.fst
Parameter details:
The first input parameter (disambig_ilabels) is generated when CLG.fst is composed.
The second input parameter is the decision tree (tree) generated by GMM training.
The third input parameter is the final model generated by GMM training. (The a in Ha.fst means that the transducer has no self-loops.)
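All three inputs must exist before make-h-transducer is run, and an empty disambig_ilabels usually means the previous step failed. A small guard can make that explicit. This is a minimal sketch; the function name require_nonempty is hypothetical and not part of Kaldi.

```shell
# Hypothetical helper (not part of Kaldi): succeed only if every named
# file exists and is non-empty; report the first offender on stderr.
require_nonempty() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}

# Usage with the paths from this article:
#   require_nonempty disambig_ilabels \
#       /root/kaldi/data/H/mono/tree /root/kaldi/data/H/mono/final.mdl \
#       && echo "inputs for make-h-transducer look OK"
```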
Combine HCLG.fst
Generate HaCLG.fst
fsttablecompose
fstrmsymbols: replaces the disambiguation symbols in the HaCLG.fst model with epsilon. The disambig_tid.int file is written by make-h-transducer when its --disambig-syms-out option is given.
usage:
fsttablecompose
Composition algorithm [between two FSTs of standard type, in tropical
semiring] that is more efficient for certain cases-- in particular,
where one of the FSTs (the left one, if --match-side=left) has large
out-degree
Usage: fsttablecompose (fst1-rxfilename|fst1-rspecifier) (fst2-rxfilename|fst2-rspecifier) [(out-rxfilename|out-rspecifier)]
Example:
fsttablecompose Ha.fst CLG.fst | fstdeterminizestar --use-log=true | fstrmsymbols disambig_tid.int | fstrmepslocal | fstminimizeencoded | fstpushspecial > HaCLG.fst
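One way to sanity-check the fstrmsymbols step is to dump the result with fstprint without symbol tables, so that labels are printed as integers, and confirm that none of the ids listed in disambig_tid.int remain as input labels. The sketch below does that count. It assumes one integer id per line in the id file and the usual fstprint column order; the function name count_disambig_arcs is made up for illustration.

```shell
# Hypothetical helper: count arcs in an fstprint dump (integer labels)
# whose input label appears in a list of symbol ids.
#   $1: file with one integer id per line (e.g. disambig_tid.int)
#   $2: text dump, arc lines = src dst ilabel olabel [weight]
count_disambig_arcs() {
    awk '
        NR == FNR { if (NF) ids[$1] = 1; next }   # first file: load ids
        NF >= 4 && ($3 in ids) { n++ }            # second file: arc lines
        END { print n + 0 }
    ' "$1" "$2"
}

# Usage (requires fstprint on PATH):
#   fstprint HaCLG.fst > HaCLG.int.txt
#   count_disambig_arcs disambig_tid.int HaCLG.int.txt
```

A result of 0 is what one would expect after fstrmsymbols has run.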
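Generate HCLG.fst
Add self-loops to the HaCLG.fst model, then convert the result to the const FST type. The output of add-self-loops is piped into fstconvert so that HCLG.fst actually contains the self-loops:
add-self-loops --self-loop-scale=0.1 --reorder=true /root/kaldi/data/H/mono/final.mdl < HaCLG.fst | fstconvert --fst_type=const > HCLG.fst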
HCLG.fst has now been generated. The core of the whole Kaldi speech recognition system is built; all that remains is to use it!