yum install -y cmake
yum install -y boost
yum install -y boost-devel
yum install -y boost-doc
yum install -y zlib
yum install -y zlib-devel
yum install -y gcc gcc-c++ kernel-devel
wget -O - https://kheafield.com/code/kenlm.tar.gz |tar xz
mkdir kenlm/build
cd kenlm/build
cmake ..
make -j2
Then switch to the bin directory and create a folder for the results:
cd bin
mkdir result
Training (word-level granularity):
./lmplz -o 3 --verbose_header --text people2014_words.txt --arpa result/people2014_words.arpa
./lmplz -o 3 --verbose_header --text corpus_seg.txt --arpa result/corpus_seg.arpa
./lmplz -o 3 --verbose_header --text test.txt --arpa result/test.arpa
Adjust the file paths above to match where your files are located. The parameters mean:
-o N: the highest n-gram order to use (-o 3 trains a trigram model)
--verbose_header: write a verbose header with statistics about the model at the top of the ARPA file
--text text_file: path to the training corpus (a plain-text file)
--arpa: path of the output ARPA file
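If you have several corpora, the training commands above can be wrapped in a small loop. This is a minimal sketch, assuming it is run from kenlm/build/bin and that each corpus is already segmented into space-separated words. The function name train_arpa and the LMPLZ, ORDER and OUT_DIR variables are hypothetical names introduced here, not part of KenLM; only the lmplz flags come from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: train one ARPA model per corpus file with lmplz.
# Assumes it runs from kenlm/build/bin and the corpora are word-segmented.
set -euo pipefail

LMPLZ="${LMPLZ:-./lmplz}"    # path to the lmplz binary
ORDER="${ORDER:-3}"          # highest n-gram order, passed to -o
OUT_DIR="${OUT_DIR:-result}" # directory the .arpa files are written to

train_arpa() {
  mkdir -p "$OUT_DIR"
  local corpus name
  for corpus in "$@"; do
    # people2014_words.txt -> people2014_words
    name="$(basename "$corpus")"
    name="${name%.*}"
    "$LMPLZ" -o "$ORDER" --verbose_header --text "$corpus" --arpa "$OUT_DIR/$name.arpa"
  done
}
```

To use it, add a line such as train_arpa people2014_words.txt corpus_seg.txt test.txt at the end of the script and run it with bash; set ORDER=4 (for example) beforehand to train a higher-order model.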
Compression:
Convert the model created above to binary format so that it loads faster. Both *.bin and *.klm files are binary model files:
./build_binary ./result/people2014_words.arpa ./result/people2014_words.klm
./build_binary ./result/corpus_seg.arpa ./result/corpus_seg.klm
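The two build_binary calls above follow the same pattern, so they can be generated for every ARPA file in the result folder. This is a minimal sketch, assuming it is run from kenlm/build/bin next to build_binary; the function name arpa_to_klm and the BUILD_BINARY and OUT_DIR variables are hypothetical names introduced here, not part of KenLM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every result/*.arpa into a binary .klm model.
# Assumes it runs from kenlm/build/bin, next to build_binary.
set -euo pipefail

BUILD_BINARY="${BUILD_BINARY:-./build_binary}" # path to build_binary
OUT_DIR="${OUT_DIR:-result}"                   # directory holding the .arpa files

arpa_to_klm() {
  local arpa
  for arpa in "$OUT_DIR"/*.arpa; do
    [ -e "$arpa" ] || continue   # no .arpa files: the glob stays literal, skip it
    # result/corpus_seg.arpa -> result/corpus_seg.klm
    "$BUILD_BINARY" "$arpa" "${arpa%.arpa}.klm"
  done
}
```

To use it, add a line calling arpa_to_klm at the end of the script and run it with bash after training; each .klm file is written next to its .arpa source.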