Notes on building and using KenLM

yum install -y cmake
yum install -y boost
yum install -y boost-devel
yum install -y boost-doc
yum install -y zlib
yum install -y zlib-devel
yum install -y gcc gcc-c++ kernel-devel
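The packages above are for yum-based distros (CentOS/RHEL). On Debian/Ubuntu the rough equivalents would be the following (package names are an assumption and may vary by release):

```shell
# Debian/Ubuntu equivalents of the yum packages above (assumed names):
# cmake            -> cmake
# gcc gcc-c++      -> build-essential
# boost-devel      -> libboost-all-dev
# zlib-devel       -> zlib1g-dev
sudo apt-get install -y cmake build-essential libboost-all-dev zlib1g-dev
```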


wget -O - https://kheafield.com/code/kenlm.tar.gz |tar xz
mkdir kenlm/build
cd kenlm/build
cmake ..
make -j2

Train with the following commands:
cd bin
mkdir result
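Before training, the corpus must already be tokenized: lmplz reads plain text with one sentence per line and tokens separated by spaces (for Chinese, run a word segmenter first). A minimal sketch of creating a toy corpus in that format, using the hypothetical file name test.txt:

```shell
# Create a tiny toy corpus (test.txt is a hypothetical file name).
# lmplz expects: one sentence per line, tokens separated by spaces.
printf '%s\n' \
  "this is a small test sentence" \
  "this is another small test sentence" \
  > test.txt
wc -l < test.txt   # 2 sentences
```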

Training (word granularity):

./lmplz -o 3 --verbose_header --text people2014_words.txt --arpa result/people2014_words.arpa
./lmplz -o 3 --verbose_header --text corpus_seg.txt --arpa result/corpus_seg.arpa
./lmplz -o 3 --verbose_header --text test.txt --arpa result/test.arpa

Adjust the paths above to match your own file locations. The parameters mean:
  -o N: order of the model, i.e. the highest n-gram order to use (here 3)
  --verbose_header: add a statistics header to the generated file
  --text text_file: the input training corpus (plain text)
  --arpa: the output ARPA model file
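The file written by --arpa is plain text in the standard ARPA format: it opens with a \data\ section listing how many n-grams of each order the model contains, followed by one section per order with log10 probabilities and backoff weights. A rough sketch of the layout (the counts and scores below are invented for illustration, not real lmplz output):

```shell
# Sketch of the top of an ARPA file (invented numbers, illustration only).
cat > sample.arpa <<'EOF'
\data\
ngram 1=5
ngram 2=7
ngram 3=6

\1-grams:
-1.0000	<s>	-0.3000
EOF
# Each "ngram N=count" line reports how many N-grams the model kept.
grep -c '^ngram' sample.arpa   # 3 orders in this sketch
```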

Compression:
Convert the model created above to binary so it loads quickly; both *.bin and *.klm files are binary model files:
./build_binary ./result/people2014_words.arpa ./result/people2014_words.klm
./build_binary ./result/corpus_seg.arpa ./result/corpus_seg.klm

References:
https://blog.csdn.net/mingzai624/article/details/79560063
http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/
https://blog.csdn.net/libaineu2004/article/details/84823978

Origin blog.csdn.net/a857553315/article/details/90610167