Common errors and solutions when using the KenLM tools

When studying NLP, language models are part of what we have to learn. Commonly used n-gram training tools include SRILM, IRSTLM, BerkeleyLM, and KenLM.

I ran into quite a few problems while using KenLM.

Environment setup:

Reference blog post: https://www.cnblogs.com/jasmine-Jobs/p/7214758.html

I tested it myself and it works, but you need to be familiar with the Linux environment and the various installation commands, because some of the download locations given in the post no longer exist.

A detailed example of how KenLM works:

Reference blog post: https://blog.csdn.net/asrgreek/article/details/81979194

It is very detailed, and I recommend working through the entire process by hand yourself.

Note that running the command as the author gives it (lmplz -o 2 --text [inputfile] --arpa [outputfile]) produces an error.

The fix is simple, and KenLM's own error message already points to it: rerun with --discount_fallback.

So the command becomes: lmplz -o 2 --discount_fallback --text [inputfile] --arpa [outputfile]
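As background on why a discount-related flag can matter for a tiny test file: lmplz estimates modified Kneser-Ney discounts from counts of counts (how many n-grams occur exactly once, twice, and so on), and on very small input some of those counts can be zero. The sketch below is a simplified illustration that applies the closed-form discounts of Chen and Goodman to raw unigram counts. KenLM itself works on adjusted counts, so the function names and the toy corpus here are assumptions for illustration, not KenLM's implementation.

```python
from collections import Counter

def count_of_counts(tokens, max_count=4):
    """Return {k: number of distinct tokens seen exactly k times} for k = 1..max_count."""
    freq = Counter(tokens)
    hist = Counter(freq.values())
    return {k: hist.get(k, 0) for k in range(1, max_count + 1)}

def modified_kn_discounts(n):
    """Closed-form modified Kneser-Ney discounts (D1, D2, D3+).

    Returns None when a count that appears as a denominator is zero,
    which is the situation a tiny corpus tends to produce.
    """
    if n[1] == 0 or n[2] == 0 or n[3] == 0:
        return None
    y = n[1] / (n[1] + 2 * n[2])
    d1 = 1 - 2 * y * n[2] / n[1]
    d2 = 2 - 3 * y * n[3] / n[2]
    d3 = 3 - 4 * y * n[4] / n[3]
    return d1, d2, d3

# Hypothetical toy corpus: no token occurs three times, so n[3] is zero.
toy_tokens = "a b c a b d".split()
toy_counts = count_of_counts(toy_tokens)
print(toy_counts)
print(modified_kn_discounts(toy_counts))
```

With so little text the discounts cannot be estimated from the data, which is the kind of case where falling back to default discounts is useful.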

Run this way, it succeeds (bin/lmplz -o 2 --discount_fallback --text test.txt --arpa test.arpa)
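If you want to drive the same command from a script, a minimal sketch could look like the following. The helper name build_lmplz_command, the default binary path bin/lmplz, and the file names are assumptions for illustration; only the flags -o, --discount_fallback, --text, and --arpa come from the command above.

```python
import shutil
import subprocess

def build_lmplz_command(order, text_path, arpa_path,
                        discount_fallback=True, binary="bin/lmplz"):
    """Assemble the lmplz argument list used in this post."""
    cmd = [binary, "-o", str(order)]
    if discount_fallback:
        cmd.append("--discount_fallback")
    cmd += ["--text", text_path, "--arpa", arpa_path]
    return cmd

cmd = build_lmplz_command(2, "test.txt", "test.arpa")
print(" ".join(cmd))

# Only run the tool if the binary can actually be found from here.
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.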

[Figure: output of the successful lmplz run]

Then open the resulting ARPA file.
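The n-gram counts at the top of an ARPA file can also be read programmatically. The sketch below parses the \data\ header from a list of lines; the function name read_arpa_counts is hypothetical, and the sample lines are a hand-written fragment using the counts reported in the referenced blog post, not a real model file.

```python
import re

def read_arpa_counts(lines):
    """Return {order: count} parsed from the header section of ARPA text."""
    counts = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            m = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
            elif line:
                break  # first non-empty, non-count line ends the header
    return counts

# Hand-written header fragment for illustration.
sample = ["\\data\\", "ngram 1=6", "ngram 2=7", "", "\\1-grams:"]
print(read_arpa_counts(sample))
```

To use it on a real file, pass an open file object instead of the list, for example `with open("test.arpa") as f: read_arpa_counts(f)`.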

The output matches the result in the referenced blog post. You may notice that the first two lines in the author's file are ngram 1=6 and ngram 2=7, which differ from mine. That is because the second line of my test.txt has no space between "you" and "me". If you add the space, you get the same result as the blog author. Try it yourself.
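To see why a single missing space changes the counts, here is a small sketch that counts distinct unigrams and bigrams after whitespace tokenization, with sentence boundary markers added. The two toy corpora are hypothetical and are not the contents of test.txt, and the numbers are illustrative only, since an ARPA header may also count other special tokens such as <unk>.

```python
def ngram_types(lines, n):
    """Return the set of distinct n-grams over whitespace-tokenized lines."""
    types = set()
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            types.add(tuple(tokens[i:i + n]))
    return types

# Hypothetical toy corpora: identical except for one missing space.
with_space = ["you like me", "you me"]
without_space = ["you like me", "youme"]

for name, corpus in [("with space", with_space), ("without space", without_space)]:
    unigrams = len(ngram_types(corpus, 1))
    bigrams = len(ngram_types(corpus, 2))
    print(name, "unigrams:", unigrams, "bigrams:", bigrams)
```

Without the space, "youme" is treated as one new word, which adds a unigram type and changes which bigrams exist.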

Origin www.cnblogs.com/deeplearning1/p/11412031.html