Language model and n-gram

Language models (LMs) occupy an important position in natural language processing, especially in research areas built on statistical models, such as speech recognition, machine translation, automatic Chinese word segmentation, and syntactic parsing. At present, the n-gram model is the one mainly used. It is simple and straightforward to build, but because training data is always sparse, it must be combined with a smoothing algorithm.
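To make the point about sparse data concrete, here is a minimal sketch of a bigram model (n = 2) with add-one smoothing. The toy corpus, the function names `train_bigram` and `bigram_prob`, and the choice of add-one smoothing are all assumptions made for illustration; the original post does not say which smoothing algorithm it has in mind.

```python
from collections import Counter

def train_bigram(sentences):
    """Count history words and bigrams over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]   # sentence boundary markers
        contexts.update(padded[:-1])           # every word that has a successor
        bigrams.update(zip(padded, padded[1:]))
    # Words that can be predicted: all tokens plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab, k=1.0):
    """Add-k smoothed estimate of p(word | prev); k=1 is add-one smoothing."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * len(vocab))

# Hypothetical toy corpus, already tokenized.
corpus = [
    ["i", "like", "apples"],
    ["i", "like", "chicken"],
    ["chicken", "ate", "apples"],
]
contexts, bigrams, vocab = train_bigram(corpus)

p_seen = bigram_prob("i", "like", contexts, bigrams, vocab)        # bigram seen twice
p_unseen = bigram_prob("apples", "ate", contexts, bigrams, vocab)  # bigram never seen
print(p_seen, p_unseen)
```

Without smoothing (k = 0), the unseen bigram would get probability 0 and any sentence containing it would also get probability 0. With k = 1 it receives a small non-zero share, and the estimates for each history word still sum to 1.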

n-gram

A language model is usually constructed as a probability distribution p(s) over strings s, where p(s) attempts to reflect how frequently the string s occurs as a sentence. For example, in a language model that describes spoken language, if one out of every 100 sentences a person utters is "Okay", then we can take p(Okay) ≈ 0.01. For the sentence "An apple ate the chicken", on the other hand, we can take the probability to be 0, because almost no one would ever say such a sentence.
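The frequency reading of p(s) can be sketched in a few lines of Python. The list of utterances below is made up for illustration (one "Okay" among 100 sentences), and `sentence_distribution` is a hypothetical helper name, not something from the original post.

```python
from collections import Counter

def sentence_distribution(utterances):
    """Return p(s) estimated as the relative frequency of whole sentences."""
    counts = Counter(utterances)
    total = sum(counts.values())
    return lambda s: counts[s] / total

# Made-up data: 1 "Okay" and 99 other distinct sentences.
utterances = ["Okay"] + ["filler sentence %d" % i for i in range(99)]
p = sentence_distribution(utterances)

print(p("Okay"))                      # 1 occurrence out of 100 sentences
print(p("An apple ate the chicken"))  # never observed, so the estimate is 0
```

Counting whole sentences this way gives probability 0 to every sentence that was not observed, including perfectly ordinary ones, which is why practical models break a sentence into shorter word sequences instead.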


Origin blog.csdn.net/qq_39905917/article/details/100024038