A mathematical model: how does it calculate probability?

Since a language model is a mathematical model, how should the probability be calculated?

The easiest way, of course, is statistical: use the preceding context of the input to estimate the probability of the next word. For example, after an input like "Have you eaten", nouns such as "dinner" or "rice" receive a higher probability than verbs such as "sleep".

This is the first stage of the language model, known as the Statistical Language Model (SLM). The basic idea is to build a word-prediction model under the Markov assumption: the next word is predicted from only the most recent context.
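The idea above can be sketched as a tiny bigram model: count word pairs in a corpus and estimate P(next word | previous word) by maximum likelihood. The corpus and counts here are toy assumptions for illustration, not from the original article.

```python
from collections import Counter

# Toy corpus standing in for real training data (an assumption for illustration).
corpus = [
    "have you eaten dinner",
    "have you eaten rice",
    "have you eaten dinner",
    "did you sleep well",
]

# Count unigrams and adjacent word pairs (bigrams) over whitespace tokens.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev) under a first-order Markov assumption."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("eaten", "dinner"))  # P("dinner" | "eaten") = 2/3
print(bigram_prob("eaten", "sleep"))   # unseen pair, so probability 0
```

Because only the immediately preceding word is used, this is exactly the Markov-assumption prediction described above; real SLMs extend the context to n-grams and add smoothing for unseen pairs.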

The language model subsequently went through further iterations.

The second stage is the Neural Language Model (NLM), which uses a neural network to learn the correlations and probability relationships between words. Trained on large amounts of data with deep learning, it can capture more complex relationships between words. An NLM adopts a layered structure that projects the input text into a high-dimensional semantic space and learns within it. By continuously updating the network's parameters, the NLM gradually learns the semantics of the text data and can generate coherent, natural, and semantically accurate text.
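The layered structure described above can be sketched with NumPy: words are embedded into vectors, a context is projected through a hidden layer into a semantic space, and a softmax scores every word in the vocabulary. The vocabulary, dimensions, and random weights here are illustrative assumptions; a real NLM would learn these parameters by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["have", "you", "eaten", "dinner", "sleep"]
V, d, h = len(vocab), 8, 16  # vocab size, embedding dim, hidden dim (toy sizes)

# Randomly initialised parameters; training would update these continuously.
E = rng.normal(scale=0.1, size=(V, d))       # word embedding table
W1 = rng.normal(scale=0.1, size=(2 * d, h))  # two-word context -> hidden layer
W2 = rng.normal(scale=0.1, size=(h, V))      # hidden layer -> vocabulary logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def next_word_probs(context):
    """Project a two-word context into the semantic space and score every word."""
    x = np.concatenate([E[vocab.index(w)] for w in context])
    hidden = np.tanh(x @ W1)
    return softmax(hidden @ W2)

probs = next_word_probs(["you", "eaten"])
assert abs(probs.sum() - 1.0) < 1e-9  # a valid probability distribution over the vocabulary
```

Unlike the count-based SLM, every word here shares the same learned embedding space, which is what lets a trained NLM generalize to word combinations it has never counted.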

Compared with the aforementioned SLM, the stronger learning capacity of deep neural networks gives the NLM better generalization and adaptability, for example the ability to generate longer text. However, an NLM depends on larger datasets and requires considerable manual effort for data labeling.

The third stage is the Pre-trained Language Model (PLM), a natural language processing model trained on massive amounts of text data. Unlike an NLM, a PLM is trained with unsupervised learning, so there is no need to label the data or annotate information such as text types. The Transformer architecture you may have heard of is what most pre-trained language models are built on.
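The core operation of that Transformer architecture is scaled dot-product self-attention, which lets every token weigh every other token in the input. Below is a minimal NumPy sketch; the token count and dimensions are toy assumptions, and a real Transformer adds learned projections, multiple heads, and stacked layers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the core Transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns similarity scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 tokens with 8-dimensional representations (toy sizes)

# Self-attention: queries, keys, and values all come from the same sequence.
out, weights = scaled_dot_product_attention(X, X, X)
assert weights.shape == (4, 4)                 # each token attends to every token
assert np.allclose(weights.sum(axis=-1), 1.0)  # each row is a probability distribution
```

Pre-training simply runs this machinery over raw text with objectives like next-word or masked-word prediction, which is why no manual labels are needed.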


Origin blog.csdn.net/weixin_41937552/article/details/130650658