Natural Language Processing from Entry to Application - Basic Tasks of Natural Language Processing: Semantic Analysis



The core task of natural language processing is to enable computers to "understand" the meaning carried by natural language, that is, its semantics. The text vector representations introduced earlier can be regarded as implicitly encoding a great deal of semantic information. In general, however, semantic analysis refers to representing semantics explicitly through discrete symbols and structures. Depending on the granularity of the language unit being represented and on the semantic representation method used, semantic analysis takes a variety of forms.

At the word level, a single word may have multiple senses (meanings). For example, the Chinese word "打" (conventionally translated "hit") can mean "attack" (as in "打人", hitting someone), "play" (as in "打篮球", playing basketball), or even "knit" (as in "打毛衣", knitting a sweater). The task of determining the specific sense of a word according to the context in which it appears is called word sense disambiguation (WSD). The inventory of possible senses for each word is usually defined by a semantic dictionary such as WordNet. Besides polysemy, there is also the converse phenomenon of synonymy, in which different words share the same meaning, such as "potato" and "spud". Because linguistic meaning is compositional and constantly evolving, the semantics of sentences, paragraphs, or documents cannot be enumerated in a dictionary the way word senses can, so it is difficult to express the semantics of such larger units in a single unified form. Different schools of linguistics have therefore proposed different semantic representations, such as Semantic Role Labeling (SRL) and Semantic Dependency Parsing (SDP).
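To make word sense disambiguation concrete, here is a minimal sketch using the classic Lesk algorithm over WordNet, as implemented in NLTK. This is only a common baseline, not the method described in this series; the ambiguous sentence and target word are hypothetical examples.

```python
# pip install nltk
import nltk
from nltk.wsd import lesk

# One-time download of the WordNet sense inventory.
nltk.download("wordnet", quiet=True)

# Hypothetical ambiguous sentence; "bank" could be a riverbank
# or a financial institution.
context = "I went to the bank to deposit my money".split()

# Lesk picks the WordNet synset whose dictionary gloss overlaps
# most with the words of the context.
sense = lesk(context, "bank", pos="n")
if sense is not None:
    print(sense.name())        # identifier of the chosen synset
    print(sense.definition())  # its dictionary gloss
```

Lesk is a simple overlap heuristic, so its choice is not always the intuitively correct sense; modern WSD systems instead score senses with contextual representations, but the input/output contract is the same.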

Semantic role labeling, also known as predicate-argument structure analysis, first identifies the predicates in a sentence (generally verbs) and then determines the semantic roles (also called arguments) governed by each predicate, such as the Agent, the initiator of the action, and the Patient, the entity affected by the action. In addition to these core semantic roles, there is a class of constituents that further describe the action, called adjunct semantic roles, which express, for example, when, where, and how the action occurs. The figure below shows an example of semantic role labeling with two predicates, "like" and "fall"; a separate set of arguments is output for each predicate.

Semantic dependency parsing uses general graphs to represent richer semantic information. Depending on the types of nodes in the graph, there are two representations: the semantic dependency graph and the conceptual graph. In a semantic dependency graph, the nodes are the actual words of the sentence, and semantic relation edges are drawn between words. A conceptual graph first maps the sentence onto virtual concept nodes and then draws semantic relation edges between those concept nodes. The figure below also shows an example of a semantic dependency graph analysis result.
[Figure: Semantic Analysis]
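To show what a predicate-argument structure looks like as data, here is a minimal sketch of how an SRL output might be represented in code. The sentence, the role labels, and the class names are illustrative assumptions, not the API of any particular toolkit.

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    role: str   # e.g. "Agent", "Patient", or an adjunct role such as "Time"
    text: str   # the span of the sentence filling this role

@dataclass
class PredicateFrame:
    predicate: str
    arguments: list[Argument] = field(default_factory=list)

# Hypothetical SRL output for "The boy likes playing basketball after school":
# each predicate in the sentence gets its own frame, as described above.
frames = [
    PredicateFrame(
        predicate="likes",
        arguments=[
            Argument(role="Agent", text="The boy"),
            Argument(role="Patient", text="playing basketball"),
            Argument(role="Time", text="after school"),  # adjunct semantic role
        ],
    ),
    PredicateFrame(
        predicate="playing",
        arguments=[
            Argument(role="Agent", text="The boy"),
            Argument(role="Patient", text="basketball"),
        ],
    ),
]

for frame in frames:
    print(frame.predicate, "->", [(a.role, a.text) for a in frame.arguments])
```

Note that the same word ("The boy") can serve as an argument of more than one predicate, which is also why semantic dependency parsing uses general graphs rather than trees.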
The semantic representations above are general-purpose: they aim to provide a unified semantic representation for all kinds of language phenomena. There is also a class of semantic parsing that specializes in specific tasks, such as converting a database query expressed in natural language into Structured Query Language (SQL). For example, given the student information table shown in the figure below, the system needs to convert the user's natural language query "names of students older than 18" into an SQL statement such as `select name where age > 18`.
[Figure: Student Information Table]
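For a runnable version of this example, the sketch below builds a small student table with `sqlite3` from the Python standard library and executes a complete form of the query. The table name, column names, and rows are assumptions made for illustration, since the original figure is not reproduced here.

```python
import sqlite3

# In-memory database standing in for the student information table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (name, age) VALUES (?, ?)",
    [("Zhang San", 17), ("Li Si", 19), ("Wang Wu", 20)],  # made-up rows
)

# Complete SQL form of the simplified "select name where age > 18";
# mapping the natural language query to this statement is exactly
# the job of a task-specific semantic parser.
rows = conn.execute("SELECT name FROM student WHERE age > 18").fetchall()
print([name for (name,) in rows])  # ['Li Si', 'Wang Wu']
conn.close()
```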

