Natural Language Processing from Entry to Application: Basic NLP Tasks, Part-of-Speech Tagging and Syntactic Parsing


Part-of-speech tagging

A part of speech (POS) is the grammatical role a word plays in a sentence, also called a word class. For example, words that name abstract or concrete things (such as "computer") are classified as nouns, while words that express actions (such as "hit") or states (such as "exist") are classified as verbs. Part-of-speech information supports syntactic analysis, semantic understanding, and other tasks. The part-of-speech tagging (POS tagging) task takes a sentence as input and outputs the part of speech of each word in it. For example, when the input sentence is:

他喜欢下象棋。 (He likes to play chess.)

then the output of part-of-speech tagging is:

他/PN 喜欢/VV 下/VV 象棋/NN 。/PU

Here PN, VV, NN, and PU after each slash denote pronoun, verb, noun, and punctuation, respectively. The main difficulty of part-of-speech tagging is ambiguity: a word may have different parts of speech in different contexts. For example, "下" is a verb ("to play") in the example above, but in other contexts it can be a location word ("under", "below"). The part of speech of a word in a sentence therefore has to be determined from its context.
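The ambiguity of "下" can be illustrated with a minimal sketch. It assumes the example sentence is the Chinese 他喜欢下象棋。 and uses a hypothetical toy lexicon plus one hand-written context rule. The lexicon, the LC tag for location words, and the rule itself are illustrative assumptions; real taggers learn such decisions from annotated data.

```python
# Toy lexicon (an assumption for illustration): each word maps to the tags it may take.
LEXICON = {
    "他": ["PN"],
    "喜欢": ["VV"],
    "下": ["VV", "LC"],  # ambiguous: verb "to play" or location word "under"
    "象棋": ["NN"],
    "桌子": ["NN"],
    "。": ["PU"],
}


def tag(words):
    """Tag a pre-segmented sentence using the lexicon and one context rule."""
    tags = []
    for word in words:
        candidates = LEXICON.get(word, ["NN"])  # unknown words default to NN
        prev_tag = tags[-1] if tags else None
        if "LC" in candidates and prev_tag == "NN":
            # Hand-written rule (assumption): right after a noun, read it as a location word.
            tags.append("LC")
        else:
            # Otherwise fall back to the first listed tag.
            tags.append(candidates[0])
    return list(zip(words, tags))


def format_tagged(pairs):
    """Render (word, tag) pairs in the word/TAG notation used above."""
    return " ".join(f"{word}/{t}" for word, t in pairs)


print(format_tagged(tag(["他", "喜欢", "下", "象棋", "。"])))
print(format_tagged(tag(["桌子", "下"])))
```

The same word receives VV after the verb 喜欢 and LC after the noun 桌子 ("under the table"), which is the kind of context dependence the task has to resolve.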

Syntactic parsing

Syntactic parsing takes a sentence and analyzes its syntactic constituents, such as the subject, predicate, object, attributive, adverbial, and complement. The ultimate goal is to convert a sentence represented as a word sequence into a tree structure, which helps capture the meaning of the sentence more accurately and supports downstream natural language processing tasks. For example, consider the following two sentences:

The article you forwarded is very good.
It's good that you forwarded this article.

Although the original Chinese sentences differ by only one character, "的", they express completely different meanings, mainly because their subjects differ. The subject of the first sentence is "article", while the subject of the second is the action of forwarding. Parsing the two sentences identifies their respective subjects, from which the different meanings can be derived.

There are two typical ways to represent syntactic structure: phrase structure and dependency structure. They differ in the grammar they rely on. The phrase structure representation relies on a context-free grammar and is hierarchical, while the dependency structure representation relies on a dependency grammar. The figure below compares the two. In the phrase structure representation, S is the start symbol, and NP and VP stand for noun phrase and verb phrase, respectively. In the dependency structure representation, sub and obj denote the subject and object, respectively, and root is a virtual root node that points to the core predicate of the sentence.
[Figure: Syntactic structure representation methods]
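As a rough sketch, the two representations can be written down as plain data structures. The sentence "He likes chess", the nested-tuple encoding, and the helper names below are illustrative assumptions and are not taken from the figure; only the labels S, NP, VP, sub, obj, and root come from the text.

```python
# Phrase structure: nested (label, child, ...) tuples; leaves are plain strings.
phrase_tree = ("S",
               ("NP", "He"),
               ("VP", "likes", ("NP", "chess")))


def bracketed(node):
    """Render a phrase structure tree in bracketed notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


# Dependency structure: one (head, relation) arc per word.
# Heads are 1-based word positions; 0 is the virtual root node.
words = ["He", "likes", "chess"]
arcs = [(2, "sub"), (0, "root"), (2, "obj")]


def dependents(relation):
    """Return the words attached to their head by the given relation."""
    return [w for w, (_, rel) in zip(words, arcs) if rel == relation]


def head_of(word):
    """Return the head word of `word`, or 'ROOT' for the core predicate."""
    head, _ = arcs[words.index(word)]
    return "ROOT" if head == 0 else words[head - 1]


print(bracketed(phrase_tree))
print("subject:", dependents("sub"), "object:", dependents("obj"))
print("head of 'He':", head_of("He"))
```

The phrase structure tree groups words into nested constituents, while the dependency arcs link each word directly to its head, so reading off the subject is a single lookup on the `sub` relation.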

